Re: Failing streaming replication on PostgreSQL 14

From: Nicolas Seinlet <nicolas(at)seinlet(dot)com>
To: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Failing streaming replication on PostgreSQL 14
Date: 2024-04-22 11:50:27
Message-ID: 5DyhufZPA9iljX8zsyrAw8zCw3wg4GsrKTodhOtvS-tJOmYHKoIiHEdKH4DCbkTA14fDzriyIJa0sFJgVN8W_HFDEIQY6nNwujdMRweJmzI=@seinlet.com
Lists: pgsql-general

Hi,

I'm facing the same situation again, but this time analyzing the WAL with xxd shows a different pattern: there are no blocks of 0000.

The output of pg_waldump is:
pg_waldump: fatal: error in WAL record at 11C/93F9FF70: invalid magic number 0000 in log segment 000000010000011C00000093, offset 16384000
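
To relate the LSN in that message to the byte offset, here is a minimal sketch of the arithmetic. It assumes the default 16 MB wal_segment_size, the default 8 kB WAL block size and timeline 1; locate_lsn is a hypothetical helper name, not a PostgreSQL function.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default wal_segment_size
XLOG_BLCKSZ = 8192                   # assumption: default WAL block size


def locate_lsn(lsn_text, timeline=1):
    """Map an LSN such as '11C/93F9FF70' to (segment file name, byte offset)."""
    high, low = (int(part, 16) for part in lsn_text.split("/"))
    lsn = (high << 32) | low
    segno = lsn // WAL_SEGMENT_SIZE
    segs_per_xlogid = 0x100000000 // WAL_SEGMENT_SIZE
    name = "%08X%08X%08X" % (
        timeline,
        segno // segs_per_xlogid,
        segno % segs_per_xlogid,
    )
    return name, lsn % WAL_SEGMENT_SIZE


name, offset = locate_lsn("11C/93F9FF70")
# Start of the WAL page that follows the one containing the record.
next_page = (offset // XLOG_BLCKSZ + 1) * XLOG_BLCKSZ
print(name, hex(offset), next_page)
```

Under those assumptions the record sits at offset 0xf9ff70 in segment 000000010000011C00000093, and the next page starts at 16384000, which is the offset reported in the error. That suggests the failed check was on the header of the following page rather than on the record at 11C/93F9FF70 itself.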

The output of xxd -c 16 is:

00f9ff60: b364 0079 6e61 6d69 6320 6c80 0300 0000 .d.ynamic l.....
00f9ff70: 4000 0000 6659 a406 60f7 f993 1c01 0000 @...fY..`.......
00f9ff80: 000b 0000 82b3 8d9b 0020 1000 7f06 0000 ......... ......
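
As a rough cross-check, the 24 bytes starting at 00f9ff70 can be decoded as a WAL record header. This is only a sketch: it assumes a little-endian machine and the XLogRecord field order xl_tot_len, xl_xid, xl_prev, xl_info, xl_rmid, two padding bytes, xl_crc. The format_lsn helper is a hypothetical name.

```python
import struct

# Bytes copied from the xxd lines at 00f9ff70 and 00f9ff80.
raw = bytes.fromhex(
    "400000006659a40660f7f9931c010000"  # 00f9ff70, all 16 bytes
    "000b000082b38d9b"                  # 00f9ff80, first 8 bytes
)

# Assumed layout (24 bytes, little-endian, no implicit alignment):
# xl_tot_len u32, xl_xid u32, xl_prev u64, xl_info u8, xl_rmid u8,
# 2 padding bytes, xl_crc u32
XLOG_RECORD = struct.Struct("<IIQBBxxI")


def format_lsn(value):
    """Render a 64-bit WAL position in the usual high/low hex form."""
    return "%X/%X" % (value >> 32, value & 0xFFFFFFFF)


tot_len, xid, prev, info, rmid, crc = XLOG_RECORD.unpack(raw)
header = {
    "xl_tot_len": tot_len,
    "xl_xid": xid,
    "xl_prev": format_lsn(prev),
    "xl_info": info,
    "xl_rmid": rmid,
    "xl_crc": "%08X" % crc,
}
print(header)
```

Read this way, the header describes a 64-byte record whose previous-record pointer is 11C/93F9F760, slightly before 11C/93F9FF70. Those values look plausible, so the bytes shown do not obviously look damaged.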

I'm still unable to determine the cause of the issue, or whether the problem is the primary server sending a corrupted WAL segment, the secondary receiving a corrupted WAL segment, the OpenZFS filesystem on the primary letting the walsender read a WAL segment that has not been fully written yet, or something else.
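
One way to narrow this down might be to copy the same segment from both servers and compare the copies page by page. The sketch below assumes the default 8 kB WAL block size; differing_pages and compare_files are hypothetical helpers, and the comparison is only meaningful up to the position the secondary had actually received.

```python
XLOG_BLCKSZ = 8192  # assumption: default WAL block size


def differing_pages(primary_bytes, standby_bytes, page_size=XLOG_BLCKSZ):
    """Return the byte offsets of pages that differ between two copies."""
    length = max(len(primary_bytes), len(standby_bytes))
    diffs = []
    for offset in range(0, length, page_size):
        a = primary_bytes[offset:offset + page_size]
        b = standby_bytes[offset:offset + page_size]
        if a != b:
            diffs.append(offset)
    return diffs


def compare_files(primary_path, standby_path):
    """Compare two copies of one WAL segment read from local files."""
    with open(primary_path, "rb") as p, open(standby_path, "rb") as s:
        return differing_pages(p.read(), s.read())
```

If the copy taken from the primary is intact at offset 16384000 while the copy on the secondary is not, that would point at the transfer or at the receiving side. If both copies are bad at the same offset, it would point at the primary's storage.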

Is there any logging option I can enable on the two clusters to help me locate the origin of the issue?

thanks,

Nicolas.

On Tuesday, April 16th, 2024 at 09:56, Nicolas Seinlet <nicolas(at)seinlet(dot)com> wrote:

> Hello,
>
> > What exactly is "cyphered ZFS"? Can you reproduce the problem with some
> > other filesystem? If it's something very unusual, it might well be a
> > bug in the filesystem.
>
> The filesystem is openzfs with native aes-256-gcm encryption:
> https://openzfs.github.io/openzfs-docs/man/master/7/zfsprops.7.html#encryption
>
> I've not tested if we get the same issue on another filesystem.
>
> I don't face the issue on Ubuntu 20.04/openzfs 0.8/PostgreSQL 12, but I have fewer systems with this deployment.
> On Ubuntu 22.04/openzfs 2.1.5/PostgreSQL 14, I face the issue from time to time, without knowing what triggers the error.
>
> thanks for helping,
>
> Nicolas.
