Re: logical decoding and replication of sequences

From: Noah Misch <noah(at)leadboat(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: logical decoding and replication of sequences
Date: 2022-08-07 19:12:05
Message-ID: 20220807191205.GA16699@rfd.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Aug 07, 2022 at 03:18:52PM +0200, Tomas Vondra wrote:
> On 8/7/22 02:36, Noah Misch wrote:
> > On Thu, Apr 07, 2022 at 08:34:50PM +0200, Tomas Vondra wrote:
> >> I've pushed a revert af all the commits related to this - decoding of
> >> sequences and test_decoding / built-in replication changes.
> >
> > Two July buildfarm runs failed with PANIC during standby promotion:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern&dt=2022-07-19%2004%3A13%3A18
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern&dt=2022-07-31%2011%3A33%3A13
> >
> > The attached patch hacks things so an ordinary x86_64 GNU/Linux machine
> > reproduces this consistently. "git bisect" then traced the regression to the
> > above revert commit (2c7ea57e56ca5f668c32d4266e0a3e45b455bef5). The pg_ctl
> > test suite passes under this hack in all supported branches, and it passed on
> > v15 until that revert. Would you investigate?
> >
> > The buildfarm animal uses keep_error_builds. From kept data directories, I
> > deduced these events:
> >
> > - After the base backup, auto-analyze ran on the primary and wrote WAL.
> > - Standby streamed and wrote up to 0/301FFF.
> > - Standby received the promote signal. Terminated streaming. WAL page at 0/302000 remained all-zeros.
> > - Somehow, end-of-recovery became a PANIC.
>
> I think it'd be really bizarre if this was due to the revert, as that
> simply undoes minor WAL changes (and none of this should affect what
> happens at WAL page boundary etc.). It just restores WAL as it was
> before 0da92dc, nothing particularly complicated. I did go through all
> of the changes again and I haven't spotted anything particularly
> suspicious, but I'll give it another try tomorrow.
>
> However, I did try bisecting this using the attached patch, and that
> does not suggest the issue is in the revert commit. It actually fails
> all the way back to 5dc0418fab2, and it starts working on 9553b4115f1.
>
> ...
> 6392f2a0968 Try to silence "-Wmissing-braces" complaints in ...
> => 5dc0418fab2 Prefetch data referenced by the WAL, take II.
> 9553b4115f1 Fix warning introduced in 5c279a6d350.
> ...
>
> This is merely 10 commits before the revert, and it seems way more
> related to WAL. Also, adding this to the two nodes in 003_standby.pl
> makes the issue go away, it seems:
>
> $node_standby->append_conf('postgresql.conf',
> qq(recovery_prefetch = off));
>
> I'd bet it's about WAL prefetching, not the revert, and the bisect was a
> bit incorrect, because the commits are close and the failures happen to
> be rare. (Presumably you first did the bisect and then wrote the patch
> that reproduces this, right?)

No. I wrote the patch, then used the patch to drive the bisect. With ten
iterations, commit 2c7ea57 passes 0/10, while 2c7ea57^ passes 10/10. I've now
tried recovery_prefetch=off. With that, the test passes 10/10 at 2c7ea57.
Given your observation of a failure at 5dc0418fab2, I agree with your
conclusion. Whatever the role of 2c7ea57 in exposing the failure on my
machine, a root cause in WAL prefetching looks more likely.

> Adding Thomas Munro to the thread, he's the WAL prefetching expert ;-)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michail Nikolaev 2022-08-07 19:28:36 Re: Slow standby snapshot
Previous Message Andres Freund 2022-08-07 18:27:07 Re: Cleaning up historical portability baggage