From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman |
Date: | 2022-04-08 21:55:51 |
Message-ID: | 4015413.1649454951@sss.pgh.pa.us |
Lists: | pgsql-committers pgsql-hackers |
Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2022-04-07 13:57:45 -0400, Tom Lane wrote:
>> Yeah, with only one instance it could just be cosmic rays or something.
>> However, assuming it is real, I guess I wonder why we don't say
>> CHECKPOINT_FORCE in standby mode too.
> I guess it might partially be that restartpoints require a checkpoint to have
> happened on the primary. If we used FORCE, we'd have to wait till the next
> checkpoint on the primary, which'd be a problem if it's e.g. a manually issued
> CHECKPOINT; before shutting the standby down.

After seeing skink's results, I tried running that test under valgrind
here, and it fails just like that every time. skink's history allows
us to bound the failure introduction between 79b716cfb7 and
d7ab2a9a3c, which I think makes it just about certain that it was
5dc0418fab (Prefetch data referenced by the WAL, take II), though I've
not bisected to be 100% sure.

Adding some debug printouts to ExecuteRecoveryCommand convinces me
that indeed the archive_cleanup_command is NOT getting called by the
problematic CHECKPOINT command. I surmise based on Andres' comment
above that the standby isn't making a restartpoint for lack of
an available primary checkpoint, which looks to me like it could be
a pre-existing bug in the test case: it's sure not doing anything to
guarantee that the primary's checkpoint record has reached the standby.

I tried adjusting the patch so it does guarantee that (as attached),
and in two out of two tries it got past the archive_cleanup_command
failure but then hung up waiting for standby2 to promote.
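
To make the suspected failure mode concrete, here is a toy model in Python. It is not PostgreSQL code, and every name in it (ToyStandby, replay_checkpoint_record, checkpoint) is invented for illustration. It assumes only what is described above: a CHECKPOINT on the standby can make a restartpoint only if a primary checkpoint record newer than the last restartpoint has been replayed, and archive_cleanup_command runs only when a restartpoint is actually made.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyStandby:
    """Toy model of a standby; all names here are hypothetical."""

    # Redo position of the newest primary checkpoint record replayed so far.
    replayed_checkpoint_redo: Optional[int] = None
    # Redo position at which the last restartpoint was made.
    restartpoint_redo: int = 0
    # How many times the modelled archive_cleanup_command has run.
    cleanup_calls: int = 0

    def replay_checkpoint_record(self, redo: int) -> None:
        """Model replaying a checkpoint record received from the primary."""
        self.replayed_checkpoint_redo = redo

    def checkpoint(self) -> bool:
        """Model CHECKPOINT on the standby; return True if a restartpoint was made."""
        candidate = self.replayed_checkpoint_redo
        if candidate is None or candidate <= self.restartpoint_redo:
            # Nothing new to base a restartpoint on: skip it, so the
            # cleanup command is never invoked.
            return False
        self.restartpoint_redo = candidate
        self.cleanup_calls += 1
        return True


standby = ToyStandby()
first = standby.checkpoint()           # no primary checkpoint replayed yet
standby.replay_checkpoint_record(100)  # primary's checkpoint record arrives
second = standby.checkpoint()          # a restartpoint is now possible
third = standby.checkpoint()           # nothing new since the last one
print(first, second, third, standby.cleanup_calls)
```

In this model the first CHECKPOINT is a silent no-op, which matches the observed symptom: the command succeeds but the cleanup command is never run. That is why the test needs to ensure the primary's checkpoint record has been replayed before issuing CHECKPOINT on the standby.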

On the whole, I'm not sure that the WAL prefetch logic is noticeably
more stable than when we booted it out last year :-(. However, I also
wonder why it is that this test case wasn't occasionally failing already.

regards, tom lane

Attachment | Content-Type | Size |
---|---|---|
002_archiving-hack.patch | text/x-diff | 890 bytes |