From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman |
Date: | 2022-04-07 17:52:10 |
Message-ID: | 20220407175210.q44nnrvkovprxo2a@alap3.anarazel.de |
Lists: | pgsql-committers pgsql-hackers |
Hi,
On 2022-04-07 13:40:30 -0400, Tom Lane wrote:
> Michael Paquier <michael(at)paquier(dot)xyz> writes:
> > Add TAP test for archive_cleanup_command and recovery_end_command
>
> grassquit just showed a non-reproducible failure in this test [1]:

I was just staring at that as well.

> # Postmaster PID for node "standby" is 291160
> ok 1 - check content from archives
> not ok 2 - archive_cleanup_command executed on checkpoint
>
> # Failed test 'archive_cleanup_command executed on checkpoint'
> # at t/002_archiving.pl line 74.
>
> This test is sending a CHECKPOINT command to the standby and
> expecting it to run the archive_cleanup_command, but it looks
> like the standby did not actually run any checkpoint:
>
> 2022-04-07 16:11:33.060 UTC [291806][not initialized][:0] LOG: connection received: host=[local]
> 2022-04-07 16:11:33.078 UTC [291806][client backend][2/15:0] LOG: connection authorized: user=bf database=postgres application_name=002_archiving.pl
> 2022-04-07 16:11:33.084 UTC [291806][client backend][2/16:0] LOG: statement: CHECKPOINT
> 2022-04-07 16:11:33.092 UTC [291806][client backend][:0] LOG: disconnection: session time: 0:00:00.032 user=bf database=postgres host=[local]
>
> I am suspicious that the reason is that ProcessUtility does not
> ask for a forced checkpoint when in recovery:
>
> RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_WAIT |
> (RecoveryInProgress() ? 0 : CHECKPOINT_FORCE));
>
> The trouble with this theory is that this test has been there for
> nearly six months and this is the first such failure (I scraped the
> buildfarm logs to be sure). Seems like failures should be a lot
> more common than that.
> I wondered if the recent pg_stats changes could have affected this, but I
> don't really see how.

I don't really see how either. It's a bit more conceivable that the recovery
prefetching changes could affect the timing sufficiently?
It's also possible that it requires an animal of a certain speed to happen -
we didn't have an -fsanitize=address animal until recently.
I guess we'll have to wait and see what the frequency of the problem is?
Greetings,
Andres Freund
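
To make the theory quoted above concrete, here is a minimal, self-contained C sketch of the flag expression from the quoted RequestCheckpoint() call. The `SKETCH_` names, the bit values, and the skip rule in `sketch_checkpoint_would_run()` are assumptions made for illustration only, not PostgreSQL's actual definitions or logic. It only shows why a CHECKPOINT issued during recovery might end up doing nothing if an unforced request can be skipped.

```c
#include <stdbool.h>

/*
 * Illustrative flag bits. The values are assumptions for this sketch and
 * are not copied from the PostgreSQL headers.
 */
#define SKETCH_CHECKPOINT_IMMEDIATE 0x0004
#define SKETCH_CHECKPOINT_FORCE     0x0008
#define SKETCH_CHECKPOINT_WAIT      0x0020

/*
 * Mirrors the flag expression quoted in the message: FORCE is requested
 * only when the server is not in recovery.
 */
static int
sketch_checkpoint_request_flags(bool in_recovery)
{
    return SKETCH_CHECKPOINT_IMMEDIATE | SKETCH_CHECKPOINT_WAIT |
        (in_recovery ? 0 : SKETCH_CHECKPOINT_FORCE);
}

/*
 * Hypothetical skip rule (an assumption, not the real server logic):
 * a request without FORCE only does work if something new has happened
 * since the previous checkpoint or restartpoint.
 */
static bool
sketch_checkpoint_would_run(int flags, bool new_work_since_last)
{
    return (flags & SKETCH_CHECKPOINT_FORCE) != 0 || new_work_since_last;
}
```

Under these assumptions, `sketch_checkpoint_would_run(sketch_checkpoint_request_flags(true), false)` is false. That matches the suspected failure mode, where the standby accepts the CHECKPOINT statement but never reaches the point of running archive_cleanup_command.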