From: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
---|---|
To: | michael(at)paquier(dot)xyz |
Cc: | jgdr(at)dalibo(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, masao(dot)fujii(at)oss(dot)nttdata(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: [BUG] non archived WAL removed during production crash recovery |
Date: | 2020-04-27 09:21:07 |
Message-ID: | 20200427.182107.1145997462405167356.horikyota.ntt@gmail.com |
Lists: | pgsql-bugs pgsql-hackers |
At Mon, 27 Apr 2020 16:49:45 +0900, Michael Paquier <michael(at)paquier(dot)xyz> wrote in
> On Fri, Apr 24, 2020 at 03:03:00PM +0200, Jehan-Guillaume de Rorthais wrote:
> > I agree the three tests could be removed as they were not covering the bug we
> > were chasing. However, they might still be useful to detect future unexpected
> > behavior changes. If you agree with this, please, find in attachment a patch
> > proposal against HEAD that recreates these three tests **after** a waiting loop
> > on both standby1 and standby2. This waiting loop is inspired from the tests in
> > 9.5 -> 10.
>
> FWIW, I would prefer keeping all three tests as well.
>
> So.. I have spent more time on this problem and mereswine here is a
> very good sample because it failed all three tests:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mereswine&dt=2020-04-24%2006%3A03%3A53
>
> For standby2, we get this failure:
> ok 11 - .ready file for WAL segment 000000010000000000000001 existing
> in backup is kept with archive_mode=always on standby
> not ok 12 - .ready file for WAL segment 000000010000000000000002
> created with archive_mode=always on standby
>
> Then, looking at 020_archive_status_standby2.log, we have the
> following logs:
> 2020-04-24 02:08:32.032 PDT [9841:3] 020_archive_status.pl LOG:
> statement: CHECKPOINT
> [...]
> 2020-04-24 02:08:32.303 PDT [9821:7] LOG: restored log file
> "000000010000000000000002" from archive
>
> In this case, the test forced a checkpoint to test the segment
> recycling *before* the extra restored segment we'd like to work on was
> actually restored. So it looks like my initial feeling about the
> timing issue was right, and I am also able to reproduce the original
> set of failures by adding a manual sleep to delay restores of
> segments, like that for example:
> --- a/src/backend/access/transam/xlogarchive.c
> +++ b/src/backend/access/transam/xlogarchive.c
> @@ -74,6 +74,8 @@ RestoreArchivedFile(char *path, const char *xlogfname,
> if (recoveryRestoreCommand == NULL ||
> strcmp(recoveryRestoreCommand, "") == 0)
> goto not_available;
>
> + pg_usleep(10 * 1000000); /* 10s */
> +
> /*
>
> With your patch the problem does not show up anymore even with the
> delay added, so I would like to apply what you have sent and add back
> those tests. For now, I would just patch HEAD though as that's not
> worth the risk of destabilizing stable branches in the buildfarm.
I agree with the diagnosis and the fix. The fix reliably causes a
restart point, and the restart point then manipulates the status files
the right way before the CHECKPOINT command returns, in both cases.
If I were to add something to the fix, the following lines may need a
comment:
+# Wait until the checkpoint record is replayed so that the following
+# CHECKPOINT reliably causes a restart point.
|+$standby1->poll_query_until('postgres',
|+ qq{ SELECT pg_wal_lsn_diff(pg_last_wal_replay_lsn(), '$primary_lsn') >= 0 }
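To illustrate what that wait condition checks: an LSN is written as two
hexadecimal halves "hi/lo" of a 64-bit position, and the query becomes
true once the replay position is at or past the primary's LSN. A
minimal sketch in C follows. The names parse_lsn and replay_reached are
hypothetical helpers made up for this illustration, not PostgreSQL
functions, and the sketch only mirrors the comparison, not the polling.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: parse an LSN given in the textual "hi/lo"
 * hexadecimal form into a 64-bit value.  Returns 0 on success and
 * -1 if the text does not have that form.
 */
static int
parse_lsn(const char *text, uint64_t *lsn)
{
	unsigned int hi;
	unsigned int lo;
	char		extra;

	assert(text != NULL && lsn != NULL);

	/* Exactly two hex fields must match; trailing characters are rejected. */
	if (sscanf(text, "%X/%X%c", &hi, &lo, &extra) != 2)
		return -1;

	*lsn = ((uint64_t) hi << 32) | lo;
	return 0;
}

/*
 * Hypothetical helper: returns 1 once the replay position has reached
 * or passed the target position, i.e. when (replay - target) >= 0.
 * Returns 0 otherwise, including when either input cannot be parsed.
 */
int
replay_reached(const char *replay_lsn, const char *target_lsn)
{
	uint64_t	replay;
	uint64_t	target;

	if (parse_lsn(replay_lsn, &replay) != 0 ||
		parse_lsn(target_lsn, &target) != 0)
		return 0;

	return replay >= target;
}
```

The test keeps re-running the query until this condition holds, so the
CHECKPOINT that follows is issued only after the standby has replayed
the primary's checkpoint record.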
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center