Quick Links

Re: Race condition in recovery?

From:	Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To:	dilipbalaut(at)gmail(dot)com
Cc:	robertmhaas(at)gmail(dot)com, hlinnaka(at)iki(dot)fi, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: Race condition in recovery?
Date:	2021-05-18 06:52:07
Message-ID:	20210518.155207.2282568443886608104.horikyota.ntt@gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

At Mon, 17 May 2021 10:46:24 +0530, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote in
> On Mon, May 17, 2021 at 10:09 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Mon, May 17, 2021 at 8:50 AM Kyotaro Horiguchi
> > <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > >
> > > Before the commit expectedTLEs is always initialized with just one
> > > entry for the TLI of the last checkpoint record.
> >
> > Right
> >
> > > (1) If XLogFileReadAnyTLI() found the segment but no history file
> > > found, that is, using the dummy TLE-list, expectedTLEs is initialized
> > > with the dummy one-entry list. So there's no behavioral change in this
> > > aspect.
> >
> > Yeah, you are right.
>
> But do you agree that one line entry will always be a checkpoint
> timeline entry? Because if you notice below code[1] in function
> "readRecoveryCommandFile();", then you will realize that once we come
> out of this function either the "recoveryTargetTLI" is checkpoint TL
> wherever it was before calling this function or we must have the
> history file. That means after exiting this function if we execute
> this line (expectedTLEs = readTimeLineHistory(recoveryTargetTLI);)
> that means either "expectedTLEs" could point to one dummy entry which
> will be nothing but the checkpoint TL entry or it will be holding
> complete history.

Right.

> The patch is trying to say that without the history file the
> checkpoint TL will not be found in "expectedTLEs" because the older TL
> (checkpoint TL) is not the ancestor of the target
> timeline(recoveryTargetTLI). But ideally, either the target timeline
> should be the same as the checkpoint timeline or we must have the
> history file as I stated in the above paragraph. Am I missing
> something?

Yeah, that has been the most mysterious point here. So I searched for
a situation the one-entry expectedTLEs does not work.

I vaguely believed that there's a case where REDO point of a
checkpoint is in the timeline previous to the record of the
checkpoint. The previous discussion is based on this case, but that
doesn't seem to happen. The last replayed checkpoint (that causes a
restartpoint) record is found before protmotion and the first
checkpoint starts after promotion.

A little while ago I tried to make a situation where a checkpoint
record is placed in the previous timeline of the tli written in the
control file. But control file is always written after checkpoint
record is flushed.

I rebooted myself from this:

ee994272ca:
> There is at least one scenario where this makes a difference: if you take
> a base backup from a standby server right after a timeline switch, the
> WAL segment containing the initial checkpoint record will begin with an
> older timeline ID. Without the timeline history file, recovering that file

And finally I think I could reach the situation the commit wanted to fix.

I took a basebackup from a standby just before replaying the first
checkpoint of the new timeline (by using debugger), without copying
pg_wal. In this backup, the control file contains checkPointCopy of
the previous timeline.

I modified StartXLOG so that expectedTLEs is set just after first
determining recoveryTargetTLI, then started the grandchild node. I
have the following error and the server fails to continue replication.

[postmaster] LOG: starting PostgreSQL 14beta1 on x86_64-pc-linux-gnu...
[startup] LOG: database system was interrupted while in recovery at log...
[startup] LOG: set expectedtles tli=6, length=1
[startup] LOG: Probing history file for TLI=7
[startup] LOG: entering standby mode
[startup] LOG: scanning segment 3 TLI 6, source 0
[startup] LOG: Trying fetching history file for TLI=6
[walreceiver] LOG: fetching timeline history file for timeline 5 from pri...
[walreceiver] LOG: fetching timeline history file for timeline 6 from pri...
[walreceiver] LOG: started streaming ... primary at 0/3000000 on timeline 5
[walreceiver] DETAIL: End of WAL reached on timeline 5 at 0/30006E0.
[startup] LOG: unexpected timeline ID 1 in log segment 000000050000000000000003, offset 0
[startup] LOG: Probing history file for TLI=7
[startup] LOG: scanning segment 3 TLI 6, source 0
(repeats forever)

This seems like the behavior the patch wanted to fix. (I'm not sure
precisely what happened at the time of the "unexpected timeline ID
1..", though. The line is seen only just after the first conection.)

> will fail as the older timeline ID is not recognized to be an ancestor of
> the target timeline. If you try to recover from such a backup, using only
> streaming replication to fetch the WAL, this patch is required for that to
> work.

After I reverted the modification, I got the following behavior
instead from the same backup.

[postmaster] LOG: starting PostgreSQL 14beta1 on x86_64-...
[startup] JST LOG: database system was interrupted while in recovery at log time 2021-05-18 13:45:59 JST
[startup] JST LOG: Probing history file for TLI=7
[startup] JST LOG: entering standby mode
[startup] JST LOG: Loading history file for TLI=6 (2)
[startup] JST LOG: Trying reading history file for TLI=6
[startup] JST LOG: scanning segment 3 TLI 6, source 0
[startup] JST LOG: Trying fetching history file for TLI=6
[walreceiver] JST LOG: fetching timeline history file for timeline 5 fro...
[walreceiver] JST LOG: fetching timeline history file for timeline 6 fro...
[walreceiver] JST LOG: started streaming ... primary at 0/3000000 on timeline 5
[walreceiver] JST LOG: replication terminated by primary server
[walreceiver] JST DETAIL: End of WAL reached on timeline 5 at 0/30006E0.
[startup] LOG: Loading expectedTLEs for 5
[startup] LOG: Trying reading history file for TLI=5
[startup] LOG: Checkpoint record: TLI=5, 0/3000668, rectargetTLI=6, exptles=0x3322a60
[startup] FATAL: requested timeline 6 does not contain minimum recovery point 0/30007C0 on timeline 6
[postmaster] LOG: startup process (PID 76526) exited with exit code 1
[postmaster] LOG: aborting startup due to startup process failure
[postmaster] LOG: database system is shut down

Aborts.. Yeah, this is the same issue with what is railed here. So,
still I'm not sure I confirmed the case exactly (since the problem is
still seen.. but I don't want to bother building the version.)...
Anyway reading history file for recoveryTargetTLI instead of
receiveTLI fixes that.

FWIW, you could be get a problematic base backup by the following steps.

0. (make sure /tmp/hoge is removed)
1. apply the attached patch
2. create a primary then start
3. create a standby then start
4. place standby.signal to the primary, then restart it.
5. place the file /tmp/hoge.
6. promote the "primary".
7. You will see a log line like this
LOG: WAIT START: CHECKPOINT_ONLINE: TLI=2
8. Take a base backup (without copying WAL files)

> [1]
> if (rtli)
> {
> /* Timeline 1 does not have a history file, all else should */
> if (rtli != 1 && !existsTimeLineHistory(rtli))
> ereport(FATAL,
> (errmsg("recovery target timeline %u does not exist",
> rtli)));
> recoveryTargetTLI = rtli;
> recoveryTargetIsLatest = false;
> }
> else
> {
> /* We start the "latest" search from pg_control's timeline */
> recoveryTargetTLI = findNewestTimeLine(recoveryTargetTLI);
> recoveryTargetIsLatest = true;
> }

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Re: Race condition in recovery? at 2021-05-17 05:16:24 from Dilip Kumar

Responses

Re: Race condition in recovery? at 2021-05-18 06:58:08 from Kyotaro Horiguchi
Re: Race condition in recovery? at 2021-05-19 12:16:05 from Dilip Kumar

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Kyotaro Horiguchi	2021-05-18 06:58:08	Re: Race condition in recovery?
Previous Message	Michael Paquier	2021-05-18 06:46:07	Re: pg_get_wal_replay_pause_state() should not return 'paused' while a promotion is ongoing.