From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com> |
Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de> |
Subject: | Re: removing global variable ThisTimeLineID |
Date: | 2021-11-23 13:30:44 |
Message-ID: | CA+TgmobkZviEWFFvMGO_n82cc7z0Rgxx=Fi3+A4Ox0qZaL2S2Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Nov 23, 2021 at 4:36 AM Drouvot, Bertrand <bdrouvot(at)amazon(dot)com> wrote:
> Indeed, I think the logical decoding on standby patch just needs to
> change the Assert in GetWALInsertionTimeLine() to check
> SharedRecoveryState is RECOVERY_STATE_DONE or RECOVERY_STATE_ARCHIVE.
>
> Would that make sense from your point of view to add the check on
> RECOVERY_STATE_ARCHIVE or do you think it would need to be more
> "sophisticated"?
Unless I'm confused, that would just be incorrect.
GetWALInsertionTimeLine() just returns InsertTimeLineID, which won't
be initialized during archive recovery. I discussed this point a bit
on some other thread with Andres. It's possible that all that you need
to do here is determine the replayTLI using e.g. GetXLogReplayRecPtr.
But I don't really see why that should be correct:
read_local_xlog_page() has some kind of wait-and-retry logic and I'm
not clear why we don't need the same thing here. After all, if we're
on the primary and we determine that sendTimeLineHistoric = false and
sendTimeLine = whatever, neither of those values can become incorrect
afterward. But on a standby that can certainly happen, if an upstream
server is promoted. Note that the comment in read_local_xlog_page()
says this:
* If that happens after our caller determined the TLI but before
* we actually read the xlog page, we might still try to read from the
* old (now renamed) segment and fail. There's not much we can do
* about this, but it can only happen when we're a leaf of a cascading
* standby whose primary gets promoted while we're decoding, so a
* one-off ERROR isn't too bad.
That seems to imply that the retry loop here is designed, at least in
part, to protect against exactly this kind of scenario.
--
Robert Haas
EDB: http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Tomas Vondra | 2021-11-23 13:36:17 | Re: logical decoding and replication of sequences |
Previous Message | Daniel Gustafsson | 2021-11-23 13:10:48 | Re: Windows build warnings |