From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | amul sul <sulamul(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: ThisTimeLineID can be used uninitialized |
Date: | 2021-10-19 20:43:59 |
Message-ID: | 20211019204359.aakuvtk7tjari6to@alap3.anarazel.de |
Lists: | pgsql-hackers |
Hi,
On 2021-10-19 15:13:04 -0400, Robert Haas wrote:
> This is a followup to
> http://postgr.es/m/CA+TgmoZ5A26C6OxKApafyuy_sx0VG6VXdD_Q6aSEzsvrPHDwzw@mail.gmail.com.
> I'm suspicious of the following code in CreateReplicationSlot:
>
> /* setup state for WalSndSegmentOpen */
> sendTimeLineIsHistoric = false;
> sendTimeLine = ThisTimeLineID;
>
> The first thing that's odd about this is that if this is physical
> replication, it's apparently dead code, because AFAICT sendTimeLine
> will not be used for anything in that case.
It's quite confusing. It's *really* not helped by physical replication using an
xlogreader to keep state while not really using it. That xlogreader presumably
isn't actually used during a physical CreateReplicationSlot(), but it is
referenced by a comment :/
> But I don't know if it matters. We call CreateInitDecodingContext()
> with sendTimeLine and ThisTimeLineID still zero; it doesn't call any
> callbacks. Then we call DecodingContextFindStartpoint() with
> sendTimeLine still 0 and the first callback that gets invoked is
> logical_read_xlog_page. At this point sendTimeLine = 0 and
> ThisTimeLineID = 0. That calls XLogReadDetermineTimeline() which
> resets ThisTimeLineID to the correct value of 2, but when we get back
> to logical_read_xlog_page, we still manage to call WALRead with a
> timeline of 0 because state->seg.ws_tli is still 0. WALRead then
> eventually calls WalSndSegmentOpen, which unconditionally propagates
> sendTimeLine into the TLI pointer that is passed to it. So now
> state->seg.ws_tli also ends up being 2. So I guess maybe nothing bad
> happens? But it sure seems strange that the code would apparently work
> just as well as it does today with the following patch:
>
> diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
> index b811a5c0ef..44fd598519 100644
> --- a/src/backend/replication/walsender.c
> +++ b/src/backend/replication/walsender.c
> @@ -945,7 +945,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
>
> /* setup state for WalSndSegmentOpen */
> sendTimeLineIsHistoric = false;
> - sendTimeLine = ThisTimeLineID;
> + sendTimeLine = rand() % 10;
>
> if (cmd->kind == REPLICATION_KIND_PHYSICAL)
> {
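To make the trace above concrete, here is a hypothetical, heavily simplified C model of the order in which the timeline variables are written and read. The model_* functions and ModelSegment are invented for illustration and are not PostgreSQL code. The model also assumes, which the message does not state, that the page-read callback copies the timeline it has just determined into sendTimeLine before calling WALRead; under that assumption the value assigned in CreateReplicationSlot() is overwritten before anything reads it, which would explain why the rand() % 10 patch changes nothing.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TimeLineID;

/*
 * Hypothetical, heavily simplified model of the sequence traced above.
 * None of these model_* functions are PostgreSQL code; they only mimic the
 * order in which the timeline variables are written and read.
 */
static TimeLineID ThisTimeLineID = 0;   /* still 0, as in the trace */
static TimeLineID sendTimeLine = 0;

typedef struct
{
    TimeLineID ws_tli;                  /* timeline of the open segment */
} ModelSegment;

/* Stands in for XLogReadDetermineTimeline(): fixes up ThisTimeLineID and
 * returns the timeline that contains the requested page (2 here). */
static TimeLineID
model_determine_timeline(void)
{
    ThisTimeLineID = 2;
    return ThisTimeLineID;
}

/* Stands in for WalSndSegmentOpen(): overwrites the caller's TLI with
 * sendTimeLine, whatever the caller asked for. */
static void
model_segment_open(TimeLineID *tli_p)
{
    *tli_p = sendTimeLine;
}

/* Stands in for WALRead(): the requested TLI is only the starting point;
 * the segment-open callback decides which timeline gets recorded. */
static void
model_wal_read(ModelSegment *seg, TimeLineID tli)
{
    model_segment_open(&tli);
    seg->ws_tli = tli;
    assert(seg->ws_tli == sendTimeLine);
}

/* Stands in for logical_read_xlog_page(). ASSUMPTION (not stated in the
 * message): sendTimeLine is refreshed from the determined timeline here. */
static void
model_logical_read_page(ModelSegment *seg)
{
    sendTimeLine = model_determine_timeline();
    model_wal_read(seg, seg->ws_tli);   /* passes 0 on the first read */
}

/* CreateReplicationSlot()-style setup followed by one page read; returns
 * the timeline the segment ended up on. */
static TimeLineID
model_create_slot_and_read(TimeLineID initial_send_tli)
{
    ModelSegment seg = {0};

    sendTimeLine = initial_send_tli;    /* the assignment under suspicion */
    model_logical_read_page(&seg);
    return seg.ws_tli;
}
```

In this model, model_create_slot_and_read(0) and model_create_slot_and_read(7) both return 2. If the stated assumption does not hold for the real code, the model would instead carry the stale initial value into ws_tli, so treat it as a way to check the reasoning rather than as a description of walsender.c.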
ISTM we should introduce an InvalidTimeLineID, explicitly initialize
sendTimeLine to that, and assert that it's valid / invalid in a bunch of
places?
Greetings,
Andres Freund
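The InvalidTimeLineID suggestion above could look roughly like the following minimal sketch. InvalidTimeLineID is only proposed in the message; its value of 0, the TimeLineIDIsValid() helper, and the set/reset/consumer functions are hypothetical names chosen for illustration. The sketch relies on real timeline IDs starting at 1, so 0 can serve as the "not set" marker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PostgreSQL declares TimeLineID as a 32-bit unsigned integer. */
typedef uint32_t TimeLineID;

/*
 * Hypothetical: the message only proposes introducing InvalidTimeLineID.
 * Real timelines start at 1, so 0 is used as the "not set" marker.
 */
#define InvalidTimeLineID ((TimeLineID) 0)

/* Hypothetical validity check. */
static inline bool
TimeLineIDIsValid(TimeLineID tli)
{
    return tli != InvalidTimeLineID;
}

/* Explicitly initialized to "not set yet" instead of being 0 by accident. */
static TimeLineID sendTimeLine = InvalidTimeLineID;

/* Hypothetical setter: callers must hand in a real timeline. */
static void
set_send_timeline(TimeLineID tli)
{
    assert(TimeLineIDIsValid(tli));
    sendTimeLine = tli;
}

/* Hypothetical reset, e.g. once a walsender command is finished. */
static void
reset_send_timeline(void)
{
    sendTimeLine = InvalidTimeLineID;
}

/* Hypothetical consumer standing in for the segment-open path: reading
 * sendTimeLine before anyone set it trips an assertion instead of
 * silently propagating a bogus value. */
static TimeLineID
timeline_for_segment_open(void)
{
    assert(TimeLineIDIsValid(sendTimeLine));
    return sendTimeLine;
}
```

With checks like these, a caller that reaches the segment-open path before sendTimeLine has been set fails an assertion in an assert-enabled build, rather than having the problem masked the way the rand() % 10 experiment shows.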