Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
Date: 2024-11-11 14:17:36
Message-ID: 47b86037-0e70-42a1-8656-9c0bd70ecd8c@vondra.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/11/24 14:51, Ashutosh Bapat wrote:
> ...
>
> I think the problem is about processing older running transactions
> record and setting data.restart_lsn based on the candidates those
> records produce. But what is not clear to me is how come a newer
> candidate_restart_lsn is available immediately upon WAL sender
> restart. I.e. in the sequence of events you mentioned in your first
> email
> 1. 344.139 LOG: starting logical decoding for slot "s1"
>
> 2. 344.139 DETAIL: Streaming transactions committing after 1/E97EAC30,
> reading WAL from 1/E96FB4D0.
>
> 3. 344.140 LOG: logical decoding found consistent point at 1/E96FB4D0
>
> 4. 344.140 DETAIL: Logical decoding will begin using saved snapshot.
>
> 5. 344.140 LOG: LogicalConfirmReceivedLocation 1/E9865398
>
> 6. 344.140 LOG: LogicalConfirmReceivedLocation updating
> data.restart_lsn to 1/E979D4C8 (from 1/E96FB4D0)
> candidate_restart_valid 0/0 (from 1/E9865398)
> candidate_restart_lsn 0/0 (from 1/E979D4C8)
>
> how did candidate_restart_lsn = 1/E979D4C8 and candidate_restart_valid
> = 1/E9865398 were set in ReplicationSlot after WAL sender? It means it
> must have read and processed running transaction record at 1/E9865398.
> If that's true, how come it went back to a running transactions WAL
> record at 1/E979D4C8? It should be reading WAL records sequentially,
> hence read 1/E979D4C8 first then 1/E9865398.
>
> 344.145 LOG: LogicalIncreaseRestartDecodingForSlot s1
> candidate_restart_valid_lsn 1/E979D4C8 (0/0)
> candidate_restart_lsn 1/E96FB4D0 (0/0)
>

Those are good questions, but IIUC that's explained by this comment from
Masahiko-san's analysis [1]:

Thinking about the root cause more, it seems to me that the root cause
is not the fact that candidate_xxx values are not cleared when being
released.

In the scenario I reproduced, after restarting the logical decoding,
the walsender sets the restart_lsn to a candidate_restart_lsn left in
the slot upon receiving the ack from the subscriber. ...

If this is correct, then what happens is:

1) replication is running, at some point we set candidate LSN to B

2) something breaks, causing reconnect with restart LSN A (< B)

3) we still have the candidate LSN B in memory, and after receiving
some confirmation we set it as restart_lsn

4) we get to decode the RUNNING_XACTS, which moves restart_lsn back

If this analysis is correct, I think it's rather suspicious we don't
reset the candidate fields on restart. Can those "old" values ever be
valid? But I haven't tried resetting them.

Also, this is why I'm not entirely sure just tweaking the conditions in
LogicalIncreaseRestartDecodingForSlot is quite correct. Maybe it fixes
this particular issue, but maybe the right fix would be to reset the
candidate fields on reconnect? And this change would be just hiding the
actual problem. I haven't tried this.

[1]
https://www.postgresql.org/message-id/CAD21AoBVhYnGBuW_o%3DwEGgTp01qiHNAx1a14b1X9kFXmuBe%3Dsg%40mail.gmail.com

--
Tomas Vondra

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dave Page 2024-11-11 15:29:51 PG17 failing tests (DST related?)
Previous Message Andrei Lepikhov 2024-11-11 14:10:47 Re: Incremental Sort Cost Estimation Instability