Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
Date: 2024-11-12 12:19:47
Message-ID: CAA4eK1+GYx_VnWZ5sOCqVN1Jk+edQcY9auf=BhixCYGLJvNPRw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 12, 2024 at 4:55 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>
> There's also the question of backpatching - the simpler the better, and
> this I think just resetting the fields wins in this regard. The main
> question is whether it's correct - I think it is. I'm not too worried
> about efficiency very much, on the grounds that this should not matter
> very often (only after unexpected restart). Correctness > efficiency.
>

Sure, but what is wrong with changing
LogicalIncreaseRestartDecodingForSlot's "if
(slot->candidate_restart_valid == InvalidXLogRecPtr)" to "else if
(slot->candidate_restart_valid == InvalidXLogRecPtr)"? My previous
analysis [1][2] on similar issue also leads to that conclusion. Then
later Sawada-San's email [3] also leads to the same solution. I know
that the same has been discussed in this thread and we are primarily
worried about whether we are missing some case that needs the current
code in LogicalIncreaseRestartDecodingForSlot(). It is always possible
that all who have analyzed are missing some point but I feel the
chances are less. I vote to fix LogicalIncreaseRestartDecodingForSlot.

Now, we have at least some theory to not clear candidate_restart_*
values which is why to prevent advancing restart_lsn earlier if we get
confirmation from the subscriber. Now, your theory that walsender
exits should be rare so this doesn't impact much is also true but OTOH
why change something that can work more efficiently provided we fix
LogicalIncreaseRestartDecodingForSlot as per our analysis?

[1] - https://www.postgresql.org/message-id/CAA4eK1JvyWHzMwhO9jzPquctE_ha6bz3EkB3KE6qQJx63StErQ%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CAA4eK1KcnTvwrVqmpRTEMpyarBeTxwr8KA%2BkaveQOiqJ0zYsXA%40mail.gmail.com
[3] - https://www.postgresql.org/message-id/CAD21AoBVhYnGBuW_o%3DwEGgTp01qiHNAx1a14b1X9kFXmuBe%3Dsg%40mail.gmail.com

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2024-11-12 12:24:40 Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY
Previous Message Ashutosh Bapat 2024-11-12 12:08:42 Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4