Quick Links

Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc:	Tomas Vondra <tomas(at)vondra(dot)me>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
Date:	2024-11-12 19:50:11
Message-ID:	CAD21AoDycfi6TXF9jHZPYFmjgw_3MgTYp641uWFF+TgEog_mQw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Tue, Nov 12, 2024 at 4:08 AM Ashutosh Bapat
<ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>
> On Tue, Nov 12, 2024 at 4:55 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>
>
> > but I'm still wondering if the current coding was intentional and we're
> > just missing why it was written like this.
>
> Interestingly, the asymmetry between the functions is added in the
> same commit b89e151054a05f0f6d356ca52e3b725dd0505e53. I doubt it was
> intentional; the comments at both places say the same thing. This
> problem is probably as old as that commit.

FYI the asymmetry appears to have been present since the first version
of the patch that gave the LogicalIncreaseRestartDecodingForSlot()
this form[1].

>
> >
> > For the reconnect, I think it's a bit as if the primary restarted right
> > before the reconnect. That could happen anyway, and we need to handle
> > that correctly - if not, we have yet another issue, IMHO. And with the
> > restart it's the same as writing the slot to disk and reading it back,
> > which also doesn't retain most of the fields. So it seems cleaner to do
> > the same thing and just reset the various fields.
> >
>
> >
> > There's also the question of backpatching - the simpler the better, and
> > this I think just resetting the fields wins in this regard. The main
> > question is whether it's correct - I think it is. I'm not too worried
> > about efficiency very much, on the grounds that this should not matter
> > very often (only after unexpected restart). Correctness > efficiency.
>
> If the slot's restart_lsn is very old before disconnect we will lose
> an opportunity to update the restart_lsn and thus release some
> resources earlier. However, that opportunity is only for a short
> duration. On a fast enough machine the data.restart_lsn will be
> updated anyway after processing all running transactions wal records
> anyway. So I am also of the opinion that the argument of efficiency
> won't stand here. But I doubt if "not resetting" is wrong. It looks
> more intentional to me than the asymmetry between
> LogicalIncreaseRestartDecodingForSlot and LogicalIncreaseXminForSlot.

In addition to the asymmetry between two functions, the reason why I
think we should fix LogicalIncreaseRestartDecodingForSlot() is the
fact that it accepts any LSN as a candidate restart_lsn. Which is
dangerous and incorrect to me even if there wasn't the asymmetry
stuff.

Regards,

[1] https://www.postgresql.org/message-id/flat/20131212000839.GA22776%40awork2.anarazel.de#e21aed91a35b31c86d34026369501816

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4 at 2024-11-12 12:08:42 from Ashutosh Bapat

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Peter Geoghegan	2024-11-12 20:00:01	Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
Previous Message	Alvaro Herrera	2024-11-12 19:18:30	Re: pg_rewind WAL segments deletion pitfall