Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
Date: 2024-11-13 10:59:19
Message-ID: CAA4eK1+69y-K57jWoEpDpgTOnDqF1M_a7XFc=RcxUWG8YmAMxg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 12, 2024 at 12:43 PM Ashutosh Bapat
<ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>
> On Tue, Nov 12, 2024 at 12:02 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Mon, Nov 11, 2024 at 2:08 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> > >
> > >
> > > But neither of those fixes prevents backwards move for confirmed_flush
> > > LSN, as enforced by asserts in the 0005 patch. I don't know if this
> > > assert is incorrect or now. It seems natural that once we get a
> > > confirmation for some LSN, we can't move before that position, but I'm
> > > not sure about that. Maybe it's too strict.
> >
> > Hmm, I'm concerned that it might be another problem. I think there are
> > some cases where a subscriber sends a flush position older than slot's
> > confirmed_flush as a feedback message.
> >

Right, it can happen for cases where subscribers doesn't have to do
anything (for example DDLs) like I have explained in one of my emails
[1]

> > But it seems to be dangerous if
> > we always accept it as a new confirmed_flush value. It could happen
> > that confirm_flush could be set to a LSN older than restart_lsn.
> >

Possible, though I haven't tried to reproduce such a case. But, will
it create any issues? I don't know if there is any benefit in allowing
to move confirmed_flush LSN backward. AFAIR, we don't allow such
backward values to persist. They will temporarily be in memory. I
think as a separate patch we should prevent it from moving backward.

>
> If confirmed_flush LSN moves backwards, it means the transactions
> which were thought to be replicated earlier are no longer considered
> to be replicated. This means that the restart_lsn of the slot needs to
> be at least far back as the oldest of starting points of those
> transactions. Thus restart_lsn of slot has to be pushed further back.
>

I don't see a reason to move restart_lsn backward. Why do you think so?

[1] - https://www.postgresql.org/message-id/CAA4eK1%2BzWQwOe5G8zCYGvErnaXh5%2BDbyg_A1Z3uywSf_4%3DT0UA%40mail.gmail.com

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2024-11-13 11:03:05 Re: Enable data checksums by default
Previous Message Peter Eisentraut 2024-11-13 10:53:06 Re: SQL:2011 application time