From: | Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: confirmed flush lsn seems to be move backward in certain error cases |
Date: | 2024-06-10 12:29:12 |
Message-ID: | CANhcyEWkdMjuDymaX2qR=b2eyNhO3ZMQr5weq6xKXmnDyyEq5Q@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, 10 Jun 2024 at 16:39, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Feb 20, 2024 at 12:35 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Sat, 17 Feb 2024 at 12:03, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > >
> > > @@ -1839,7 +1839,8 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
> > >
> > >      SpinLockAcquire(&MyReplicationSlot->mutex);
> > >
> > > -    MyReplicationSlot->data.confirmed_flush = lsn;
> > > +    if (lsn > MyReplicationSlot->data.confirmed_flush)
> > > +        MyReplicationSlot->data.confirmed_flush = lsn;
> > >
> > >      /* if we're past the location required for bumping xmin, do so */
> > >      if (MyReplicationSlot->candidate_xmin_lsn != InvalidXLogRecPtr &&
> > > @@ -1904,7 +1905,8 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
> > >  else
> > >  {
> > >      SpinLockAcquire(&MyReplicationSlot->mutex);
> > > -    MyReplicationSlot->data.confirmed_flush = lsn;
> > > +    if (lsn > MyReplicationSlot->data.confirmed_flush)
> > > +        MyReplicationSlot->data.confirmed_flush = lsn;
> > >
> > > BTW, from which code path does it update the prior value of
> > > confirmed_flush?
> >
> > The confirmed_flush is getting set in the else condition for this scenario.
> >
> > > If it is through the else check, then can we see if
> > > it may change the confirmed_flush to the prior position via the first
> > > code path? I am asking because in the first code path, we can even
> > > flush the retreated value of confirmed_flush LSN.
> >
> > I was not able to find any scenario that sets a prior position via the
> > first code path. I tried various scenarios, such as adding a delay in
> > the walsender, adding a delay in the apply worker, restarting the
> > instances, and running various DML operations. It was always set to
> > either the same value as before or a greater value.
> >
>
> Fair enough. This means that in the prior versions, it was never
> possible to move confirmed_flush LSN in the slot to a backward
> position on the disk. So, moving it backward temporarily (in the
> memory) shouldn't create any problem. I would prefer your
> Assert_confirmed_flush_will_always_not_be_less_than_last_saved_confirmed_flush.patch
> to fix this issue.
>
> Thoughts?
I was able to reproduce the issue with the test script provided in
[1]. I ran the script 10 times and reproduced the issue 4 times.
I also tested the patch
Assert_confirmed_flush_will_always_not_be_less_than_last_saved_confirmed_flush.patch,
and it resolves the issue: I ran the test script 20 times with the
patch applied and was not able to reproduce the issue.
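
For readers following the thread, the guard shown in the quoted diff
can be sketched in isolation as below. This is only an illustration
and not PostgreSQL source: DemoSlot and demo_confirm_received are
hypothetical names, XLogRecPtr is reduced to a plain 64-bit integer,
and the spinlock around the update is left out. It does not reproduce
the contents of the Assert patch mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for PostgreSQL's XLogRecPtr (a 64-bit WAL position). */
typedef uint64_t XLogRecPtr;

/* Hypothetical, stripped-down slot holding only the field under discussion. */
typedef struct DemoSlot
{
    XLogRecPtr  confirmed_flush;
} DemoSlot;

/*
 * Record a confirmed position, but only ever move it forward.
 * A stale (smaller) lsn leaves the stored value unchanged.
 */
void
demo_confirm_received(DemoSlot *slot, XLogRecPtr lsn)
{
    if (lsn > slot->confirmed_flush)
        slot->confirmed_flush = lsn;

    /* After the guarded update, the stored value is never behind lsn. */
    assert(slot->confirmed_flush >= lsn);
}
```

With a slot starting at position 100, confirming 150 advances it to
150, and a later stale confirmation of 120 leaves it at 150.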
Thanks and Regards,
Shlok Kyal