From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
Cc: | vignesh C <vignesh21(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de> |
Subject: | Re: persist logical slots to disk during shutdown checkpoint |
Date: | 2023-08-31 13:58:32 |
Message-ID: | CAA4eK1K3-FVxdTwzgwPy+wSUSVdcH-V3tSfZd5houJCjpon_-A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Aug 31, 2023 at 6:12 PM Ashutosh Bapat
<ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>
> On Thu, Aug 31, 2023 at 2:52 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > > I
> > > > > think we should shut down subscriber, restart publisher and then make this
> > > > > check based on the contents of the replication slot instead of server log.
> > > > > Shutting down subscriber will ensure that the subscriber won't send any new
> > > > > confirmed flush location to the publisher after restart.
> > > > >
> > > >
> > > > But if we shutdown the subscriber before the publisher there is no
> > > > guarantee that the publisher has sent all outstanding logs up to the
> > > > shutdown checkpoint record (i.e., the latest record). Such a guarantee
> > > > can only be there if we do a clean shutdown of the publisher before
> > > > the subscriber.
> > >
> > > So the sequence is shutdown publisher node, shutdown subscriber node,
> > > start publisher node and carry out the checks.
> > >
> >
> > This can probably work but I still prefer the current approach as that
> > will be closer to the ideal values on the disk instead of comparison
> > with a later in-memory value of confirmed_flush LSN. Ideally, if we
> > would have a tool like pg_replslotdata which can read the on-disk
> > state of slots that would be better but missing that, the current one
> > sounds like the next best possibility. Do you see any problem with the
> > current approach of test?
>
> > + qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+),
> > reading WAL from ([A-F0-9]+\/[A-F0-9]+)./
>
> I don't think the LSN reported in this message is guaranteed to be the
> confirmed_flush LSN of the slot. It's usually confirmed_flush but not
> always. It's the LSN that snapshot builder computes based on factors
> including confirmed_flush. There's a chance that this test will fail
> sometimes because of this behaviour.
>
I think I am missing something here because as per my understanding,
the LOG referred by the test is generated in CreateDecodingContext()
before which we shouldn't be changing the slot's confirmed_flush LSN.
The LOG [1] refers to the slot's persistent value for confirmed_flush,
so how it could be different from what the test is expecting.
[1]
errdetail("Streaming transactions committing after %X/%X, reading WAL
from %X/%X.",
LSN_FORMAT_ARGS(slot->data.confirmed_flush),
--
With Regards,
Amit Kapila.
From | Date | Subject | |
---|---|---|---|
Next Message | Ranier Vilela | 2023-08-31 14:09:48 | Re: Replace some cstring_to_text to cstring_to_text_with_len |
Previous Message | Erik Rijkers | 2023-08-31 13:51:52 | Re: remaining sql/json patches |