From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
Cc: | vignesh C <vignesh21(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de> |
Subject: | Re: persist logical slots to disk during shutdown checkpoint |
Date: | 2023-08-31 09:22:47 |
Message-ID: | CAA4eK1Lsw1mLCNDcYtJxo0+X4QfrEexGgpAQWWtOqPxaXtRsZA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Aug 31, 2023 at 12:25 PM Ashutosh Bapat
<ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>
> On Thu, Aug 31, 2023 at 12:10 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > > +
> > > + /*
> > > + * We won't ensure that the slot is persisted after the confirmed_flush
> > > + * LSN is updated as that could lead to frequent writes. However, we need
> > > + * to ensure that we do persist the slots at the time of shutdown whose
> > > + * confirmed_flush LSN is changed since we last saved the slot to disk.
> > > + * This will help in avoiding retreat of the confirmed_flush LSN after
> > > + * restart. This variable is used to track the last saved confirmed_flush
> > > + * LSN value.
> > > + */
> > >
> > > This comment makes more sense in SaveSlotToPath() than here. We may decide to
> > > use last_saved_confirmed_flush for something else in future.
> > >
> >
> > I have kept it here because it contains some information that is not
> > specific to SaveSlotToPath. So, it seems easier to follow the whole
> > theory if we keep it at the central place in the structure and then
> > add the reference wherever required but I am fine if you and others
> > feel strongly about moving this to SaveSlotToPath().
>
> Saving slot to disk happens only in SaveSlotToPath, so except the last
> sentence rest of the comment makes sense in SaveSlotToPath().
>
> > >
> > > This function assumes that the subscriber will receive and confirm WAL upto
> > > checkpoint location and publisher's WAL sender will update it in the slot.
> > > Where is the code to ensure that? Does the WAL sender process wait for
> > > checkpoint
> > > LSN to be confirmed when shutting down?
> > >
> >
> > Note, that we need to compare if all the WAL before the
> > shutdown_checkpoint WAL record is sent. Before the clean shutdown, we
> > do ensure that all the pending WAL is confirmed back. See the use of
> > WalSndDone() in WalSndLoop().
>
> Ok. Thanks for pointing that out to me.
>
> >
> > > I
> > > think we should shut down subscriber, restart publisher and then make this
> > > check based on the contents of the replication slot instead of server log.
> > > Shutting down subscriber will ensure that the subscriber won't send any new
> > > confirmed flush location to the publisher after restart.
> > >
> >
> > But if we shutdown the subscriber before the publisher there is no
> > guarantee that the publisher has sent all outstanding logs up to the
> > shutdown checkpoint record (i.e., the latest record). Such a guarantee
> > can only be there if we do a clean shutdown of the publisher before
> > the subscriber.
>
> So the sequence is shutdown publisher node, shutdown subscriber node,
> start publisher node and carry out the checks.
>
This can probably work but I still prefer the current approach as that
will be closer to the ideal values on the disk instead of comparison
with a later in-memory value of confirmed_flush LSN. Ideally, if we
would have a tool like pg_replslotdata which can read the on-disk
state of slots that would be better but missing that, the current one
sounds like the next best possibility. Do you see any problem with the
current approach of test?
BTW, I think we can keep autovacuum = off for this test just to avoid
any extra record generation even though that doesn't matter for the
purpose of test.
> >
> > > All the places which call ReplicationSlotSave() mark the slot as dirty. All
> > > the places where SaveSlotToPath() is called, the slot is marked dirty except
> > > when calling from CheckPointReplicationSlots(). So I am wondering whether we
> > > should be marking the slot dirty in CheckPointReplicationSlots() and avoid
> > > passing down is_shutdown flag to SaveSlotToPath().
> > >
> >
> > I feel that will add another spinlock acquire/release pair without
> > much benefit. Sure, it may not be performance-sensitive but still
> > adding another pair of lock/release doesn't seem like a better idea.
>
> We call ReplicatioinSlotMarkDirty() followed by ReplicationSlotSave()
> at all the places, even those which are more frequent than this.
>
All but one. Normally, the idea of marking dirty is to indicate that
we will actually write/flush the contents at a later point (except
when required for correctness) as even indicated in the comments atop
ReplicatioinSlotMarkDirty(). However, I see your point that we use
that protocol at all the current places including CreateSlotOnDisk().
So, we can probably do it here as well.
--
With Regards,
Amit Kapila.
From | Date | Subject | |
---|---|---|---|
Next Message | David Geier | 2023-08-31 09:39:49 | Re: Eliminate redundant tuple visibility check in vacuum |
Previous Message | Jelte Fennema | 2023-08-31 09:13:00 | pg_basebackup: Always return valid temporary slot names |