Re: Synchronizing slots from primary to standby

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-11-21 05:16:06
Message-ID: CAA4eK1L8s0fOhjV42UZkraoz4p4WL5=t8o+A0AZvTjdKomiG3A@mail.gmail.com
Lists: pgsql-hackers

On Mon, Nov 20, 2023 at 6:51 PM Drouvot, Bertrand
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> On 11/20/23 11:59 AM, Amit Kapila wrote:
> > On Mon, Nov 20, 2023 at 3:17 PM Drouvot, Bertrand
> > <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> >>
> >> On 11/18/23 11:45 AM, Amit Kapila wrote:
> >>> On Fri, Nov 17, 2023 at 5:18 PM Drouvot, Bertrand
> >>> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> >>>>
> >>>> On 11/17/23 2:46 AM, Zhijie Hou (Fujitsu) wrote:
> >>>>> On Tuesday, November 14, 2023 10:27 PM Drouvot, Bertrand <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> >>>>>
> >>>>> I feel WaitForWALToBecomeAvailable() may not be the best place to shut down
> >>>>> the slotsync worker and drop slots. As the comments in the XLOG_FROM_STREAM
> >>>>> case mention, that code can be reached for reasons other than promotion. If
> >>>>> the intention is to stop slotsync workers on promotion, maybe
> >>>>> FinishWalRecovery() is a better place to do it, as it indicates the end of
> >>>>> recovery and XLogShutdownWalRcv() is also called there.
> >>>>
> >>>> I can see that slotsync_drop_initiated_slots() has been moved into
> >>>> FinishWalRecovery() in v35. That looks OK.
> >>>>>
> >>>
> >>> I was wondering: what if we simply skip creating such slots (the
> >>> ones that would require the init state) in the first place? That can
> >>> be time-consuming in some cases, but it will reduce the complexity, and
> >>> we can always improve such cases later if we actually encounter them
> >>> in the real world. I am not sure the added complexity is worth it for
> >>> this particular case, so I would like to know your and others'
> >>> opinions.
> >>>
> >>
> >> I'm not sure I understand your point. Are you saying that we should not create
> >> slots on the standby that are "currently" reported in an 'i' state? (So just keep
> >> the 'r' and 'n' states?)
> >>
> >
> > Yes.
> >
>
> As far as the 'i' state is concerned, from what I see, it is currently
> useful for:
>
> 1. Preventing a cascading standby from syncing slots that are in state
> 'i' on the first standby.
> 2. Easily reporting slots that have not caught up on the primary yet.
> 3. Preventing inactive slots from blocking the creation of "active" ones.
>
> So not creating those slots should not be an issue for 1 (no sync is
> needed on the cascading standby, as those slots are not created on the
> first standby yet), but it is an issue for 2 (unless we provide another
> way to track and report such slots) and for 3 (as I think we would still
> need to reserve WAL).
>
> I have a question: we would still need to reserve WAL for those slots, no?
>
> If that's the case and we don't call ReplicationSlotCreate(), then
> ReplicationSlotReserveWal() would not work, as MyReplicationSlot would be NULL.
>

Yes, we need to reserve WAL to see if we can sync the slot. We are
currently creating an RS_EPHEMERAL slot, and if we don't explicitly
persist it when we can't sync, it will be dropped when we call
ReplicationSlotRelease() at the end of synchronize_one_slot(). So the
cost is that the next time we try to sync the slot, we need to create
it again and may need to wait for a newer restart_lsn on the standby,
which could be avoided if we had kept the slot in the 'i' state from
the previous run. I don't deny the importance of having the 'i'
(initialized) state; I was just trying to say that it adds code
complexity. OTOH, having it may also give users better visibility into
slots that are not active (say, manually created slots on the primary).
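The slot lifecycle described above can be modeled in a few lines of C. This is a simplified, hypothetical sketch, not the patch's code: `Slot`, `my_slot` (standing in for MyReplicationSlot) and all helper functions are invented for illustration, and it assumes a slot can be marked ready only once the remote restart_lsn has reached the WAL position reserved locally. It shows the two points from this exchange: WAL can only be reserved once a slot is held, and an ephemeral slot that is not persisted is dropped on release.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t Lsn;

/* Hypothetical persistence levels, loosely modeled on RS_EPHEMERAL/RS_PERSISTENT. */
typedef enum { PERSIST_EPHEMERAL, PERSIST_PERSISTENT } Persistence;

typedef struct
{
    Persistence persistence;
    char        sync_state;   /* 'i' (initiated) or 'r' (ready) in this model */
    Lsn         restart_lsn;  /* oldest WAL position kept for this slot; 0 = none */
} Slot;

/* Hypothetical stand-in for MyReplicationSlot: the slot this process holds. */
static Slot *my_slot = NULL;

static void
create_ephemeral_slot(void)
{
    my_slot = calloc(1, sizeof(Slot));
    if (my_slot == NULL)
        abort();
    my_slot->persistence = PERSIST_EPHEMERAL;
    my_slot->sync_state = 'i';
}

/* WAL can only be reserved on behalf of a slot that is currently held. */
static bool
reserve_wal(Lsn current_lsn)
{
    if (my_slot == NULL)
        return false;
    my_slot->restart_lsn = current_lsn;
    return true;
}

/* Release the held slot; an ephemeral one is dropped. Returns the survivor, if any. */
static Slot *
release_slot(void)
{
    Slot *slot = my_slot;

    my_slot = NULL;
    if (slot != NULL && slot->persistence == PERSIST_EPHEMERAL)
    {
        free(slot);
        return NULL;
    }
    return slot;
}

/*
 * One sync attempt. Assumption: the slot becomes ready only if the remote
 * restart_lsn has reached the locally reserved position. Otherwise
 * keep_initiated chooses between persisting the slot in 'i' state (keeping
 * its WAL reservation) and letting release_slot() drop it.
 */
static Slot *
sync_one_slot(Lsn local_lsn, Lsn remote_restart_lsn, bool keep_initiated)
{
    create_ephemeral_slot();
    bool reserved = reserve_wal(local_lsn);
    assert(reserved);
    (void) reserved;

    if (remote_restart_lsn >= my_slot->restart_lsn)
    {
        my_slot->restart_lsn = remote_restart_lsn;
        my_slot->sync_state = 'r';
        my_slot->persistence = PERSIST_PERSISTENT;
    }
    else if (keep_initiated)
        my_slot->persistence = PERSIST_PERSISTENT;  /* stays in 'i' */

    return release_slot();
}
```

For example, with WAL reserved locally from LSN 200 and a remote restart_lsn of 150, `sync_one_slot(200, 150, false)` returns NULL (the slot was dropped), while `sync_one_slot(200, 150, true)` returns a slot in state 'i' that still reserves WAL from 200. Dropping it means the next attempt reserves from whatever newer LSN the standby has reached by then, so the remote slot has further to catch up.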

--
With Regards,
Amit Kapila.
