Re: Synchronizing slots from primary to standby

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, shveta malik <shvetamalik(at)gmail(dot)com>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-10-04 00:05:52
Message-ID: CAA4eK1LCH9m_MdAh6WUV4sEmRJFr94FdVy_0MG86ht7VnWh1+Q@mail.gmail.com
Lists: pgsql-hackers

On Tue, Oct 3, 2023 at 9:27 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Tue, Oct 3, 2023 at 7:56 PM Drouvot, Bertrand
> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> >
> > Hi,
> >
> > On 10/3/23 12:54 PM, Amit Kapila wrote:
> > > On Mon, Oct 2, 2023 at 11:39 AM Drouvot, Bertrand
> > > <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> > >>
> > >> On 9/29/23 1:33 PM, Amit Kapila wrote:
> > >>> On Thu, Sep 28, 2023 at 6:31 PM Drouvot, Bertrand
> > >>> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> > >>>>
> > >>>
> > >>>> - probably open corner cases like: what if a standby is down? would that mean
> > >>>> that synchronize_slot_names not being sent to the primary would allow the decoding
> > >>>> on the primary to go ahead?
> > >>>>
> > >>>
> > >>> Good question. BTW, irrespective of whether we have
> > >>> 'standby_slot_names' parameters or not, how should we behave if
> > >>> standby is down? Say, if 'synchronize_slot_names' is only specified on
> > >>> standby then in such a situation primary won't be even aware that some
> > >>> of the logical walsenders need to wait.
> > >>
> > >> Exactly, that's why I was thinking keeping standby_slot_names to address
> > >> this scenario. In such a case one could simply decide to keep or remove
> > >> the associated physical replication slot from standby_slot_names. Keeping it would
> > >> mean "wait" and removing it would mean allowing decoding on the primary.
> > >>
> > >>> OTOH, one can say that users
> > >>> should configure 'synchronize_slot_names' on both primary and standby
> > >>> but note that this value could be different for different standbys,
> > >>> so we can't configure it on primary.
> > >>>
> > >>
> > >> Yeah, I think that's a good use case for standby_slot_names, what do you think?
> > >>
> > >
> > > But, even if we keep 'standby_slot_names' for this purpose, the
> > > primary doesn't know the value of 'synchronize_slot_names' once the
> > > standby is down and/or the primary is restarted. So, how will we know
> > > which logical WAL senders need to wait for 'standby_slot_names'?
> > >
> >
> > Yeah right, I also think we'd need:
> >
> > - synchronize_slot_names on both primary and standby
> >
> > But now we would need to take care of different standbys having different values
> > (as you said up-thread)....
> >
> > Thinking out loud: What about a single GUC on the primary (not standby_slot_names nor
> > synchronize_slot_names) but say logical_slots_wait_for_standby that could be a list of say
> > "logical_slot_name:physical_slot".
> >
> > I think this GUC would help us define each walsender behavior (should the standby(s)
> > be up or down):
> >
>
> It may help in defining the walsender's behaviour better for sure. But
> the problem I see once we start defining sync-slot-names on primary
> (in any form whether as independent GUC or as above mapping GUC) is
> that it needs to be then in sync with standbys, as each standby for
> sure needs to maintain its own sync-slot-names GUC to make it aware of
> what all it needs to sync.

Yes, I also think so. Also, defining such a GUC when the user wants to
sync all the slots, which would normally be the case, would be a
nightmare for the users.

>
> This brings us to the original question of
> how do we actually keep these configurations in sync between primary
> and standby if we plan to maintain it on both?
>
>
> > - don't wait if its associated logical_slot is not listed in this GUC
> > - or wait based on its associated "list" of mapped physical slots (would probably
> > have to deal with the min restart_lsn for all the corresponding mapped ones).
> >
> > I don't think we can avoid having to define at least one GUC on the primary (at least to
> > handle the case of standby(s) being down).
> >

How about an alternate scheme where we define sync_slot_names on the
standby but then store the physical_slot_name in the corresponding
logical slot (ReplicationSlotPersistentData) to be synced? So, the
standby will send the list of 'sync_slot_names' and the primary will
add the standby's physical slot_name to each of the corresponding
logical slots. Now, if we do this then even after a restart, we should
be able to know which physical slot each logical slot needs to wait
for. We can even provide an SQL API to reset the value of
standby_slot_names in logical slots as a way to unblock decoding in
case of emergency (for example, when the corresponding physical
standby never comes up).
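
To make the idea a bit more concrete, here is a rough, standalone
sketch in C. It is not PostgreSQL code and not a proposed patch:
SketchSlot is a simplified stand-in for ReplicationSlotPersistentData,
and the wait_for_physical field as well as record_sync_request(),
must_wait() and reset_wait() are hypothetical names made up for
illustration. Comparing against the physical slot's restart_lsn is
also just an assumption about what "wait" could mean here.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NAMEDATALEN 64          /* same limit PostgreSQL uses for names */

typedef uint64_t XLogRecPtr;    /* a WAL position */

/* Hypothetical, simplified stand-in for ReplicationSlotPersistentData. */
typedef struct SketchSlot
{
    char        name[NAMEDATALEN];
    bool        is_logical;
    XLogRecPtr  restart_lsn;
    /* Hypothetical new persistent field; empty means "do not wait". */
    char        wait_for_physical[NAMEDATALEN];
} SketchSlot;

static SketchSlot *
find_slot(SketchSlot *slots, int nslots, const char *name)
{
    for (int i = 0; i < nslots; i++)
        if (strcmp(slots[i].name, name) == 0)
            return &slots[i];
    return NULL;
}

/*
 * Primary side: the standby that streams via 'physical' sent its list of
 * sync_slot_names. Remember the physical slot in each matching logical
 * slot. Returns the number of logical slots that were marked.
 */
static int
record_sync_request(SketchSlot *slots, int nslots, const char *physical,
                    const char *const *sync_names, int nnames)
{
    int         marked = 0;

    for (int i = 0; i < nnames; i++)
    {
        SketchSlot *slot = find_slot(slots, nslots, sync_names[i]);

        if (slot != NULL && slot->is_logical)
        {
            snprintf(slot->wait_for_physical, NAMEDATALEN, "%s", physical);
            marked++;
        }
    }
    return marked;
}

/*
 * Should the logical walsender hold back changes up to 'send_upto'?
 * Because the mapping lives in the slot itself, this still works after a
 * restart or while the standby is down.
 */
static bool
must_wait(SketchSlot *slots, int nslots, const SketchSlot *logical,
          XLogRecPtr send_upto)
{
    const SketchSlot *phys;

    if (logical->wait_for_physical[0] == '\0')
        return false;           /* slot is not being synced anywhere */

    phys = find_slot(slots, nslots, logical->wait_for_physical);
    if (phys == NULL)
        return false;           /* assumption: a dropped physical slot does not block */

    return phys->restart_lsn < send_upto;
}

/* What the emergency "reset" SQL API would boil down to. */
static void
reset_wait(SketchSlot *slot)
{
    slot->wait_for_physical[0] = '\0';
}
```

With something like this, a logical slot that no standby has asked to
sync never waits, and calling reset_wait() on a slot unblocks decoding
even if the physical slot it pointed to never advances again.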

--
With Regards,
Amit Kapila.
