Re: Synchronizing slots from primary to standby

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-12-06 05:26:47
Message-ID: CAA4eK1+EeUUT+gnzoHKWm7GqosA2ehT8QyoKtu1jiyGE=wUErw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 5, 2023 at 7:38 PM Drouvot, Bertrand
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> On 12/5/23 12:32 PM, Amit Kapila wrote:
> > On Tue, Dec 5, 2023 at 10:38 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >>
> >> On Mon, Dec 4, 2023 at 10:07 PM Drouvot, Bertrand
> >> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> >>>>
> >>>
> >>> Maybe another option could be to have the walreceiver a way to let the slot sync
> >>> worker knows that it (the walreceiver) was not able to start due to non existing
> >>> replication slot on the primary? (that way we'd avoid the slot sync worker having
> >>> to talk to the primary).
> >>
> >> Few points:
> >> 1) I think if we do it, we should do it in generic way i.e. slotsync
> >> worker should go to no-op if walreceiver is not able to start due to
> >> any reason and not only due to invalid primary_slot_name.
> >> 2) Secondly, slotsync worker needs to make sure it has synced the
> >> slots so far i.e. worker should not go to no-op immediately on seeing
> >> missing WalRcv process if there are pending slots to be synced.
> >>
> >
> > Won't it be better to just ping and check the validity of
> > 'primary_slot_name' at the start of slot-sync and if it is changed
> > anytime? I think it would be better to avoid adding dependency on
> > walreciever state as that sounds like needless complexity.
>
> I think the overall extra complexity is linked to the fact that we first
> want to ensure that the slots are in sync before shutting down the
> sync slot worker.
>
> I think than talking to the primary or relying on the walreceiver state
> is "just" what would trigger the decision to shutdown the sync slot worker.
>
> Relying on the walreceiver state looks better to me (as it avoids possibly
> useless round trips with the primary).
>

But the round trip will only be once in the beginning and if the user
changes the GUC primary-slot_name which shouldn't be that often.

> Also the walreceiver could be down for multiple reasons, and I think there
> is no point of having a sync slot worker running if the slots are in sync and
> there is no walreceiver running (even if primary_slot_name is a valid one).
>

I feel that is indirectly relying on the fact that the primary won't
advance logical slots unless physical standby has consumed data. Now,
it is possible that slot-sync worker lags behind and still needs to
sync more data for slots in which it makes sense for slot-sync worker
to be alive. I think we can try to avoid checking walreceiver status
till we can get more data to avoid the problem I mentioned but it
doesn't sound like a clean way to achieve our purpose.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2023-12-06 05:38:45 Re: RFI: Extending the TOAST Pointer
Previous Message Shlok Kyal 2023-12-06 05:25:51 Re: undetected deadlock in ALTER SUBSCRIPTION ... REFRESH PUBLICATION