RE: Synchronizing slots from primary to standby

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: RE: Synchronizing slots from primary to standby
Date: 2023-11-17 01:46:07
Message-ID: OS0PR01MB57160CC95DAD0E3B6D60DBAB94B7A@OS0PR01MB5716.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Tuesday, November 14, 2023 10:27 PM Drouvot, Bertrand <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> On 11/13/23 2:57 PM, Zhijie Hou (Fujitsu) wrote:
> > On Friday, November 10, 2023 4:16 PM Drouvot, Bertrand
> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> >> Yeah good point, agree to just error out in all the case then (if we
> >> discard the sync_ reserved wording proposal, which seems to be the
> >> case as probably not worth the extra work).
> >
> > Thanks for the discussion!
> >
> > Here is the V33 patch set which includes the following changes:
>
> Thanks for working on it!
>
> >
> > 1) Drop slots with state 'i' in promotion flow after we shut down WalReceiver.
>
> @@ -3557,10 +3558,15 @@ WaitForWALToBecomeAvailable(XLogRecPtr
> RecPtr, bool randAccess,
> * this only after failure, so when you promote, we still
> * finish replaying as much as we can from archive and
> * pg_wal before failover.
> + *
> + * Drop the slots for which sync is initiated but not yet
> + * completed i.e. they are still waiting for the primary
> + * server to catch up.
> */
> if (StandbyMode && CheckForStandbyTrigger())
> {
> XLogShutdownWalRcv();
> + slotsync_drop_initiated_slots();
> return XLREAD_FAIL;
> }
>
> I had a closer look and it seems this is not located at the right place.
>
> Indeed, it's added here:
>
> switch (currentSource)
> {
> case XLOG_FROM_ARCHIVE:
> case XLOG_FROM_PG_WAL:
>
> While in our case we are in
>
> case XLOG_FROM_STREAM:
>
> So I think we should move slotsync_drop_initiated_slots() in the
> XLOG_FROM_STREAM case. Maybe before shutting down the sync slot worker?
> (the TODO item number 2 you mentioned up-thread)

Thanks for the comment.

I feel WaitForWALToBecomeAvailable() may not be the best place to shut down the
slotsync worker and drop slots. As the comments in the XLOG_FROM_STREAM case
mention, the code there can be reached for reasons other than promotion. If the
intention is to stop the slotsync workers on promotion, maybe
FinishWalRecovery() is a better place to do it, since it indicates the end of
recovery and XLogShutdownWalRcv() is also called in it.

I also feel we'd better drop the slots only after shutting down the slotsync
workers; otherwise, in rare cases, a still-running slotsync worker could
re-create a slot that was just dropped.
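
To make the ordering concrete, here is a rough standalone sketch of what I have
in mind. It is not patch code: every name in it (SketchSlot,
sketch_shutdown_slotsync_worker, sketch_drop_initiated_slots,
sketch_finish_recovery) is a hypothetical stand-in, and the only detail taken
from the patch is that slots with sync state 'i' are the ones to drop. The 'r'
state used in the usage note is just a placeholder for "any other state".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a synced slot; not a PostgreSQL structure. */
typedef struct SketchSlot
{
    const char *name;
    char        sync_state;     /* 'i' = sync initiated but not completed */
    bool        in_use;
} SketchSlot;

/* Hypothetical flag modelling whether the slotsync worker is still alive. */
bool sketch_worker_running = true;

/* Step 1: stop the worker so that it can no longer create slots. */
void
sketch_shutdown_slotsync_worker(void)
{
    sketch_worker_running = false;
}

/*
 * Step 2: drop every slot still in state 'i'.  The assertion encodes the
 * ordering requirement: dropping while the worker runs would be racy,
 * because the worker could create the slot again.
 */
int
sketch_drop_initiated_slots(SketchSlot *slots, size_t nslots)
{
    int         dropped = 0;

    assert(!sketch_worker_running);

    for (size_t i = 0; i < nslots; i++)
    {
        if (slots[i].in_use && slots[i].sync_state == 'i')
        {
            slots[i].in_use = false;
            dropped++;
        }
    }
    return dropped;
}

/*
 * Models the end-of-recovery step (the role FinishWalRecovery() would play):
 * shut the worker down first, then drop the half-synced slots.
 */
int
sketch_finish_recovery(SketchSlot *slots, size_t nslots)
{
    sketch_shutdown_slotsync_worker();
    return sketch_drop_initiated_slots(slots, nslots);
}
```

With three slots in states 'i', 'r' and 'i', calling sketch_finish_recovery()
drops the two 'i' slots and leaves the 'r' slot alone. Calling
sketch_drop_initiated_slots() directly while the worker is still running trips
the assertion, which is the race I would like to avoid.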

Best Regards,
Hou zj
