Re: promotion related handling in pg_sync_replication_slots()

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: promotion related handling in pg_sync_replication_slots()
Date: 2024-04-12 11:54:50
Message-ID: CAJpy0uB=X1FrW-si-d0M5kz3ocR0rnHNSEXmw3ZtVsOkBEub+A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 12, 2024 at 4:18 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Apr 12, 2024 at 7:57 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> > On Sat, Apr 6, 2024 at 12:25 PM Bharath Rupireddy
> > <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> > >
> > > On Fri, Apr 5, 2024 at 10:31 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > > >
> > > > Please find v2. Changes are:
> > >
> > > Thanks for the patch. Here are some comments.
> >
> > Thanks for reviewing.
> > >
> > > 1. Can we have a clear saying in the shmem variable who's syncing at
> > > the moment? Is it a slot sync worker or a backend via SQL function?
> > > Perhaps turn "bool syncing;" to "SlotSyncSource sync_source;"
> > >
> > > typedef enum SlotSyncSource
> > > {
> > > SLOT_SYNC_NONE,
> > > SLOT_SYNC_WORKER,
> > > SLOT_SYNC_BACKEND,
> > > } SlotSyncSource;
> > >
> > > Then, the check in ShutDownSlotSync can be:
> > >
> > > + /*
> > > + * Return if neither the slot sync worker is running nor the function
> > > + * pg_sync_replication_slots() is executing.
> > > + */
> > > + if ((SlotSyncCtx->pid == InvalidPid) &&
> > > SlotSyncCtx->sync_source != SLOT_SYNC_BACKEND)
> > > {
> > >
>
> I don't know if this will be help, especially after fixing the race
> condition I mentioned. But otherwise, also, at this stage it doesn't
> seem helpful to add the source of sync explicitly.
>

Agreed.

Please find v3 addressing race-condition and one other comment.

Up-thread it was suggested that, probably, checking
SlotSyncCtx->syncing should be sufficient in ShutDownSlotSync(). On
re-thinking, it might not be. Slot sync worker sets and resets
'syncing' with each sync-cycle, and thus we need to rely on worker's
pid in ShutDownSlotSync(), as there could be a window where promotion
is triggered and 'syncing' is not set for worker, while the worker is
still running. This implementation of setting and resetting syncing
with each sync-cycle looks better as compared to setting syncing
during the entire life-cycle of the worker. So, I did not change it.

To fix the race condition, I moved the setting of the 'syncing' flag
together with the 'stopSignaled' check under the same spinLock for the
SQL function. OTOH, for worker, I feel it is good to check
'stopSignaled' at the beginning itself, while retaining the
setting/resetting of 'syncing' at a later stage during the actual sync
cycle. This makes handling for SQL function and worker slightly
different. And thus to achieve this, I had to take the 'syncing' flag
handling out of synchronize_slots() and move it to both worker and SQL
function by introducing 2 new functions check_and_set_syncing_flag()
and reset_syncing_flag().
I am analyzing if there are better ways to achieve this, any
suggestions are welcome.

thanks
Shveta

Attachment Content-Type Size
v3-0001-Handle-stopSignaled-during-sync-function-call.patch application/octet-stream 6.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2024-04-12 12:12:34 Re: post-freeze damage control
Previous Message Etsuro Fujita 2024-04-12 11:21:24 Re: Another WaitEventSet resource leakage in back branches