Re: Allow logical failover slots to wait on synchronous replication

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: John H <johnhyvr(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Allow logical failover slots to wait on synchronous replication
Date: 2024-09-16 09:25:42
Message-ID: CAJpy0uAexFn9=W2dTssxrHmAHu2V08Jq6BfFpjDxKUpuMdYKDw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Sep 16, 2024 at 11:13 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Sep 13, 2024 at 3:13 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> > On Thu, Sep 12, 2024 at 3:04 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > >
> > > On Wed, Sep 11, 2024 at 2:40 AM John H <johnhyvr(at)gmail(dot)com> wrote:
> > > >
> > > > Hi Shveta,
> > > >
> > > > On Sun, Sep 8, 2024 at 11:16 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > > >
> > > > >
> > > > > I was trying to have a look at the patch again, it doesn't apply on
> > > > > the head, needs rebase.
> > > > >
> > > >
> > > > Rebased with the latest changes.
> > > >
> > > > > Regarding 'mode = SyncRepWaitMode', FWIW, SyncRepWaitForLSN() also
> > > > > does in a similar way. It gets mode in local var initially and uses it
> > > > > later. See [1]. So isn't there a chance too that 'SyncRepWaitMode' can
> > > > > change in between.
> > > > >
> > > > > [1]:
> > > > > mode = SyncRepWaitMode;
> > > > > .....
> > > > > ....
> > > > > if (!WalSndCtl->sync_standbys_defined ||
> > > > > lsn <= WalSndCtl->lsn[mode])
> > > > > {
> > > > > LWLockRelease(SyncRepLock);
> > > > > return;
> > > > > }
> > > >
> > > > You are right, thanks for the correction. I tried reproducing with GDB
> > > > where SyncRepWaitMode
> > > > changes due to pg_ctl reload but was unable to do so. It seems like
> > > > SIGHUP only sets ConfigReloadPending = true,
> > > > which gets processed in the next loop in WalSndLoop() and that's
> > > > probably where I was getting confused.
> > >
> > > yes, SIGHUP will be processed in the caller of
> > > StandbySlotsHaveCaughtup() (see ProcessConfigFile() in
> > > WaitForStandbyConfirmation()). So we can use 'SyncRepWaitMode'
> > > directly as it is not going to change in StandbySlotsHaveCaughtup()
> > > even if user triggers the change. And thus it was okay to use it even
> > > in the local variable too similar to how SyncRepWaitForLSN() does it.
> > >
> > > > In the latest patch, I've added:
> > > >
> > > > Assert(SyncRepWaitMode >= 0);
> > > >
> > > > which should be true since we call SyncRepConfigured() at the
> > > > beginning of StandbySlotsHaveCaughtup(),
> > > > and used SyncRepWaitMode directly.
> > >
> > > Yes, it should be okay I think. As SyncRepRequested() in the beginning
> > > makes sure synchronous_commit > SYNCHRONOUS_COMMIT_LOCAL_FLUSH and
> > > thus SyncRepWaitMode should be mapped to either of
> > > WAIT_WRITE/FLUSH/APPLY etc. Will review further.
> > >
> >
> > I was wondering if we need somethign similar to SyncRepWaitForLSN() here:
> >
> > /* Cap the level for anything other than commit to remote flush only. */
> > if (commit)
> > mode = SyncRepWaitMode;
> > else
> > mode = Min(SyncRepWaitMode, SYNC_REP_WAIT_FLUSH);
> >
> > The header comment says:
> > * 'lsn' represents the LSN to wait for. 'commit' indicates whether this LSN
> > * represents a commit record. If it doesn't, then we wait only for the WAL
> > * to be flushed if synchronous_commit is set to the higher level of
> > * remote_apply, because only commit records provide apply feedback.
> >
> > If we don't do something similar, then aren't there chances that we
> > keep on waiting on the wrong lsn[mode] for the case when mode =
> > SYNC_REP_WAIT_APPLY while sync-rep-wait infrastructure is updating
> > different mode's lsn. Is my understanding correct?
> >
>
> I think here we always need the lsn values corresponding to
> SYNC_REP_WAIT_FLUSH as we want to ensure that the WAL has to be
> flushed on physical standby before sending it to the logical
> subscriber. See ProcessStandbyReplyMessage() where we always use
> flushPtr to advance the physical_slot via
> PhysicalConfirmReceivedLocation().

I agree. So even if the mode is SYNC_REP_WAIT_WRITE (lower one) or
SYNC_REP_WAIT_APPLY (higher one), we need to wait for
lsn[SYNC_REP_WAIT_FLUSH].

> Another question aside from the above point, what if someone has
> specified logical subscribers in synchronous_standby_names? In the
> case of synchronized_standby_slots, we won't proceed with such slots.
>

Yes, it is a possibility. I have missed this point earlier. Now I
tried a case where I give a mix of logical subscriber and physical
standby in 'synchronous_standby_names' on pgHead, it even takes that
'mix' configuration and starts waiting accordingly.

synchronous_standby_names = 'FIRST 2(logicalsub_1, phy_standby_1,
phy_standby_2)';

thanks
Shveta

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2024-09-16 09:27:37 Re: A starter task
Previous Message jian he 2024-09-16 09:22:29 Re: Virtual generated columns