Re: Allow logical failover slots to wait on synchronous replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: John H <johnhyvr(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Allow logical failover slots to wait on synchronous replication
Date: 2024-09-16 05:43:24
Message-ID: CAA4eK1K2NUovjv+HEM1fnY1zyiMAJXYQSKcsD1ihUSAYX_XJYA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 13, 2024 at 3:13 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Thu, Sep 12, 2024 at 3:04 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> > On Wed, Sep 11, 2024 at 2:40 AM John H <johnhyvr(at)gmail(dot)com> wrote:
> > >
> > > Hi Shveta,
> > >
> > > On Sun, Sep 8, 2024 at 11:16 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > >
> > > >
> > > > I was trying to have a look at the patch again, it doesn't apply on
> > > > the head, needs rebase.
> > > >
> > >
> > > Rebased with the latest changes.
> > >
> > > > Regarding 'mode = SyncRepWaitMode', FWIW, SyncRepWaitForLSN() also
> > > > does in a similar way. It gets mode in local var initially and uses it
> > > > later. See [1]. So isn't there a chance too that 'SyncRepWaitMode' can
> > > > change in between.
> > > >
> > > > [1]:
> > > > mode = SyncRepWaitMode;
> > > > .....
> > > > ....
> > > > if (!WalSndCtl->sync_standbys_defined ||
> > > > lsn <= WalSndCtl->lsn[mode])
> > > > {
> > > > LWLockRelease(SyncRepLock);
> > > > return;
> > > > }
> > >
> > > You are right, thanks for the correction. I tried reproducing with GDB
> > > where SyncRepWaitMode
> > > changes due to pg_ctl reload but was unable to do so. It seems like
> > > SIGHUP only sets ConfigReloadPending = true,
> > > which gets processed in the next loop in WalSndLoop() and that's
> > > probably where I was getting confused.
> >
> > yes, SIGHUP will be processed in the caller of
> > StandbySlotsHaveCaughtup() (see ProcessConfigFile() in
> > WaitForStandbyConfirmation()). So we can use 'SyncRepWaitMode'
> > directly as it is not going to change in StandbySlotsHaveCaughtup()
> > even if user triggers the change. And thus it was okay to use it even
> > in the local variable too similar to how SyncRepWaitForLSN() does it.
> >
> > > In the latest patch, I've added:
> > >
> > > Assert(SyncRepWaitMode >= 0);
> > >
> > > which should be true since we call SyncRepConfigured() at the
> > > beginning of StandbySlotsHaveCaughtup(),
> > > and used SyncRepWaitMode directly.
> >
> > Yes, it should be okay I think. As SyncRepRequested() in the beginning
> > makes sure synchronous_commit > SYNCHRONOUS_COMMIT_LOCAL_FLUSH and
> > thus SyncRepWaitMode should be mapped to either of
> > WAIT_WRITE/FLUSH/APPLY etc. Will review further.
> >
>
> I was wondering if we need somethign similar to SyncRepWaitForLSN() here:
>
> /* Cap the level for anything other than commit to remote flush only. */
> if (commit)
> mode = SyncRepWaitMode;
> else
> mode = Min(SyncRepWaitMode, SYNC_REP_WAIT_FLUSH);
>
> The header comment says:
> * 'lsn' represents the LSN to wait for. 'commit' indicates whether this LSN
> * represents a commit record. If it doesn't, then we wait only for the WAL
> * to be flushed if synchronous_commit is set to the higher level of
> * remote_apply, because only commit records provide apply feedback.
>
> If we don't do something similar, then aren't there chances that we
> keep on waiting on the wrong lsn[mode] for the case when mode =
> SYNC_REP_WAIT_APPLY while sync-rep-wait infrastructure is updating
> different mode's lsn. Is my understanding correct?
>

I think here we always need the lsn values corresponding to
SYNC_REP_WAIT_FLUSH as we want to ensure that the WAL has to be
flushed on physical standby before sending it to the logical
subscriber. See ProcessStandbyReplyMessage() where we always use
flushPtr to advance the physical_slot via
PhysicalConfirmReceivedLocation().

Another question aside from the above point, what if someone has
specified logical subscribers in synchronous_standby_names? In the
case of synchronized_standby_slots, we won't proceed with such slots.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Lakhin 2024-09-16 06:00:00 Re: Robocopy might be not robust enough for never-ending testing on Windows
Previous Message Tom Lane 2024-09-16 05:09:24 Re: Regression tests fail with tzdata 2024b