Re: Allow logical failover slots to wait on synchronous replication

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: John H <johnhyvr(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Allow logical failover slots to wait on synchronous replication
Date: 2024-08-29 09:34:57
Message-ID: CAJpy0uBqeWB9m9nyewWJBLVvXAsKU8mmG3Bw-vB0gLLY2iWWfQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 29, 2024 at 2:31 AM John H <johnhyvr(at)gmail(dot)com> wrote:
>
> Hi Amit,
>
> On Mon, Aug 26, 2024 at 11:00 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > I wanted a simple test where in the first case you use
> > synchronous_standby_names = 'ANY 3 (A,B,C,D,E)' and in the second case
> > use standby_slot_names = A_slot, B_slot, C_slot, D_slot, E_slot. You
> > can try some variations of it as well. The idea is that even if the
> > performance is less for synchronous_standby_names configuration, we
> > should be able to document it. This will help users to decide what is
> > ...
> > What is the difference between "Test failover_slots with
> > synchronized_standby_slots = 'rr_1, rr_2,
> > > rr_3, rr_4, rr_5'" and "Test failover_slots waiting on sync_rep no new shared cache"? I want to know what configuration did you used for synchronous_standby_names in the latter case.
>
> Sorry for the confusion due to the bad-naming of the test cases, let
> me rephrase.
> All three tests had synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'
> set with synchronous_commit = 'on', and failover_slots = 'on'
> for the 10 logical slots.
>
> # Test failover_slots with synchronized_standby_slots = 'rr_1, rr_2,
> rr_3, rr_4, rr_5'
> This is the test you wanted where the logical clients are waiting on
> all 5 slots to acknowledge the change since
> 'synchronized_standby_slots' takes priority when set.
>
> # Test failover_slots sync rep no cache
> This test has 'synchronized_standby_slots' commented out, and without
> relying on the new cache introduced in 0003.
> Logical clients will wait on synchronous_standby_names in this case.
>
> # Test failover slots with additional shared cache
> This test also has 'synchronized_standby_slots' commented out, and
> logical clients will wait on the LSNs
> reported from synchronous_standby_names but it relies on a new cache
> to reduce contention on SyncRepLock.
>
> > The idea is that even if the
> > performance is less for synchronous_standby_names configuration, we
> > should be able to document it. This will help users to decide what is
> > best for them.
>
> Makes sense.
>
> > I am also not sure especially as the test results didn't shown much
> > improvement and the code also becomes bit complicated. BTW, in the
> > 0003 version in the below code:
>
> That's fair, I've updated to be more in line with 0002.
>
> > + /* Cache values to reduce contention */
> > + LWLockAcquire(SyncRepLock, LW_SHARED);
> > + memcpy((XLogRecPtr *) walsndctl->cached_lsn, (XLogRecPtr *)
> > walsndctl->lsn, sizeof(lsn));
> > + LWLockRelease(SyncRepLock);
> >
> > Which mode lsn is being copied? I am not sure if I understood this
> > part of the code.
>
> All of the mode LSNs are being copied in case SyncRepWaitMode changes in
> the next iteration. I've removed that part but kept:
>
> > + memcpy(lsn, (XLogRecPtr *) walsndctl->lsn, sizeof(lsn));
>
> as suggested by Bertrand to avoid the for loop updating values one-by-one.
>
> Here's what's logged after the memcpy:
>
> 2024-08-28 19:41:13.798 UTC [1160413] LOG: lsn[0] after memcpy is: 279/752C7FF0
> 2024-08-28 19:41:13.798 UTC [1160413] LOG: lsn[1] after memcpy is: 279/752C7F20
> 2024-08-28 19:41:13.798 UTC [1160413] LOG: lsn[2] after memcpy is: 279/752C7F20
>
> > In the 0002 version, in the following code [1], you are referring to
> > LSN mode which is enabled for logical walsender irrespective of the
> > mode used by the physical walsender. It is possible that they are
> > always the same but that is not evident from the code or comments in
> > the patch.
>
> They are almost always the same, I tried to indicate that with the
> following comment in the patch, but I could make it more explicit?
> > /* Initialize value in case SIGHUP changing to SYNC_REP_NO_WAIT */
>
> At the beginning we set
>
> > int mode = SyncRepWaitMode;
>
> At this time, the logical walsender mode it's checking against is the
> same as what the physical walsenders are using.
> It's possible that this mode is no longer the same when we execute the
> following check:
>
> > if (lsn[mode] >= wait_for_lsn)
>
> because of a SIGHUP to synchronous_commit that changes SyncRepWaitMode
> to some other value
>
> We cache the value instead of
> > if (lsn[SyncRepWaitMode] >= wait_for_lsn)
>
> because SYNC_REP_NO_WAIT is -1. If SyncRepWaitMode is set to this it
> leads to out of bounds access.
>
> I've attached a new patch that removes the shared cache introduced in 0003.
>

Thanks for the patch. Few comments and queries:

1)
+ static XLogRecPtr lsn[NUM_SYNC_REP_WAIT_MODE];

We shall name it as 'lsns' as there are multiple.

2)

+ for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
+ {
+ lsn[i] = InvalidXLogRecPtr;
+ }

Can we do it like below similar to what you have done at another place:
memset(lsn, InvalidXLogRecPtr, sizeof(lsn));

3)
+ if (!initialized)
+ {
+ for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
+ {
+ lsn[i] = InvalidXLogRecPtr;
+ }
+ }

I do not see 'initialized' set to TRUE anywhere. Can you please
elaborate the intent here?

4)
+ int mode = SyncRepWaitMode;
+ int i;
+
+ if (!initialized)
+ {
+ for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
+ {
+ lsn[i] = InvalidXLogRecPtr;
+ }
+ }
+ if (mode == SYNC_REP_NO_WAIT)
+ return true;

I do not understand this code well. As Amit also pointed out, 'mode'
may change. When we initialize 'mode' lets say SyncRepWaitMode is
SYNC_REP_NO_WAIT but by the time we check 'if (mode ==
SYNC_REP_NO_WAIT)', SyncRepWaitMode has changed to say
SYNC_REP_WAIT_FLUSH, if so, then we will wrongly return true from
here. Is that a possibility? ProcessConfigFile() is in the caller, and
thus we may end up using the wrong mode.

thanks
Shveta

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andy Fan 2024-08-29 09:40:14 Make printtup a bit faster
Previous Message Amit Kapila 2024-08-29 09:21:36 Re: Add contrib/pg_logicalsnapinspect