Re: Synchronizing slots from primary to standby

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2024-03-05 00:40:27
Message-ID: CAHut+PsATK8z1TEcfFE8zWoS1hagqsvaWYCgom_zYtScfwO7uQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are some review comments for v105-0001

==========
doc/src/sgml/config.sgml

1.
+ <para>
+ The standbys corresponding to the physical replication slots in
+ <varname>standby_slot_names</varname> must configure
+ <literal>sync_replication_slots = true</literal> so they can receive
+ logical failover slots changes from the primary.
+ </para>

/slots changes/slot changes/

======
doc/src/sgml/func.sgml

2.
+ The function may be waiting if the specified slot is a logical failover
+ slot and <link
linkend="guc-standby-slot-names"><varname>standby_slot_names</varname></link>
+ is configured.

I know this has been through multiple versions already, but this
latest wording "may be waiting..." doesn't seem very good to me.

How about one of these?

* The function may not be able to return immediately if the specified
slot is a logical failover slot and standby_slot_names is configured.

* The function return might be blocked if the specified slot is a
logical failover slot and standby_slot_names is configured.

* If the specified slot is a logical failover slot then the function
will block until all physical slots specified in standby_slot_names
have confirmed WAL receipt.

* If the specified slot is a logical failover slot then the function
will not return until all physical slots specified in
standby_slot_names have confirmed WAL receipt.

~~~

3.
+ slot may return to an earlier position. The function may be waiting if
+ the specified slot is a logical failover slot and
+ <link linkend="guc-standby-slot-names"><varname>standby_slot_names</varname></link>

Same as previous review comment #2

======
src/backend/replication/slot.c

4. WaitForStandbyConfirmation

+ * Used by logical decoding SQL functions that acquired logical failover slot.

IIUC it doesn't work like that. pg_logical_slot_get_changes_guts()
calls here unconditionally (i.e. the SQL functions don't even check if
they are failover slots before calling this) so the comment seems
misleading/redundant.

======
src/backend/replication/walsender.c

5. NeedToWaitForWal

+ /*
+ * Check if the standby slots have caught up to the flushed position. It
+ * is good to wait up to the flushed position and then let the WalSender
+ * send the changes to logical subscribers one by one which are already
+ * covered by the flushed position without needing to wait on every change
+ * for standby confirmation.
+ */
+ if (NeedToWaitForStandbys(flushed_lsn, wait_event))
+ return true;
+
+ *wait_event = 0;
+ return false;
+}
+

5a.
The comment (or part of it?) seems misplaced because it is talking
WalSender sending changes, but that is not happening in this function.

Also, isn't what this is saying already described by the other comment
in the caller? e.g.:

+ /*
+ * Within the loop, we wait for the necessary WALs to be flushed to
+ * disk first, followed by waiting for standbys to catch up if there
+ * are enough WALs or upon receiving the shutdown signal. To avoid the
+ * scenario where standbys need to catch up to a newer WAL location in
+ * each iteration, we update our idea of the currently flushed
+ * position only if we are not waiting for standbys to catch up.
+ */

~

5b.
Most of the code is unnecessary. AFAICT all this is exactly same as just 1 line:

return NeedToWaitForStandbys(flushed_lsn, wait_event);

~~~

6. WalSndWaitForWal

+ /*
+ * Within the loop, we wait for the necessary WALs to be flushed to
+ * disk first, followed by waiting for standbys to catch up if there
+ * are enough WALs or upon receiving the shutdown signal. To avoid the
+ * scenario where standbys need to catch up to a newer WAL location in
+ * each iteration, we update our idea of the currently flushed
+ * position only if we are not waiting for standbys to catch up.
+ */

Regarding that 1st sentence: maybe this logic used to be done
explicitly "within the loop" but IIUC this logic is now hidden inside
NeedToWaitForWal() so the comment should mention that.

----------
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-03-05 00:42:20 Re: Fix race condition in InvalidatePossiblyObsoleteSlot()
Previous Message Jeff Davis 2024-03-04 23:32:19 Re: [PATCH] updates to docs about HOT updates for BRIN