From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
---|---|
To: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
Subject: | Re: Synchronizing slots from primary to standby |
Date: | 2024-03-02 23:46:59 |
Message-ID: | CAHut+Psgm2NhzJCi7ZxhpdnuDG+tY34DpnrbXBiAHt1Oa=2Sew@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sat, Mar 2, 2024 at 2:51 PM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Friday, March 1, 2024 12:23 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
...
> > ======
> > src/backend/replication/slot.c
> >
> > 2. validate_standby_slots
> >
> > + else if (!ReplicationSlotCtl)
> > + {
> > + /*
> > + * We cannot validate the replication slot if the replication slots'
> > + * data has not been initialized. This is ok as we will validate the
> > + * specified slot when waiting for them to catch up. See
> > + * StandbySlotsHaveCaughtup for details.
> > + */
> > + }
> > + else
> > + {
> > + /*
> > + * If the replication slots' data have been initialized, verify if the
> > + * specified slots exist and are logical slots.
> > + */
> > + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
> >
> > IMO that 2nd comment does not need to say "If the replication slots'
> > data have been initialized," because that is implicit from the if/else.
>
> Removed.
I only meant to suggest removing the 1st part, not the entire comment.
I thought it is still useful to have a comment like:
/* Check that the specified slots exist and are logical slots.*/
======
And, here are some review comments for v103-0001.
======
Commit message
1.
Additionally, The SQL functions pg_logical_slot_get_changes,
pg_logical_slot_peek_changes and pg_replication_slot_advance are modified
to wait for the replication slots mentioned in 'standby_slot_names' to
catch up before returning.
~
(use the same wording as previous in this message)
/mentioned in/specified in/
======
doc/src/sgml/config.sgml
2.
+ Additionally, when using the replication management functions
+ <link linkend="pg-replication-slot-advance">
+ <function>pg_replication_slot_advance</function></link>,
+ <link linkend="pg-logical-slot-get-changes">
+ <function>pg_logical_slot_get_changes</function></link>, and
+ <link linkend="pg-logical-slot-peek-changes">
+ <function>pg_logical_slot_peek_changes</function></link>,
+ with failover enabled logical slots, the functions will wait for the
+ physical slots specified in <varname>standby_slot_names</varname> to
+ confirm WAL receipt before proceeding.
+ </para>
It says "the ... functions" twice so maybe reword it slightly.
BEFORE
Additionally, when using the replication management functions
pg_replication_slot_advance, pg_logical_slot_get_changes, and
pg_logical_slot_peek_changes, with failover enabled logical slots, the
functions will wait for the physical slots specified in
standby_slot_names to confirm WAL receipt before proceeding.
SUGGESTION
Additionally, the replication management functions
pg_replication_slot_advance, pg_logical_slot_get_changes, and
pg_logical_slot_peek_changes, when used with failover enabled logical
slots, will wait for the physical slots specified in
standby_slot_names to confirm WAL receipt before proceeding.
(Actually the "will wait ... before proceeding" is also a bit tricky,
so below is another possible rewording)
SUGGESTION #2
Additionally, the replication management functions
pg_replication_slot_advance, pg_logical_slot_get_changes, and
pg_logical_slot_peek_changes, when used with failover enabled logical
slots, will block until all physical slots specified in
standby_slot_names have confirmed WAL receipt.
~~~
3.
+ <note>
+ <para>
+ Value <literal>*</literal> is not accepted as it is inappropriate to
+ block logical replication for physical slots that either lack
+ associated standbys or have standbys associated that are not enabled
+ for replication slot synchronization. (see
+ <xref linkend="logicaldecoding-replication-slots-synchronization"/>).
+ </para>
+ </note>
Why does the document need to provide an excuse/reason for the rule?
You could just say something like:
SUGGESTION
The slots must be named explicitly. For example, specifying wildcard
values like <literal>*</literal> is not permitted.
======
doc/src/sgml/func.sgml
4.
@@ -28150,7 +28150,7 @@ postgres=# SELECT '0/0'::pg_lsn +
pd.segment_number * ps.setting::int + :offset
</row>
<row>
- <entry role="func_table_entry"><para role="func_signature">
+ <entry id="pg-logical-slot-get-changes"
role="func_table_entry"><para role="func_signature">
<indexterm>
<primary>pg_logical_slot_get_changes</primary>
</indexterm>
@@ -28177,7 +28177,7 @@ postgres=# SELECT '0/0'::pg_lsn +
pd.segment_number * ps.setting::int + :offset
</row>
<row>
- <entry role="func_table_entry"><para role="func_signature">
+ <entry id="pg-logical-slot-peek-changes"
role="func_table_entry"><para role="func_signature">
<indexterm>
<primary>pg_logical_slot_peek_changes</primary>
</indexterm>
@@ -28232,7 +28232,7 @@ postgres=# SELECT '0/0'::pg_lsn +
pd.segment_number * ps.setting::int + :offset
</row>
<row>
- <entry role="func_table_entry"><para role="func_signature">
+ <entry id="pg-replication-slot-advance"
role="func_table_entry"><para role="func_signature">
<indexterm>
<primary>pg_replication_slot_advance</primary>
</indexterm>
Should these 3 functions say something about how their behaviour is
affected by 'standby_slot_names' and give a link back to the GUC
'standby_slot_names' docs?
======
src/backend/replication/slot.c
5. StandbySlotsHaveCaughtup
+ if (!slot)
+ {
+ /*
+ * If the provided slot does not exist, report a message and exit
+ * the loop. It is possible for a user to specify a slot name in
+ * standby_slot_names that does not exist just before the server
+ * startup. The GUC check_hook(validate_standby_slots) cannot
+ * validate such a slot during startup as the ReplicationSlotCtl
+ * shared memory is not initialized at that time. It is also
+ * possible for a user to drop the slot in standby_slot_names
+ * afterwards.
+ */
5a.
Minor rewording to make this code comment more similar to the next one:
SUGGESTION
If a slot name provided in standby_slot_names does not exist, report a
message and exit the loop. A user can specify a slot name that does
not exist just before the server startup...
~
5b.
+ /*
+ * If a logical slot name is provided in standby_slot_names,
+ * report a message and exit the loop. Similar to the non-existent
+ * case, it is possible for a user to specify a logical slot name
+ * in standby_slot_names before the server startup, or drop an
+ * existing physical slot and recreate a logical slot with the
+ * same name.
+ */
/it is possible for a user to specify/a user can specify/
~~~
6. WaitForStandbyConfirmation
+ /*
+ * We wait for the slots in the standby_slot_names to catch up, but we
+ * use a timeout (1s) so we can also check if the standby_slot_names
+ * has been changed.
+ */
Remove some of the "we".
SUGGESTION
Wait for the slots in the standby_slot_names to catch up, but use a
timeout (1s) so we can also check if the standby_slot_names has been
changed.
======
src/backend/replication/walsender.c
7. NeedToWaitForStandby
+/*
+ * Returns true if not all standbys have caught up to the flushed position
+ * (flushed_lsn) when failover_slot is true; otherwise, returns false.
+ *
+ * If returning true, the function sets the appropriate wait event in
+ * wait_event; otherwise, wait_event is set to 0.
+ */
The function comment refers to 'failover_slot' but IMO needs to be
worded differently because 'failover_slot' is not a parameter anymore.
~~~
8. NeedToWaitForWal
+/*
+ * Returns true if we need to wait for WALs to be flushed to disk, or if not
+ * all standbys have caught up to the flushed position (flushed_lsn) when
+ * failover_slot is true; otherwise, returns false.
+ *
+ * If returning true, the function sets the appropriate wait event in
+ * wait_event; otherwise, wait_event is set to 0.
+ */
+static bool
+NeedToWaitForWal(XLogRecPtr target_lsn, XLogRecPtr flushed_lsn,
+ uint32 *wait_event)
Same as above -- Now that 'failover_slot' is not a function parameter,
I thought this should be reworded.
~~~
9. NeedToWaitForWal
+ /*
+ * Check if the standby slots have caught up to the flushed position. It
+ * is good to wait up to flushed position and then let it send the changes
+ * to logical subscribers one by one which are already covered in flushed
+ * position without needing to wait on every change for standby
+ * confirmation. Note that after receiving the shutdown signal, an ERROR
+ * is reported if any slots are dropped, invalidated, or inactive. This
+ * measure is taken to prevent the walsender from waiting indefinitely.
+ */
+ if (NeedToWaitForStandby(target_lsn, flushed_lsn, wait_event))
+ return true;
I felt it was confusing things for this function to also call to the
other one -- it seems an overlapping/muddling of the purpose of these.
I think it will be easier to understand if the calling code just calls
to one or both of these functions as required.
~~~
10.
- WalSndWait(wakeEvents, sleeptime, WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL);
+ WalSndWait(wakeEvents, sleeptime, wait_event);
Tracing the assignments of the 'wait_event' is a bit tricky... IIUC it
should not be 0 when we got to this point, so maybe it is better to
put Assert(wait_event) before this call?
----------
Kind Regards,
Peter Smith.
Fujitsu Australia
From | Date | Subject | |
---|---|---|---|
Next Message | Tomas Vondra | 2024-03-02 23:59:19 | Re: BitmapHeapScan streaming read user and prelim refactoring |
Previous Message | Melanie Plageman | 2024-03-02 23:39:31 | Re: BitmapHeapScan streaming read user and prelim refactoring |