From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
---|---|
To: | shveta malik <shveta(dot)malik(at)gmail(dot)com> |
Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
Subject: | Re: Synchronizing slots from primary to standby |
Date: | 2024-01-24 03:21:53 |
Message-ID: | CAHut+Pu_uK==M+VmCMug7m7O6LAwpC05A=T7zP8c4G2-hS+bdg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Here are some comments for patch v66-0001.
======
doc/src/sgml/catalogs.sgml
1.
+ <para>
+ If true, the associated replication slots (i.e. the main slot and the
+ table sync slots) in the upstream database are enabled to be
+ synchronized to the physical standbys
+ </para></entry>
/physical standbys/physical standby/
I wondered if it is better just to say singular "standby" instead of
"standbys" in places like this; e.g. plural might imply cascading for
some readers.
There are a number of examples like this, so I've repeated the same
comment multiple times below. If you disagree, please just ignore all
of them.
======
doc/src/sgml/func.sgml
2.
that the decoding of prepared transactions is enabled for this
- slot. A call to this function has the same effect as the replication
- protocol command <literal>CREATE_REPLICATION_SLOT ...
LOGICAL</literal>.
+ slot. The optional fifth parameter,
+ <parameter>failover</parameter>, when set to true,
+ specifies that this slot is enabled to be synced to the
+ physical standbys so that logical replication can be resumed
+ after failover. A call to this function has the same effect as
+ the replication protocol command
+ <literal>CREATE_REPLICATION_SLOT ... LOGICAL</literal>.
</para></entry>
(same as above)
/physical standbys/physical standby/
Also, I don't see anything else on this page using plural "standbys".
======
doc/src/sgml/protocol.sgml
3. CREATE_REPLICATION_SLOT
+ <varlistentry>
+ <term><literal>FAILOVER [ <replaceable
class="parameter">boolean</replaceable> ]</literal></term>
+ <listitem>
+ <para>
+ If true, the slot is enabled to be synced to the physical
+ standbys so that logical replication can be resumed after failover.
+ The default is false.
+ </para>
+ </listitem>
+ </varlistentry>
(same as above)
/physical standbys/physical standby/
~~~
4. ALTER_REPLICATION_SLOT
+ <variablelist>
+ <varlistentry>
+ <term><literal>FAILOVER [ <replaceable
class="parameter">boolean</replaceable> ]</literal></term>
+ <listitem>
+ <para>
+ If true, the slot is enabled to be synced to the physical
+ standbys so that logical replication can be resumed after failover.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
(same as above)
/physical standbys/physical standby/
======
doc/src/sgml/ref/create_subscription.sgml
5.
+ <varlistentry id="sql-createsubscription-params-with-failover">
+ <term><literal>failover</literal> (<type>boolean</type>)</term>
+ <listitem>
+ <para>
+ Specifies whether the replication slots associated with the
subscription
+ are enabled to be synced to the physical standbys so that logical
+ replication can be resumed from the new primary after failover.
+ The default is <literal>false</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
(same as above)
/physical standbys/physical standby/
======
doc/src/sgml/system-views.sgml
6.
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>failover</structfield> <type>bool</type>
+ </para>
+ <para>
+ True if this is a logical slot enabled to be synced to the physical
+ standbys so that logical replication can be resumed from the new primary
+ after failover. Always false for physical slots.
+ </para></entry>
+ </row>
(same as above)
/physical standbys/physical standby/
======
src/backend/commands/subscriptioncmds.c
7.
+ if (IsSet(opts.specified_opts, SUBOPT_FAILOVER))
+ {
+ if (!sub->slotname)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot set failover for a subscription that does not have a
slot name")));
+
+ /*
+ * Do not allow changing the failover state if the
+ * subscription is enabled. This is because the failover
+ * state of the slot on the publisher cannot be modified if
+ * the slot is currently acquired by the apply worker.
+ */
+ if (sub->enabled)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot set %s for enabled subscription",
+ "failover")));
+
+ values[Anum_pg_subscription_subfailover - 1] =
+ BoolGetDatum(opts.failover);
+ replaces[Anum_pg_subscription_subfailover - 1] = true;
+ }
The first message is not consistent with the second. The "failover"
option maybe should be extracted so it won't be translated.
SUGGESTION
errmsg("cannot set %s for a subscription that does not have a slot
name", "failover")
~~~
8. AlterSubscription
+ if (!wrconn)
+ ereport(ERROR,
+ (errcode(ERRCODE_CONNECTION_FAILURE),
+ errmsg("could not connect to the publisher: %s", err)));
+
Need to keep an eye on the patch proposed by Nisha [1] for messages
similar to this one, so in case that gets pushed this code should be
changed appropriately.
======
src/backend/replication/slot.c
9.
* during getting changes, if the two_phase option is enabled it can skip
* prepare because by that time start decoding point has been moved. So the
* user will only get commit prepared.
+ * failover: If enabled, allows the slot to be synced to physical standbys so
+ * that logical replication can be resumed after failover.
*/
(same as earlier)
/physical standbys/physical standby/
~~~
10.
+ /*
+ * Do not allow users to alter slots to enable failover on the standby
+ * as we do not support sync to the cascading standby.
+ */
+ if (RecoveryInProgress() && failover)
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot alter replication slot to have failover"
+ " enabled on the standby"));
I felt the errmsg could be expressed with less ambiguity:
SUGGESTION:
cannot enable failover for a replication slot on the standby
======
src/backend/replication/slotfuncs.c
11. create_physical_replication_slot
/* acquire replication slot, this will check for conflicting names */
ReplicationSlotCreate(name, false,
- temporary ? RS_TEMPORARY : RS_PERSISTENT, false);
+ temporary ? RS_TEMPORARY : RS_PERSISTENT, false,
+ false);
Having an inline comment might be helpful here instead of passing "false,false"
SUGGESTION
ReplicationSlotCreate(name, false,
temporary ? RS_TEMPORARY : RS_PERSISTENT, false,
false /* failover */);
~~~
12. create_logical_replication_slot
+ /*
+ * Do not allow users to create the slots with failover enabled on the
+ * standby as we do not support sync to the cascading standby.
+ */
+ if (RecoveryInProgress() && failover)
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot create replication slot with failover"
+ " enabled on the standby"));
(similar to previous comment)
SUGGESTION:
cannot enable failover for a replication slot created on the standby
~~~
13. copy_replication_slot
* hence pass find_startpoint false. confirmed_flush will be set
* below, by copying from the source slot.
+ *
+ * To avoid potential issues with the slotsync worker when the
+ * restart_lsn of a replication slot goes backwards, we set the
+ * failover option to false here. This situation occurs when a slot on
+ * the primary server is dropped and immediately replaced with a new
+ * slot of the same name, created by copying from another existing
+ * slot. However, the slotsync worker will only observe the restart_lsn
+ * of the same slot going backwards.
*/
create_logical_replication_slot(NameStr(*dst_name),
plugin,
temporary,
false,
+ false,
src_restart_lsn,
false);
(similar to an earlier comment)
Having an inline comment might be helpful here.
e.g. false /* failover */,
======
src/backend/replication/walreceiver.c
14.
- walrcv_create_slot(wrconn, slotname, true, false, 0, NULL);
+ walrcv_create_slot(wrconn, slotname, true, false, false, 0, NULL);
(similar to an earlier comment)
Having an inline comment might be helpful here:
SUGGESTION
walrcv_create_slot(wrconn, slotname, true, false, false /* failover
*/, 0, NULL);
======
src/backend/replication/walsender.c
15. CreateReplicationSlot
ReplicationSlotCreate(cmd->slotname, false,
cmd->temporary ? RS_TEMPORARY : RS_PERSISTENT,
- false);
+ false, false);
(similar to an earlier comment)
Having an inline comment might be helpful here.
e.g. false /* failover */,
~~~
16. CreateReplicationSlot
+ /*
+ * Do not allow users to create the slots with failover enabled on the
+ * standby as we do not support sync to the cascading standby.
+ */
+ if (RecoveryInProgress() && failover)
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot create replication slot with failover"
+ " enabled on the standby"));
+
/*
* Initially create persistent slot as ephemeral - that allows us to
* nicely handle errors during initialization because it'll get
@@ -1243,7 +1265,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
*/
ReplicationSlotCreate(cmd->slotname, true,
cmd->temporary ? RS_TEMPORARY : RS_EPHEMERAL,
- two_phase);
+ two_phase, failover);
This errmsg seems to be repeated in a few places, so I wondered if
this code can be refactored to call direct to
create_logical_replication_slot() so the errmsg can be just once in a
common place.
OTOH, if it cannot be refactored, then needs to be using same errmsg
as suggested by earlier review comments (see above).
======
src/include/catalog/pg_subscription.h
17.
+ bool subfailover; /* True if the associated replication slots
+ * (i.e. the main slot and the table sync
+ * slots) in the upstream database are enabled
+ * to be synchronized to the physical
+ * standbys. */
(same as earlier)
/physical standbys/physical standby/
~~~
18.
+ bool failover; /* True if the associated replication slots
+ * (i.e. the main slot and the table sync
+ * slots) in the upstream database are enabled
+ * to be synchronized to the physical
+ * standbys. */
(same as earlier)
/physical standbys/physical standby/
======
src/include/replication/slot.h
19.
+
+ /*
+ * Is this a failover slot (sync candidate for physical standbys)? Only
+ * relevant for logical slots on the primary server.
+ */
+ bool failover;
(same as earlier)
/physical standbys/physical standby/
======
[1] Nisha errmsg -
https://www.postgresql.org/message-id/CABdArM5-VR4Akt_AHap_0Ofne0cTcsdnN6FcNe%2BMU8eXsa_ERQ%40mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2024-01-24 04:09:01 | Re: Synchronizing slots from primary to standby |
Previous Message | Bharath Rupireddy | 2024-01-24 02:21:38 | Re: Add code indentation check to cirrus-ci (was Re: Add BF member koel-like indentation checks to SanityCheck CI) |