Re: Handle infinite recursion in logical replication setup

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Handle infinite recursion in logical replication setup
Date: 2022-05-26 08:36:41
Message-ID: CAHut+PvyfZ-YE++QGwmTgZ_ZvEQ79VH1-Z0sKAMGG_JUbfGDrA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

This post completes my review of patch v15*.

======

1. A general comment affecting both patches

I realised the things defined in the subscription WITH are referred to
in the PG DOCS as "parameters" (not "options"). See CREATE
SUBSCRIPTION [1]. This is already reflected in some of my review
comments for the v15-0002 below., but I'd already posted my v15-0001
review comments before noticing this.

Please also go through the v15-0001 patch to make all the necessary
terminology changes (including in the v15-0001 commit message).

=====

Please find below review comments for patch v15-0002.

2. Commit message

This patch does couple of things:
change 1) Added force option for copy_data.
change 2) Check and throw an error if the publication tables were also
subscribing data in the publisher from other publishers.

--

2a.
"couple" -> "a couple"
"Added force" -> "Adds force ..."
"Check and throw" -> "Checks and throws ..."

2b.
Isn't "Change 2)" only applicable when copy_data and only_local are
on? It should say so here.

~~~

3. Commit message

node1: Table t1 (c1 int) has data 1, 2, 3, 4
node2: Table t1 (c1 int) has data 5, 6, 7, 8

--

I feel these kinds of replication examples are always much easier to
visualize when the data somehow indicates where it came from. E.g.
consider using different data in your example like:

node1: Table t1 (c1 int) has data 11, 12, 13, 14
node2: Table t1 (c1 int) has data 21, 22, 23, 24

~~~

4. Commit message

Examples in this message exceed 80 chars. Please wrap the SQL appropriately

~~~

5. Commit message

After this the data will be something like this:
node1:
1, 2, 3, 4, 5, 6, 7, 8

node2:
1, 2, 3, 4, 5, 6, 7, 8, 5, 6, 7, 8

So, you can see that data on node2 (5, 6, 7, 8) is duplicated.

--

Yeah, but is that entirely correct though? AFAIK the scenario
described by this example is going to recurse forever.

~~~

6. Commit message

This problem can be solved by using only_local and copy_data option as
given below

--

"option" -> "parameters"

~~~

7. Commit message

There are other minor review comments but it's too messy to report in
detail here. I suggest checking all the ":" (some were missing) and
the letter case of/within the sentences. I also suggest
cutting/pasting all this text into MSWord or some grammar checker to
correct anything else reported.

~~~

8. Commit message

CREATE SUBSCRIPTION sub_node2_node1 CONNECTION '<node1 details>'
PUBLICATION pub_node1 WITH (copy_data = force, only_local = on);
ERROR: CREATE/ALTER SUBSCRIPTION with only_local and copy_data as
true is not allowed when the publisher might have replicated data

--

Looks like a typo: e.g. If copy_data is "force" then how is the error happening?

~~~

9. Commit message

The examples seem confused/muddled in some way IMO.

IIUC the new ERROR is like a new integrity check for when the
subscription is using the only_local = on, along with copy_data = on.

And so then the copy_data = force was added as a way to override that
previous error being reported.

So unless you know about the possible error then the "force" option
makes no sense. So I think you need to change the order of the
explanations in the commit message to describe the new error first.

======

10. doc/src/sgml/logical-replication.sgml

+
+ <sect1 id="bidirectional-logical-replication">
+ <title>Bidirectional logical replication</title>
+
+ <sect2 id="setting-bidirectional-replication-two-nodes">
+ <title>Setting bidirectional replication between two nodes</title>
+ <para>
+ Bidirectional replication is useful in creating a multi-master database
+ which helps in performing read/write operations from any of the nodes.
+ Setting up bidirectional logical replication between two nodes requires
+ creation of a publication in all the nodes, creating subscriptions in
+ each of the nodes that subscribes to data from all the nodes. The steps
+ to create a two-node bidirectional replication when there is no data in
+ both the nodes are given below:
+ </para>

This description is confusing. It seems to be trying to say something
about a multi-master system (e.g. you say "all the nodes") yet this is
all written inside the section about "... two nodes", so saying "all
the nodes" makes no sense here.

I think you need to split this information and put some kind of blurb
text at the top level (section 31.11), and then this section (for "two
nodes") should ONLY be talking about the case when there are just TWO
nodes.

~~~

11. doc/src/sgml/logical-replication.sgml

+ <para>
+ Lock the required tables in <literal>node1</literal> and
+ <literal>node2</literal> till the setup is completed.
+ </para>

"till" -> "until" (make this same change in multiple places)

~~~

12. doc/src/sgml/logical-replication.sgml

+ <sect2 id="add-new-node-data-in-existing-node">
+ <title>Adding a new node when data is present in the existing nodes</title>
+ <para>
+ Adding a new node <literal>node3</literal> to the existing
+ <literal>node1</literal> and <literal>node2</literal> when data is present
+ in existing nodes <literal>node1</literal> and <literal>node2</literal>
+ needs similar steps. The only change required here is that
+ <literal>node3</literal> should create a subscription with
+ <literal>copy_data = force</literal> to one of the existing nodes to
+ receive the existing data during initial data synchronization.
+ </para>

I think perhaps this should clarify that there is NO data in node3 for
this example.

~~~

13. doc/src/sgml/logical-replication.sgml - section 31.11.3

+ <para>
+ Lock the required tables in <literal>node2</literal> and
+ <literal>node3</literal> till the setup is completed.
+ </para>

Should you describe why you say no lock is required for tables on
node1? (Similar information is in section 31.11.4 also)

~~~

14. doc/src/sgml/logical-replication.sgml - section 31.11.3

+ <para>
+ Create a subscription in <literal>node3</literal> to subscribe to
+ <literal>node2</literal>:
+<programlisting>
+node3=# CREATE SUBSCRIPTION sub_node3_node2
+node3-# CONNECTION 'dbname=foo host=node2 user=repuser'
+node3-# PUBLICATION pub_node2
+node3-# WITH (copy_data = off, only_local = on);
+CREATE SUBSCRIPTION
+</programlisting></para>

Using a similar description as in section 31.11.4 this should probably
also say something like:
"Use copy_data specified as off because the initial table data would
have been already copied in the previous step".

~~~

15. doc/src/sgml/logical-replication.sgml - section 31.11.4

+ <sect2 id="add-node-data-present-in-new-node">
+ <title>Adding a new node when data is present in the new node</title>
+ <para>
+ Adding a new node <literal>node3</literal> to the existing
+ <literal>node1</literal> and <literal>node2</literal> when data is present
+ in the new node <literal>node3</literal> needs similar steps. A
few changes
+ are required here to get the existing data from <literal>node3</literal>
+ to <literal>node1</literal> and <literal>node2</literal> and later
+ cleaning up of data in <literal>node3</literal> before synchronization of
+ all the data from the existing nodes.
+ </para>

I think perhaps this should clarify that there is NO data in node1 or
node2 for this example.

~~~

16. doc/src/sgml/logical-replication.sgml

+ <sect2 id="generic-steps-add-new-node">
+ <title>Generic steps to add a new node to the existing set of nodes</title>

I think all the steps in this section should be numbered because the
order is important.

~~~

17. doc/src/sgml/logical-replication.sgml

+ <sect2 id="generic-steps-add-new-node">
+ <title>Generic steps to add a new node to the existing set of nodes</title>

IIUC the word "parameter" should replace the word "option" in all the
text of this section.

~~~

18. doc/src/sgml/logical-replication.sgml - Notes

+ <sect2>
+ <title>Notes</title>
+ <para>
+ Setting up bidirectional logical replication across nodes requires multiple
+ steps to be performed on various nodes, as all operations are not
+ transactional, user is advised to take backup of existing data to avoid any
+ inconsistency.
+ </para>
+ </sect2>

18a.
I thought this "Notes" section should be rendered more like an sgml
note or a warning. E.g. like the warning on this page [2].

18b.
SUGGESTION (split the para into multiple sentences). e.g.
Setting up bidirectional logical replication across nodes requires
multiple steps to be performed on various nodes. Because not all
operations are transactional, the user is advised to take backups.

======

19. doc/src/sgml/ref/alter_subscription.sgml

+ <para>
+ There is some interaction between the <literal>only_local</literal>
+ option and <literal>copy_data</literal> option. Refer to the
+ <command>CREATE SUBSCRIPTION</command>
+ <xref linkend="sql-createsubscription-notes" /> for interaction
+ details and usage of <literal>force</literal> for
+ <literal>copy_data</literal> option.
</para>

"option" -> "parameter" (3x)

======

20. doc/src/sgml/ref/create_subscription.sgml

+ <para>
+ There is some interaction between the <literal>only_local</literal>
+ option and <literal>copy_data</literal> option. Refer to the
+ <xref linkend="sql-createsubscription-notes" /> for interaction
+ details and usage of <literal>force</literal> for
+ <literal>copy_data</literal> option.
+ </para>

"option" -> "parameter" (3x)

======

21. src/backend/commands/subscriptioncmds.c

@@ -69,6 +69,18 @@
/* check if the 'val' has 'bits' set */
#define IsSet(val, bits) (((val) & (bits)) == (bits))

+#define IS_COPY_DATA_ON_OR_FORCE(copy_data) ((copy_data) != COPY_DATA_OFF)
+
+/*
+ * Represents whether copy_data option is specified with off, on or force.
+ */
+typedef enum CopyData
+{
+ COPY_DATA_OFF,
+ COPY_DATA_ON,
+ COPY_DATA_FORCE
+} CopyData;
+

21a.
"option" -> "parameter"

21b.
I think the new #define is redundant and can be removed (see the next
review comment)

21c.
I think you set the OFF enum value like:
typedef enum CopyData
{
COPY_DATA_OFF = 0,
COPY_DATA_ON,
COPY_DATA_FORCE
} CopyData;

then it will greatly simplify things; many changes in this file will
be unnecessary (see subsequent review comments)

~~~

22. src/backend/commands/subscriptioncmds.c

+/*
+ * Validate the value specified for copy_data option.
+ */
+static CopyData
+DefGetCopyData(DefElem *def)

"option" -> "parameter"

~~~

23. src/backend/commands/subscriptioncmds.c

+ /*
+ * Allow 0, 1, "true", "false", "on", "off" or "force".
+ */
+ switch (nodeTag(def->arg))
+ {
+ case T_Integer:
+ switch (intVal(def->arg))
+ {
+ case 0:
+ return COPY_DATA_OFF;
+ case 1:
+ return COPY_DATA_ON;
+ default:
+ /* otherwise, error out below */
+ break;
+ }
+ break;

What about also allowing copy_data = 2, and making it equivalent to "force"?

~~~

24. src/backend/commands/subscriptioncmds.c

@@ -333,17 +400,17 @@ parse_subscription_options(ParseState *pstate,
List *stmt_options,
errmsg("%s and %s are mutually exclusive options",
"connect = false", "create_slot = true")));

- if (opts->copy_data &&
+ if (IS_COPY_DATA_ON_OR_FORCE(opts->copy_data) &&
IsSet(opts->specified_opts, SUBOPT_COPY_DATA))

I think this change would not be needed if you set the OFF enum = 0.

~~~
25. src/backend/commands/subscriptioncmds.c

@@ -671,13 +738,14 @@ CreateSubscription(ParseState *pstate,
CreateSubscriptionStmt *stmt,
* Set sync state based on if we were asked to do data copy or
* not.
*/
- table_state = opts.copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY;
+ table_state = IS_COPY_DATA_ON_OR_FORCE(opts.copy_data) ?
SUBREL_STATE_INIT : SUBREL_STATE_READY;

I think this change would not be needed if you set the OFF enum = 0.

~~~

26. src/backend/commands/subscriptioncmds.c

@@ -720,7 +788,8 @@ CreateSubscription(ParseState *pstate,
CreateSubscriptionStmt *stmt,
* PENDING, to allow ALTER SUBSCRIPTION ... REFRESH
* PUBLICATION to work.
*/
- if (opts.twophase && !opts.copy_data && tables != NIL)
+ if (opts.twophase && opts.copy_data == COPY_DATA_OFF &&
+ tables != NIL)
twophase_enabled = true;

I think this change would not be needed if you set the OFF enum = 0.

~~~

27. src/backend/commands/subscriptioncmds.c

@@ -851,7 +921,7 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data,
list_length(subrel_states), sizeof(Oid), oid_cmp))
{
AddSubscriptionRelState(sub->oid, relid,
- copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
+ IS_COPY_DATA_ON_OR_FORCE(copy_data) ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
InvalidXLogRecPtr);

I think this change would not be needed if you set the OFF enum = 0.

~~~
28. src/backend/commands/subscriptioncmds.c

@@ -1157,7 +1227,7 @@ AlterSubscription(ParseState *pstate,
AlterSubscriptionStmt *stmt,
* See ALTER_SUBSCRIPTION_REFRESH for details why this is
* not allowed.
*/
- if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED && opts.copy_data)
+ if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED &&
IS_COPY_DATA_ON_OR_FORCE(opts.copy_data))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("ALTER SUBSCRIPTION with refresh and copy_data is not allowed
when two_phase is enabled"),

I think this change would not be needed if you set the OFF enum = 0.

~~~

29. src/backend/commands/subscriptioncmds.c

@@ -1209,7 +1279,7 @@ AlterSubscription(ParseState *pstate,
AlterSubscriptionStmt *stmt,
* See ALTER_SUBSCRIPTION_REFRESH for details why this is
* not allowed.
*/
- if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED && opts.copy_data)
+ if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED &&
IS_COPY_DATA_ON_OR_FORCE(opts.copy_data))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("ALTER SUBSCRIPTION with refresh and copy_data is not allowed
when two_phase is enabled"),

I think this change would not be needed if you set the OFF enum = 0.

~~~

30. src/backend/commands/subscriptioncmds.c

@@ -1255,7 +1325,8 @@ AlterSubscription(ParseState *pstate,
AlterSubscriptionStmt *stmt,
*
* For more details see comments atop worker.c.
*/
- if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED && opts.copy_data)
+ if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED &&
+ IS_COPY_DATA_ON_OR_FORCE(opts.copy_data))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("ALTER SUBSCRIPTION ... REFRESH with copy_data is not allowed
when two_phase is enabled"),

I think this change would not be needed if you set the OFF enum = 0.

~~~

31. src/backend/commands/subscriptioncmds.c

+ appendStringInfoString(&cmd,
+ "SELECT DISTINCT N.nspname AS schemaname, C.relname AS tablename,
PS.srrelid as replicated\n"
+ "FROM pg_publication P,\n"
+ "LATERAL pg_get_publication_tables(P.pubname) GPT\n"
+ "LEFT JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid),\n"
+ "pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)\n"
+ "WHERE C.oid = GPT.relid AND P.pubname in (");

use upper case "in" -> "IN"

======

32. src/test/regress/sql/subscription.sql

@@ -40,6 +40,9 @@ SET SESSION AUTHORIZATION 'regress_subscription_user';

-- fail - invalid option combinations
CREATE SUBSCRIPTION regress_testsub2 CONNECTION
'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect =
false, copy_data = true);
+CREATE SUBSCRIPTION regress_testsub2 CONNECTION
'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect =
false, copy_data = on);
+CREATE SUBSCRIPTION regress_testsub2 CONNECTION
'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect =
false, copy_data = 1);
+CREATE SUBSCRIPTION regress_testsub2 CONNECTION
'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect =
false, copy_data = force);

32a.
The test for "copy_data = 1" should confirm it is the same as "on".

32b.
Should also test copy_data = 2 (and confirm it is the same as "force").

======

33. src/test/subscription/t/032_localonly.pl - cosmetic changes

33a.
+# Detach node C from the node-group of (A, B, C) and clean the table contents
+# from all nodes.

SUGGESTION
Detach node_C from the node-group of (node_A, node_B, node_C) and ...

33b.
+# Subroutine to create subscription and wait till the initial sync is
completed.

"till" -> "until"

33c.
+# Subroutine to create subscription and wait till the initial sync is
completed.
+# Subroutine expects subscriber node, publisher node, subscription name,
+# destination connection string, publication name and the subscription with
+# options to be passed as input parameters.
+sub create_subscription
+{
+ my ($node_subscriber, $node_publisher, $sub_name, $node_connstr,
+ $pub_name, $with_options)
+ = @_;

"subscription with options" => "subscription parameters"
"$with_options" -> "$sub_params"

33d.
+# Specifying only_local 'on' which indicates that the publisher should only

"Specifying" => "Specify"

------
[1] https://www.postgresql.org/docs/15/sql-createsubscription.html
[2] https://www.postgresql.org/docs/current/bgworker.html

Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Aleksander Alekseev 2022-05-26 08:47:13 Re: [RFC] building postgres with meson
Previous Message Masahiko Sawada 2022-05-26 08:35:50 Re: Support logical replication of DDLs