Re: Handle infinite recursion in logical replication setup

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>
Subject: Re: Handle infinite recursion in logical replication setup
Date: 2022-07-29 03:01:30
Message-ID: CAHut+Ps5eekrmqBnVKM9e=W5kwQ0DT0cJr=SH4xLe7=M5LDyPw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are some comments for the patch v40-0001:

======

1. Commit message

It might be better to always use 'copy_data = true' in favour of
'copy_data = on' just for consistency with all the docs and the error
messages.

======

2. doc/src/sgml/ref/create_subscription.sgml

@@ -386,6 +401,15 @@ CREATE SUBSCRIPTION <replaceable
class="parameter">subscription_name</replaceabl
can have non-existent publications.
</para>

+ <para>
+ If the subscription is created with <literal>origin = NONE</literal> and
+ <literal>copy_data = true</literal>, it will check if the publisher has
+ subscribed to the same table from other publishers and, if so, throw an
+ error to prevent possible non-local data from being copied. The user can
+ override this check and continue with the copy operation by specifying
+ <literal>copy_data = force</literal>.
+ </para>

2a.
It is interesting that you changed the note to say origin = NONE.
Personally, I prefer it written as you did, but I think maybe this
change does not belong in this patch. The suggestion for changing from
"none" to NONE is being discussed elsewhere in this thread and
probably all such changes should be done together (if at all) as a
separate patch. Until then I think this patch 0001 should just stay
consistent with whatever is already pushed on HEAD.

2b.
"possible no-local data". Maybe the terminology "local/non-local" is a
hangover from back when the subscription parameter was called local
instead of origin. I'm not sure if you want to change this or not, and
anyway I didn't have any better suggestions – so this comment is just
to bring it to your attention.

======

3. src/backend/commands/subscriptioncmds.c - DefGetCopyData

+ /*
+ * The set of strings accepted here should match up with the
+ * grammar's opt_boolean_or_string production.
+ */
+ if (pg_strcasecmp(sval, "false") == 0 ||
+ pg_strcasecmp(sval, "off") == 0)
+ return COPY_DATA_OFF;
+ if (pg_strcasecmp(sval, "true") == 0 ||
+ pg_strcasecmp(sval, "on") == 0)
+ return COPY_DATA_ON;
+ if (pg_strcasecmp(sval, "force") == 0)
+ return COPY_DATA_FORCE;

I understand the intention of the comment, but it is not strictly
correct to say "should match up" because "force" is a new value.
Perhaps the comment should be as suggested below.

SUGGESTION
The set of strings accepted here must include all those accepted by
the grammar's opt_boolean_or_string production.

~~~

4. src/backend/commands/subscriptioncmds.c - check_pub_table_subscribed

@@ -1781,6 +1858,122 @@ AlterSubscriptionOwner_oid(Oid subid, Oid newOwnerId)
table_close(rel, RowExclusiveLock);
}

+/*
+ * Check and throw an error if the publisher has subscribed to the same table
+ * from some other publisher. This check is required only if "copy_data = on"
+ * and "origin = NONE" for CREATE SUBSCRIPTION and
+ * ALTER SUBSCRIPTION ... REFRESH statements to avoid the publisher from
+ * replicating data that has an origin.
+ *
+ * This check need not be performed on the tables that are already added as
+ * incremental sync for such tables will happen through WAL and the origin of
+ * the data can be identified from the WAL records.
+ *
+ * subrel_local_oids contains the list of relation oids that are already
+ * present on the subscriber.
+ */
+static void
+check_pub_table_subscribed(WalReceiverConn *wrconn, List *publications,
+ CopyData copydata, char *origin,
+ Oid *subrel_local_oids, int subrel_count)

4a.
"copy_data = on" -> "copy_data = true" (for consistency with the docs
and the error messages)

4b.
The same NONE/none review comment from #2a applies here too. Probably
it should be written as none for now unless/until *everything* changes
to NONE.

4c.
"to avoid the publisher from replicating data that has an origin." ->
"to avoid replicating data that has an origin."

4d.
+ * This check need not be performed on the tables that are already added as
+ * incremental sync for such tables will happen through WAL and the origin of
+ * the data can be identified from the WAL records.

SUGGESTION (maybe?)
This check need not be performed on the tables that are already added
because incremental sync for those tables will happen through WAL and
the origin of the data can be identified from the WAL records.

======

5. src/test/subscription/t/030_origin.pl

+ "Refresh publication when the publisher has subscribed for the new table"

SUGGESTION (Just to mention origin = none somehow. Maybe you can
reword it better than this)
Refresh publication when the publisher has subscribed for the new
table, but the subscriber-side wants origin=none

------
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Richard Guo 2022-07-29 03:23:57 Re: [Refactor]Avoid to handle FORCE_NOT_NULL/FORCE_NULL options when COPY TO
Previous Message Peter Smith 2022-07-29 02:56:12 Re: Functions 'is_publishable_class' and 'is_publishable_relation' should stay together.