Re: Handle infinite recursion in logical replication setup

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
Subject: Re: Handle infinite recursion in logical replication setup
Date: 2022-09-05 04:17:05
Message-ID: CAHut+PueiXUeb4bfdQXFStpxp6KLMEDXGRN_ta-N2RYBQAhm6g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are my review comments for v45-0001:

======

1. doc/src/sgml/logical-replication.sgml

<para>
To find which tables might potentially include non-local origins (due to
other subscriptions created on the publisher) try this SQL query:
<programlisting>
SELECT DISTINCT N.nspname AS schemaname, C.relname AS tablename
FROM pg_publication P,
LATERAL pg_get_publication_tables(P.pubname) GPT
LEFT JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid),
pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.oid = GPT.relid AND PS.srrelid IS NOT NULL AND
P.pubname IN (pub-names);
</programlisting></para>

1a.
To use "<pub-names>" with the <> then simply put meta characters in the SGML.
e.g.
&lt;pub-names&gt;

~

1b.
The patch forgot to add the SQL "#" instruction as suggested by my
previous comment (see [1] #3b)

~~~

2.

<sect1 id="preventing-inconsistencies-for-copy_data-origin">
<title>Preventing Data Inconsistencies for copy_data, origin=NONE</title>

The title is OK, but I think this should all be a <sect2> sub-section
of "31.2 Subscription"

======

3. src/backend/commands/subscriptioncmds.c - check_publications_origin

+ initStringInfo(&cmd);
+ appendStringInfoString(&cmd,
+ "SELECT DISTINCT P.pubname AS pubname\n"
+ "FROM pg_publication P,\n"
+ " LATERAL pg_get_publication_tables(P.pubname) GPT\n"
+ " LEFT JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid),\n"
+ " pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)\n"
+ "WHERE C.oid = GPT.relid AND PS.srrelid IS NOT NULL AND P.pubname IN (");
+ get_publications_str(publications, &cmd, true);
+ appendStringInfoChar(&cmd, ')');
+ get_skip_tables_str(subrel_local_oids, subrel_count, &cmd);

(see from get_skip_tables_str)

+ appendStringInfoString(dest, " AND C.oid NOT IN (SELECT C.oid FROM
pg_class C JOIN pg_namespace N on N.oid = C.relnamespace where ");

IMO the way you are using get_skip_tables_str should be modified. I
will show by way of example below.
- "where" -> "WHERE"
- put the SELECT at the caller instead of inside the function
- handle the ")" at the caller

All this will also make the body of the 'get_skip_tables_str' function
much simpler (see the next review comments)

SUGGESTION
if (subrel_count > 0)
{
/* TODO - put some explanatory comment here about skipping the tables */
appendStringInfo(&cmd,
" AND C.oid NOT IN (\n"
"SELECT C.oid FROM pg_class C\n"
"JOIN pg_namespace N on N.oid = C.relnamespace WHERE ");
get_skip_tables_str(subrel_local_oids, subrel_count, &cmd);
appendStringInf(&cmd, “)”);
}

~~~

4. src/backend/commands/subscriptioncmds.c - get_skip_tables_str

+/*
+ * Add the table names that should be skipped.
+ */

This comment does not have enough detail to know really what the
function is for. Perhaps you only need to say that this is a helper
function for 'check_publications_origin' and then where it is called
you can give a comment (e.g. see my review comment #3)

~~

5. get_skip_tables_str (body)

5a. Variable 'count' is not a very good name; IMO just say 'i' for
index, and it can be declared C99 style.

~

5b. Variable 'first' is not necessary - it's same as (i == 0)

~

5c. The whole "SELECT" part and the ")" parts are more simply done at
the caller (see the review comment #3)

~

5d.

+ appendStringInfo(dest, "(C.relname = '%s' and N.nspname = '%s')",
+ tablename, schemaname);

It makes no difference but I thought it would feel more natural if the
SQL would compare the schema name *before* the table name.

~

5e.
"and" -> "AND"

~

Doing all 5a,b,c,d, and e means overall having a much simpler function body:

SUGGESTION
+ for (int i = 0; i < subrel_count; i++)
+ {
+ Oid relid = subrel_local_oids[i];
+ char *schemaname = get_namespace_name(get_rel_namespace(relid));
+ char *tablename = get_rel_name(relid);
+
+ if (i > 0)
+ appendStringInfoString(dest, " OR ");
+
+ appendStringInfo(dest, "(N.nspname = '%s' AND C.relname = '%s')",
+ schemaname, tablename);
+ }

------
[1] https://www.postgresql.org/message-id/CAHut%2BPsku25%2BVjz7HiohWxc2WU07O_ZV4voFG%2BU7WzwKhUzpGQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2022-09-05 04:32:37 Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)
Previous Message shiy.fnst@fujitsu.com 2022-09-05 03:42:40 RE: Column Filtering in Logical Replication