Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Shubham Khanna <khannashubham1197(at)gmail(dot)com>
Cc: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
Date: 2025-03-06 03:56:33
Message-ID: CAHut+PtRZbuF_vAva6azC+6WVPyaoqT1fCeLNP0r5gwPEsfcwg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Shubham.

Some review comments for patch v13-0001.

======
GENERAL

1.
--cleanup-existing-publications

I've never liked this proposed switch name much.

e.g. why say "cleanup" instead of "drop"? What is the difference?
Saying drop is very explicit about what the switch will do.
e.g. why say "existing"? It seems redundant; you can't cleanup/drop
something that does not exist.

My preference is one of:
--drop-publications
--drop-all-publications

either of which seem nicely aligned with the descriptions in the usage and docs.

Yeah, I know I have raised this same point before, but AFAICT the
reply was like "will revise it in the next patch version", but that
was many versions ago. I think it is important to settle the switch
name earlier than later because there will be many tentacles into the
code (vars, params, fields, comments) and docs if anything changes --
so it is not a decision you want to leave till the end because it
could destabilise everything at the last minute.

======
Commit message

2.
By default, publications are preserved to avoid unintended data loss.

~

Was there supposed to be a blank line before this text, or should this
text be wrapped into the preceding paragraph?

======
src/bin/pg_basebackup/pg_createsubscriber.c

setup_subscriber:

3.
/*
* Create the subscriptions, adjust the initial location for logical
* replication and enable the subscriptions. That's the last step for logical
- * replication setup.
+ * replication setup. If 'drop_publications' options is true, existing
+ * publications on the subscriber will be dropped before creating new
+ * subscriptions.
*/

There are multiple things amiss with this comment.
- 'drop_publications' is not the parameter name
- 'drop_publications' options [sic plural??]. It is not an option
here; it is a parameter

~~~

check_and_drop_existing_publications:

4.
/*
- * Remove publication if it couldn't finish all steps.
+ * Check and drop existing publications on the subscriber if requested.
*/

There is no need to say "if requested.". It is akin to saying this
function does XXX if this function is called.

~~~

drop_publication_by_name:

5.
+/* Drop the specified publication of the given database. */
+static void
+drop_publication_by_name(PGconn *conn, const char *pubname, const char *dbname)

5a.
I think it is better to define this function before
check_and_drop_existing_publications. YMMV.

~

5b.
IMO the parameters should be reordered (PGconn *conn, const char
*dbname, const char *pubname). It seems more natural and would be
consistent with check_and_drop_existing_publications. YMMV.

~~~

6.
- dbinfo->pubname, dbinfo->dbname, PQresultErrorMessage(res));
- dbinfo->made_publication = false; /* don't try again. */
+ pubname, dbname, PQresultErrorMessage(res));
+ dbinfos.dbinfo->made_publication = false; /* don't try again. */

Something about this flag assignment seems odd to me. IIUC
'made_publications' is for removing the cleanup_objects_atexit to be
able to remove the special publication implicitly made by the
pg_createsubscriber. But, AFAIK we can also get to this code via the
--cleanup-existing-publication switch, so actually it might be one of
the "existing" publication DROPS that has failed. If so, then the
"don't try again comment" is misleading because it may not be that
same publication "again" at all.

======
.../t/040_pg_createsubscriber.pl

GENERAL.

7.
Most of the changes to this test code are just changing node name S ->
S1. It's not much to do with the patch other than tidying it in
preparation for a new node S2. OTOH it makes the review far harder
because there are so many changes. IMO all this S->S1 stuff should be
done as a separate pre-requisite patch and then it will be far easier
to see what changes are added for the --clean-existing-publications
testing.

~~~

8.
# Set up node S as standby linking to node P
$node_p->backup('backup_1');
-my $node_s = PostgreSQL::Test::Cluster->new('node_s');
-$node_s->init_from_backup($node_p, 'backup_1', has_streaming => 1);
-$node_s->append_conf(
+my $node_s1 = PostgreSQL::Test::Cluster->new('node_s1');
+$node_s1->init_from_backup($node_p, 'backup_1', has_streaming => 1);
+$node_s1->append_conf(

The comment should refer to S1, not S.

~~~

9.
# Set up node C as standby linking to node S
-$node_s->backup('backup_2');
+$node_s1->backup('backup_2');
my $node_c = PostgreSQL::Test::Cluster->new('node_c');
-$node_c->init_from_backup($node_s, 'backup_2', has_streaming => 1);
+$node_c->init_from_backup($node_s1, 'backup_2', has_streaming => 1);

The comment should refer to S1, not S.

~~~

10.
# Check some unmet conditions on node S
-$node_s->append_conf(
+$node_s1->append_conf(

The comment should refer to S1, not S.

(note... there are lots of these. You should search/fix them all;
these review comments might miss some)

~~~

11.
+ '--socketdir' => $node_s1->host,
+ '--subscriber-port' => $node_s1->port,
'--database' => $db1,
'--database' => $db2,
],
'standby contains unmet conditions on node S');

The message should refer to S1, not S.

(note... there are lots of these. You should search/fix them all;
these review comments might miss some)

~~~

12.
# dry run mode on node S
command_ok(
@@ -338,10 +353,10 @@ command_ok(
'--verbose',
'--dry-run',
'--recovery-timeout' => $PostgreSQL::Test::Utils::timeout_default,
- '--pgdata' => $node_s->data_dir,
+ '--pgdata' => $node_s1->data_dir,
'--publisher-server' => $node_p->connstr($db1),
- '--socketdir' => $node_s->host,
- '--subscriber-port' => $node_s->port,
+ '--socketdir' => $node_s1->host,
+ '--subscriber-port' => $node_s1->port,
'--publication' => 'pub1',
'--publication' => 'pub2',
'--subscription' => 'sub1',
@@ -352,10 +367,11 @@ command_ok(
'run pg_createsubscriber --dry-run on node S');
Comment and Message should refer to S1, not S.

~~~

13.
# Check if node S is still a standby
-$node_s->start;
-is($node_s->safe_psql('postgres', 'SELECT pg_catalog.pg_is_in_recovery()'),
- 't', 'standby is in recovery');
-$node_s->stop;
+$node_s1->start;
+is( $node_s1->safe_psql('postgres', 'SELECT pg_catalog.pg_is_in_recovery()'),
+ 't',
+ 'standby is in recovery');
+$node_s1->stop;

The comment should refer to S1, not S.

~~~

14.
-# Run pg_createsubscriber on node S. --verbose is used twice
-# to show more information.
+# Run pg_createsubscriber on node S using '--cleanup-existing-publications'.
+# --verbose is used twice to show more information.
# In passing, also test the --enable-two-phase option
command_ok(
[
'pg_createsubscriber',
'--verbose', '--verbose',
'--recovery-timeout' => $PostgreSQL::Test::Utils::timeout_default,
- '--pgdata' => $node_s->data_dir,
+ '--pgdata' => $node_s1->data_dir,
'--publisher-server' => $node_p->connstr($db1),
- '--socketdir' => $node_s->host,
- '--subscriber-port' => $node_s->port,
+ '--socketdir' => $node_s1->host,
+ '--subscriber-port' => $node_s1->port,
'--publication' => 'pub1',
'--publication' => 'pub2',
'--replication-slot' => 'replslot1',
'--replication-slot' => 'replslot2',
'--database' => $db1,
'--database' => $db2,
- '--enable-two-phase'
+ '--enable-two-phase',
+ '--cleanup-existing-publications',
],
'run pg_createsubscriber on node S');

Comment and Message should refer to S1, not S.

~~~

15.
+# Create user-defined publications
+$node_p->safe_psql($db3, "CREATE PUBLICATION test_pub3 FOR ALL TABLES;");
+$node_p->safe_psql($db3, "CREATE PUBLICATION test_pub4 FOR ALL TABLES;");

Probably these can be combined.

~~~

16.
+# Drop the newly created publications
+$node_p->safe_psql($db3, "DROP PUBLICATION IF EXISTS test_pub3;");
+$node_p->safe_psql($db3, "DROP PUBLICATION IF EXISTS test_pub4;");
+

Probably these can be combined.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniil Davydov 2025-03-06 03:59:27 Add arbitrary xid and mxid to pg_resetwal
Previous Message Ajin Cherian 2025-03-06 03:47:52 Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.