Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Shubham Khanna <khannashubham1197(at)gmail(dot)com>
Cc: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
Date: 2025-02-17 04:18:52
Message-ID: CAHut+Pt9AidicguCX8b0etzP8_hy0RmAfgLR2wKC3cxeuz5E=w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Shubham

Some review comments for v7-0001.

(I am late to this thread. If some of my comments have already been
discussed and rejected please let me know).

======
1 GENERAL. Option Name?

Wondering why the patch is introducing more terminology like
"cleanup"; if we are dropping publications then why not say "drop"?
Also I am not sure if "existing" means anything because you cannot
cleanup/drop something that is not "existing".

IOW, why not call this the "--drop-publications" option?

======
Commit message

2.
These publications, replicated during streaming replication, become redundant
after converting to logical replication and serve no further purpose.

~

From this description it seems there is an assumption that the only
publications on the target server are those that were physically
replicated to the standby. Is that strictly true? Isn't it also
possible that a user might have created their own publication on the
target server prior to running the pg_createsubscriber. So even if
they want all the physically replicated ones to be removed, they would
NOT want their own new publication to also get removed at the same
time.

E.g. The original motivation thread [1] for this patch only said "But
what good is having the SAME publications as primary also on logical
replica?" (my emphasis of "same") so it seems there should be some
sort of name matching before just dropping everything.

Actually.... The code looks like it might be doing the correct thing
already and only fetching the publication names from the source server
and then deleting only those names from the target server. But this
comment message didn't describe this clearly.

======
doc/src/sgml/ref/pg_createsubscriber.sgml

3.
+ <para>
+ Remove all existing publications from specified databases on tne target
+ server.
+ </para>

typo "tne"

======
src/bin/pg_basebackup/pg_createsubscriber.c

struct CreateSubscriberOptions:

4.
+ bool cleanup_existing_publications; /* drop all publications */

The field name seems overkill. e.g. As mentioned for the option, it
could be called 'drop' instead of cleanup. And the 'existing' seems
superfluous because you can only drop something that exists. So why
not just 'drop_publications'. Won't that have the same meaning?

~~~

5.
static void
-setup_subscriber(struct LogicalRepInfo *dbinfo, const char *consistent_lsn)
+setup_subscriber(struct LogicalRepInfo *dbinfo, const char *consistent_lsn,
+ bool cleanup_existing_publications)

The setup_subscriber's function comment does not say anything about
this function potentially also dropping publications at the
subscriber.

~~~

6.
static void
-drop_publication(PGconn *conn, struct LogicalRepInfo *dbinfo)
+drop_publication(PGconn *conn, struct LogicalRepInfo *dbinfo,
+ bool cleanup_existing_publications)

6a.
This arrangement seems complicated to me.

IMO it would be more natural to have 2 separate functions, and just
call the appropriate one.
drop_publication()
drop_all_publications()

instead of trying to make this function do everything.

~

6b.
Furthermore, can't you just have a common helper function to DROP a
single PUBLICATION by name?

Then the code that drops all publications can just loop to call this
common dropper for each iteration. Code should be much simpler. I
don't see the efficiency of this operation is really a factor,
pg_createsubscriber is rarely used, so IMO a better goal is code
simplicity/maintenance.

e.g. drop_publication() --> _drop_one_publication()
e.g drop_all_publications() --> LOOP (pub list) { _drop_one_publication() }

======
.../t/040_pg_createsubscriber.pl

7.
+
+# Create publications to test it's removal
+$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub FOR ALL TABLES;");
+$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub2 FOR ALL TABLES;");
+

/it's/their/

~~~

8.
Maybe also do a CREATE PUBLICATION at node_s, prior to the
pg_createsubvscript, so then you can verify that the user-created one
is unaffected by the cleanup of all the others.

======
[1] https://www.postgresql.org/message-id/CAExHW5t4ew7ZrgcDdTv7YmuG7LVQT1ZaEny_EvtngHtEBNyjcQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2025-02-17 04:55:09 Some read stream improvements
Previous Message Amul Sul 2025-02-17 04:05:56 Re: NOT ENFORCED constraint feature