Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.

From: Shubham Khanna <khannashubham1197(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
Date: 2025-02-20 06:44:27
Message-ID: CAHv8RjLgLCbg-e0_5Ly_-8AojraV1R1g4BuaL-an7+iTLMycLg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 17, 2025 at 9:49 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi Shubham
>
> Some review comments for v7-0001.
>
> (I am late to this thread. If some of my comments have already been
> discussed and rejected please let me know).
>
> ======
> 1 GENERAL. Option Name?
>
> Wondering why the patch is introducing more terminology like
> "cleanup"; if we are dropping publications then why not say "drop"?
> Also I am not sure if "existing" means anything because you cannot
> cleanup/drop something that is not "existing".
>
> IOW, why not call this the "--drop-publications" option?
>

I have retained the option name as '--cleanup-existing-publications'
for now. However, I understand the concern regarding terminology and
will revise it in the next patch version once there is a consensus on
the final name.

> ======
> Commit message
>
> 2.
> These publications, replicated during streaming replication, become redundant
> after converting to logical replication and serve no further purpose.
>
> ~
>
> From this description it seems there is an assumption that the only
> publications on the target server are those that were physically
> replicated to the standby. Is that strictly true? Isn't it also
> possible that a user might have created their own publication on the
> target server prior to running the pg_createsubscriber. So even if
> they want all the physically replicated ones to be removed, they would
> NOT want their own new publication to also get removed at the same
> time.
>
> E.g. The original motivation thread [1] for this patch only said "But
> what good is having the SAME publications as primary also on logical
> replica?" (my emphasis of "same") so it seems there should be some
> sort of name matching before just dropping everything.
>
> Actually.... The code looks like it might be doing the correct thing
> already and only fetching the publication names from the source server
> and then deleting only those names from the target server. But this
> comment message didn't describe this clearly.
>
> ======

Modified the commit message.

> doc/src/sgml/ref/pg_createsubscriber.sgml
>
> 3.
> + <para>
> + Remove all existing publications from specified databases on tne target
> + server.
> + </para>
>
> typo "tne"
>
> ======

Fixed.

> src/bin/pg_basebackup/pg_createsubscriber.c
>
> struct CreateSubscriberOptions:
>
> 4.
> + bool cleanup_existing_publications; /* drop all publications */
>
> The field name seems overkill. e.g. As mentioned for the option, it
> could be called 'drop' instead of cleanup. And the 'existing' seems
> superfluous because you can only drop something that exists. So why
> not just 'drop_publications'. Won't that have the same meaning?
>
> ~~~
>

Fixed.

> 5.
> static void
> -setup_subscriber(struct LogicalRepInfo *dbinfo, const char *consistent_lsn)
> +setup_subscriber(struct LogicalRepInfo *dbinfo, const char *consistent_lsn,
> + bool cleanup_existing_publications)
>
> The setup_subscriber's function comment does not say anything about
> this function potentially also dropping publications at the
> subscriber.
>
> ~~~
>

Fixed.

> 6.
> static void
> -drop_publication(PGconn *conn, struct LogicalRepInfo *dbinfo)
> +drop_publication(PGconn *conn, struct LogicalRepInfo *dbinfo,
> + bool cleanup_existing_publications)
>
> 6a.
> This arrangement seems complicated to me.
>
> IMO it would be more natural to have 2 separate functions, and just
> call the appropriate one.
> drop_publication()
> drop_all_publications()
>
> instead of trying to make this function do everything.
>
> ~
>
> 6b.
> Furthermore, can't you just have a common helper function to DROP a
> single PUBLICATION by name?
>
> Then the code that drops all publications can just loop to call this
> common dropper for each iteration. Code should be much simpler. I
> don't see the efficiency of this operation is really a factor,
> pg_createsubscriber is rarely used, so IMO a better goal is code
> simplicity/maintenance.
>
> e.g. drop_publication() --> _drop_one_publication()
> e.g drop_all_publications() --> LOOP (pub list) { _drop_one_publication() }
>
> ======

Fixed.

> .../t/040_pg_createsubscriber.pl
>
> 7.
> +
> +# Create publications to test it's removal
> +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub FOR ALL TABLES;");
> +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub2 FOR ALL TABLES;");
> +
>
> /it's/their/
>
> ~~~
>

Fixed.

> 8.
> Maybe also do a CREATE PUBLICATION at node_s, prior to the
> pg_createsubvscript, so then you can verify that the user-created one
> is unaffected by the cleanup of all the others.
>
> ======

Since $node_s is a streaming standby, it does not allow object
creation. As a result, publications cannot be created on $node_s.

> [1] https://www.postgresql.org/message-id/CAExHW5t4ew7ZrgcDdTv7YmuG7LVQT1ZaEny_EvtngHtEBNyjcQ%40mail.gmail.com
>

The attached patch contains the suggested changes.

Thanks and regards,
Shubham Khanna.

Attachment Content-Type Size
v8-0001-Support-for-dropping-all-publications-in-pg_creat.patch application/octet-stream 12.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhijie Hou (Fujitsu) 2025-02-20 07:20:27 RE: Conflict detection for update_deleted in logical replication
Previous Message Mahendra Singh Thalor 2025-02-20 06:43:54 Re: Non-text mode for pg_dumpall