Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.

From: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
To: Shubham Khanna <khannashubham1197(at)gmail(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
Date: 2025-02-04 12:09:05
Message-ID: CANhcyEUkQybOe=J5KypWQ80deULBLjjsdyP+DF57+BwSni57Vw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 29 Jan 2025 at 15:14, Shubham Khanna
<khannashubham1197(at)gmail(dot)com> wrote:
>
> On Wed, Jan 29, 2025 at 10:42 AM Hayato Kuroda (Fujitsu)
> <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> >
> > Dear Shubham,
> >
> > > I propose adding the --clean-publisher-objects option to the
> > > pg_createsubscriber utility. As discussed in [1], this feature ensures
> > > a clean and streamlined setup of logical replication by removing stale
> > > or unnecessary publications from the subscriber node. These
> > > publications, replicated during streaming replication, become
> > > redundant after converting to logical replication and serve no further
> > > purpose. This patch introduces the drop_all_publications() function,
> > > which efficiently fetches and drops all publications on the subscriber
> > > node within a single transaction.
> >
> > I think replication slot are also type of 'publisher-objects', but they are not
> > removed for now: API-name may not be accurate. And...
> >
> > > Additionally, other related objects, such as subscriptions and
> > > replication slots, may also require cleanup. I plan to analyze this
> > > further and include them in subsequent patches.
> >
> > I'm not sure replication slots should be cleaned up. Apart from other items like
> > publication/subscription, replication slots are not replicated when it is created
> > on the primary instance. This means they are intentionally created by DBAs and there
> > may not be no strong reasons to drop them after the conversion.
> >
> > Another question is the style of APIs. Do you plan to provide APIs like
> > 'cleanup-subscriber-objects' and 'cleanup-publisher-objects', or just one
> > 'cleanup-logical-replication-objects'?
> >
>
> Thanks for the suggestions, I will keep them in mind while preparing
> the 0002 patch for the same.
> Currently, I have changed the API to '--cleanup-publisher-objects'.
>
> > Regarding the patch:
> >
> > 1.
> > ```
> > + The <application>pg_createsubscriber</application> now supports the
> > + <option>--clean-publisher-objects</option> to remove all publications on
> > + the subscriber node before creating a new subscription.
> > ```
> >
> > This description is not suitable for the documentation. Something like:
> >
> > Remove all publications on the subscriber node.
> >
> > 2.
> > ```
> > + /* Drop publications from the subscriber if requested */
> > + drop_all_publications(dbinfo);
> > ```
> >
> > This should be called when `opts.clean_publisher_objects` is true.
> >
> > 3.
> > You said publications are dropped within a single transaction, but the
> > patch does not do. Which is correct?
> >
> > 4.
> > ```
> > +# Set up node A as primary
> > +my $node_a = PostgreSQL::Test::Cluster->new('node_a');
> > +my $aconnstr = $node_a->connstr;
> > +$node_a->init(allows_streaming => 'logical');
> > +$node_a->append_conf('postgresql.conf', 'autovacuum = off');
> > +$node_a->start;
> > +
> > +# Set up node B as standby linking to node A
> > +$node_a->backup('backup_3');
> > +my $node_b = PostgreSQL::Test::Cluster->new('node_b');
> > +$node_b->init_from_backup($node_a, 'backup_3', has_streaming => 1);
> > +$node_b->append_conf(
> > + 'postgresql.conf', qq[
> > + primary_conninfo = '$aconnstr'
> > + hot_standby_feedback = on
> > + max_logical_replication_workers = 5
> > + ]);
> > +$node_b->set_standby_mode();
> > +$node_b->start;
> > ```
> >
>
> Fixed the given comments. The attached patch contains the suggested changes.
>

Hi Shubham,

I have reviewed the v2 patch. Here are my comments:

1. Initial of the comment should in smallcase to make it similar to
other comments:
+ bool cleanup_publisher_objects; /* Drop all publications */

2. We should provide a comment for the function.
+static void
+drop_all_publications(const struct LogicalRepInfo *dbinfo)
+{

3. We should declare it as "const char*"
+ char *search_query = "SELECT pubname FROM pg_catalog.pg_publication;";
+

4. Instead of warning we should throw an error here:
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_warning("could not obtain publication information: %s",
+ PQresultErrorMessage(res));
+

5. Should we throw an error in this case as well?
+ if (PQresultStatus(res_for_drop) != PGRES_COMMAND_OK)
+ {
+ pg_log_warning("could not drop publication \"%s\": %s",
+ pubname, PQresultErrorMessage(res));
+ }

6. We should start the standby server before creating the
publications, so that the publications are replicated to standby.
+# Create publications to test it's removal
+$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub FOR ALL TABLES;");
+$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub2 FOR ALL TABLES;");
+
+# Verify the existing publications
+my $pub_count_before =
+ $node_p->safe_psql($db1, "SELECT COUNT(*) FROM pg_publication;");

Also maybe we should check the publication count on the standby node
instead of primary.

Thanks and Regards,
Shlok Kyal

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2025-02-04 12:16:28 RE: Introduce XID age and inactive timeout based replication slot invalidation
Previous Message Pavel Borisov 2025-02-04 11:54:52 Re: Using Expanded Objects other than Arrays from plpgsql