Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.

From: Shubham Khanna <khannashubham1197(at)gmail(dot)com>
To: Euler Taveira <euler(at)eulerto(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
Date: 2025-03-18 06:16:41
Message-ID: CAHv8RjK8q+mWPWPh9K7CeH2tKF31vGn+oPV3qY7pdPCmtbjzkg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 18, 2025 at 4:31 AM Euler Taveira <euler(at)eulerto(dot)com> wrote:
>
> On Mon, Mar 17, 2025, at 9:34 AM, Shubham Khanna wrote:
>
> I have incorporated the "--remove/-r" parameter in the attached patch,
> as it seems more intuitive and straightforward for users.
> The attached patch contains the latest changes.
>
>
> There were a lot of discussion around the single vs multiple options since my
> last review [1] so I'm answering some of the questions here.
>
> Due to the nature of removing multiple objects, I would say that a general
> option that has a value and can be informed multiple times is the way to go. I
> saw that the discussion led to this. Regarding the name, my preference is
> --drop since we already have other binaries with similar options (pg_receivewal
> and pg_recvlogical have --drop-slot).
>
> The commit message needs some adjustments. There are redundant information (1st
> and last paragraph).
>

Fixed.

> + <literal>publications</literal>. This option is useful when
> + transitioning from streaming replication to logical replication as
> + existing objects may no longer be needed. Before using this option,
>
> Use physical replication. That's what we used in the current
> pg_createsubscriber documentation and it is also the terminology referred in
> the glossary (see Replication).
>

Fixed.

> bool two_phase; /* enable-two-phase option */
> + uint32 remove_objects; /* flag to remove objects on subscriber */
>
> uint32 -> bits32.
>

Fixed.

> -static void setup_subscriber(struct LogicalRepInfo *dbinfo,
> - const char *consistent_lsn);
> +static void setup_subscriber(const char *consistent_lsn);
>
> This is unrelated to this patch.
>

Fixed.

> - * replication setup.
> + * replication setup. If 'remove' parameter is specified, existing publications
> + * on the subscriber will be dropped before creating new subscriptions.
> */
> static void
> -setup_subscriber(struct LogicalRepInfo *dbinfo, const char *consistent_lsn)
> +setup_subscriber(const char *consistent_lsn)
>
> There is no parameter 'remove' in this function. I understand that you want to
> provide information about the remove option but it is not the right place to
> add it. If for some reason you need to change or remove such parameter, it is
> easier to left this comment because it is not near the place you are removing
> some lines of code.
>

Fixed.

> + struct LogicalRepInfo *dbinfo = &dbinfos.dbinfo[i];
>
> /* Connect to subscriber. */
> - conn = connect_database(dbinfo[i].subconninfo, true);
> + conn = connect_database(dbinfo->subconninfo, true);
>
> This new dbinfo pointer is just a way to increase your patch size without
> improving readability.
>

Fixed.

> - drop_publication(conn, &dbinfo[i]);
> + if (dbinfos.remove_objects & OBJECT_PUBLICATIONS)
> + drop_all_publications(conn, dbinfo);
> + else
> + drop_publication(conn, dbinfo, dbinfo->pubname);
>
> At first glance, I didn't like this change. You inform dbinfo->pubname as a 3rd
> parameter but the 2nd parameter is dbinfo. After reading
> drop_all_publication(), I realized that's the cause for this change. Is there a
> better way to do it?
>

I see your point. The use of dbinfo->pubname as the third parameter
while passing dbinfo as the second parameter does feel a bit
inconsistent. However, since drop_all_publications() relies on dbinfo
for context, this approach seemed necessary to keep the interface
consistent with drop_publication().
Currently, I do not have a better alternative that maintains clarity
and consistency, but I am open to suggestions if anyone has ideas to
improve this.

> +static void
> +drop_all_publications(PGconn *conn, struct LogicalRepInfo *dbinfo)
> +{
> + PGresult *res;
> +
>
> I would add an Assert(conn != NULL) here to follow the same pattern as the
> other functions.
>

Fixed.

> + res = PQexec(conn, "SELECT pubname FROM pg_catalog.pg_publication;");
> + if (PQresultStatus(res) != PGRES_TUPLES_OK)
> + {
> + pg_log_error("could not obtain publication information: %s",
> + PQresultErrorMessage(res));
> + PQclear(res);
> + return;
> + }
>
> Shouldn't it disconnect_database() here? See the pattern in other functions
> that error out while querying the catalog.
>

Fixed.

> + SimpleStringListCell *cell;
> +
> + for (cell = opt.remove_objects.head; cell; cell = cell->next)
> + {
>
> You could use SimpleStringListCell in the for loop without a separate declaration.
>

Fixed.

> + pg_log_error("wrong remove object is specified");
> + pg_log_error_hint("Only \"publications\" can be removed.");
> + exit(1);
>
> The main message sounds strange. Did you mean 'wrong object is specified'?
>

Fixed.

> +ok($node_s->safe_psql($db1, "SELECT COUNT(*) = 2 FROM pg_publication"),
> + 'two publications created before --remove is run');
> +
>
> two pre-existing publications on subscriber ?
>

Fixed.

> +is($node_s->safe_psql($db1, "SELECT COUNT(*) FROM pg_publication;"),
> + '0', 'all publications dropped after --remove is run');
> +
>
> all publications on subscriber have been removed ?
>

Fixed.

> + primary_conninfo = '$pconnstr'
> + max_logical_replication_workers = 5
>
> Do you need to set m_l_r_w here?
>
> +# --verbose is used twice to show more information.
>
> This comment is superfluous. Remove it.
>
> +# Confirm publications still remain after running 'pg_createsubscriber'
> +is($node_u->safe_psql($db3, "SELECT COUNT(*) FROM pg_publication;"),
> + '2', 'all publications remain after running pg_createsubscriber');
>
> I would remove the semicolon here because none of the SQL commands in this test
> file includes it.
>
> Files=1, Tests=43, 14 wallclock secs ( 0.04 usr 0.01 sys + 1.65 cusr 2.06 csys = 3.76 CPU)
> Result: PASS
>
> Do we really need this extra test? It increases the test duration by 4 seconds.
> Again, that's still unacceptable for such an optional feature. Maybe you should
> experiment my suggestion in the previous review. You don't need a real run to
> prove that pg_createsubscriber is not removing unintended objects.
>
>
> [1] http://postgr.es/m/6f266ce2-38d4-433f-afc9-b47c48c17509@app.fastmail.com
>
>

I have removed the additional test case and the new node (node_u) that
was added for it. This should help reduce the overall test duration
without compromising the coverage of the new feature.

> --

The attached patch contains the suggested changes,

Thanks and regards,
Shubham Khanna.

Attachment Content-Type Size
v18-0001-Support-for-dropping-all-publications-in-pg_crea.patch application/octet-stream 13.5 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Shubham Khanna 2025-03-18 06:20:29 Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
Previous Message Ashutosh Bapat 2025-03-18 06:15:47 Re: Fix couple of typos