Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.

From: Shubham Khanna <khannashubham1197(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.
Date: 2025-02-10 15:11:07
Message-ID: CAHv8RjKJ2kbmAM6b2M_fVD4OiO2hPK=OiMjG3CUUjNo6K8D0VA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 10, 2025 at 7:05 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Some review comments for v4-0001.
>
> ======
> Commit message
>
> 1.
> This patch enhances the 'pg_createsubscriber' utility by adding the
> '--all-databases' option, which automatically fetches all non-template
> databases on the source server (publisher) and creates corresponding
> subscriptions on the target server (subscriber). This simplifies the process
> of converting a physical standby to a logical subscriber, particularly
> during upgrades.
>
> When '--all-databases' is used, the tool queries the source server for all
> databases and attempts to create subscriptions on the target server for
> databases with matching names. The tool auto-generates subscription,publication
> and replication slot names using the format:
> - Subscription: '<database>_sub'
> - Publication: '<database>_pub'
> - Replication slot: '<database>_slot'
>
> ~
>
> There seems a lot of duplication in the above 2 paragraphs.
> Duplication can be removed.
> BTW, those generated formats are suspicious -- see a later comment below.
>
> SUGGESTION:
> This patch enhances the 'pg_createsubscriber' utility by adding the
> '--all-databases' option. When '--all-databases' is specified, the tool
> queries the source server (publisher) for all databases and creates
> subscriptions on the target server (subscriber) for databases with matching
> names. This simplifies the process of converting a physical standby to a
> logical subscriber, particularly during upgrades.
>
> The tool auto-generates subscription, publication and replication slot
> names using the format:
> ...
>
> ~~~
>
> 2.
> The patch ensures that conflicting options such as '--all-databases' and
> '--database', '--publication', '--subscription', or '--replication-slot' cannot
> be used together
>
> ~
>
> That wording seems misleading because it makes it sound like
> '--publication' and '--database' cannot be used together.
>
> SUGGESTION
> The options '--database', '--publication', '--subscription', and
> '--replication-slot' cannot be used when '--all-databases' is
> specified.
>
> ======
> doc/src/sgml/ref/pg_createsubscriber.sgml
>
> 3.
> What about the synopsis (??)
>
> v4 did not address my previous review comment [1, #5] about the synopsis
>
> > 5.
> > The synopsis has no mention of --all-databases.
> >
> > I'm not sure of the best way to do it -- if it becomes too messy to
> > combine everything then maybe having several is the way to go:
> >
> > e.g.
> >
> > pg_createsubscriber [option...] { -a | --all-databases } { -D |
> > --pgdata }datadir { -P | --publisher-server }connstr
> > pg_createsubscriber [option...] { -d | --database }dbname { -D |
> > --pgdata }datadir { -P | --publisher-server }connstr
> >
>
> ~~~
>
> 4.
> + Automatically fetch all non-template databases from the source server
> + and create subscriptions for databases with the same names on the
> + target server.
> + If a database is present on the source but missing on the target, it is
> + skipped or an error is raised.
> + If a database exists on the target but not on the source, no
> + subscription is created for it.
>
> "it is skipped or an error is raised" (??) So which is it? It cannot be both.
>
> ~~~
>
> 5.
> + Subscription names, publication names, and replication slot names are
> + automatically generated using the format:
> + <literal><replaceable>database</replaceable>_sub</literal>
> + <literal><replaceable>database</replaceable>_pub</literal> and
> + <literal><replaceable>database</replaceable>_slot</literal>
> respectively.
> + </para>
>
> This seems strange to me, because according to the documentation NOTES
> section when --database is used and there is no
> publication/subscription name etc then all the generated name format
> looks like:
>
> ------
> If the --publication option is not specified, the publication has the
> following name pattern: “pg_createsubscriber_%u_%x” (parameter:
> database oid, random int). If the --replication-slot option is not
> specified, the replication slot has the following name pattern:
> “pg_createsubscriber_%u_%x” (parameters: database oid, random int).
> ------
>
> and
>
> ------
> If the --subscription option is not specified, the subscription has
> the following name pattern: “pg_createsubscriber_%u_%x” (parameters:
> database oid, random int).
> ------
>
> So, why are the generated names different for '--all-databases'?
> Actually, I can't see any code even doing what the new documentation
> is saying so is the documentation even telling the truth? The commit
> message will also be impacted.
>
> ~~~
>
> 6.
> + <para>
> + <option>--all-databases</option> cannot be used with
> + <option>--publication</option>, <option>--subscription</option>,
> + or <option>--replication-slot</option>.
> + </para>
>
> What about "--database".
>
> ~~~
>
> 7.
> For all the other incompatible options you wrote:
>
> "Cannot be used together with <option>-a</option>."
>
> IMO it might be nioer to give the full name of the option.
> e.g. "Cannot be used together with <option>--all-databases</option>."
>
> ======
> src/bin/pg_basebackup/pg_createsubscriber.c
>
> usage:
>
> 8.
> + printf(_(" -a, --all-databases create subscriptions for
> all matching databases on the target\n"));
>
> That seems ambiguous to me. How about something more like below to
> clarify the subscription is created on the target.
>
> "for all source databases create subscriptions on the same target database"
>
> ~~~
>
> fetch_source_databases:
>
> 9.
> + const char *query = "SELECT datname FROM pg_database WHERE
> datistemplate = false";
>
> Do we need this variable? It seems simpler/better just to put the SQL in-line.
>
> ~~~
>
> main:
>
> 10.
> switch (c)
> {
> + case 'a':
> + opt.all_databases = true;
> + break;
> +
>
> 10a.
> Missing the check for duplicate option --all-databases. OTOH maybe you
> don't care about this error, because probably it is harmless if you
> just ignore it.
>
> ~
>
> 10b.
> Missing the check for --database, --publication, --subscription,
> --replication-slot.
>
> (e.g. if user specified the --all-databases *after* those other options)
>
> ~~~
>
> 11.
> + if (opt.all_databases)
> + {
> + pg_log_error("--replslot cannot be used with --all-databases");
> + exit(1);
> + }
>
> Where'd this name come from? AFAICT there is no such option as "--replslot"
>
> ~~~
>
> 12.
> - * If --database option is not provided, try to obtain the dbname from
> - * the publisher conninfo. If dbname parameter is not available, error
> - * out.
> + * If neither --database nor --all-databases option is provided, try
> + * to obtain the dbname from the publisher conninfo. If dbname
> + * parameter is not available, error out.
>
> For this comment to make sense I think the previous comment (where
> fetch_source_databases is called) needs to explain that when
> --all-databases is defined then it is going to treat that internally
> as the equivalent of a whole lot of user-specified --database options.
>
> ======
> .../t/040_pg_createsubscriber.pl
>
> 13.
> + qr/cannot specify --publication or --replication-slot when using
> --all-databases/,
> + 'fail if --all-databases is used with publication and slot');
> +
>
> How are tests expecting this even passing?
>
> Searching the patch I cannot find any such error!
>
> ======

Fixed the given comments. The attached patch at [1] contains the
suggested changes.

[1] - https://www.postgresql.org/message-id/CAHv8Rj%2BmhYgh8XLFM8sN8J05z0ia%2BknfB1kP6kjbnB55H-b-mw%40mail.gmail.com

Thanks and regards,
Shubham Khanna.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Matheus Alcantara 2025-02-10 15:32:09 Re: explain analyze rows=%.0f
Previous Message Shubham Khanna 2025-02-10 15:06:10 Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.