From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
---|---|
To: | vignesh C <vignesh21(at)gmail(dot)com> |
Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Subject: | Re: pg_upgrade and logical replication |
Date: | 2023-11-16 02:14:42 |
Message-ID: | CAHut+PtCXAhxgA3SckBtP-NmFHk4J8PwPqdvPS9wFPu7nu_gmQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Here are some review comments for patch v14-0001
======
src/backend/utils/adt/pg_upgrade_support.c
1. binary_upgrade_replorigin_advance
+ /* lock to prevent the replication origin from vanishing */
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ originid = replorigin_by_name(originname, false);
Use uppercase for the lock comment.
======
src/bin/pg_upgrade/check.c
2. check_for_subscription_state
> > + prep_status("Checking for subscription state");
> > +
> > + snprintf(output_path, sizeof(output_path), "%s/%s",
> > + log_opts.basedir,
> > + "subscription_state.txt");
> >
> > I felt this filename ought to be more like
> > 'subscriptions_with_bad_state.txt' because the current name looks like
> > a normal logfile with nothing to indicate that it is only for the
> > states of the "bad" subscriptions.
>
> I have kept the file name intentionally shorted as we noticed that
> when the upgrade of the publisher patch used a longer name there were
> some buildfarm failures because of longer names.
OK, but how about some other short meaningful name like 'subs_invalid.txt'?
I also thought "state" in the original name was misleading because
this file contains not only subscriptions with bad 'state' but also
subscriptions with missing 'origin'.
~~~
3. check_new_cluster_logical_replication_slots
int nslots_on_old;
int nslots_on_new;
+ int nsubs_on_old = old_cluster.subscription_count;
I felt it might be better to make both these quantities 'unsigned' to
make it more obvious that there are no special meanings for negative
numbers.
~~~
4. check_new_cluster_logical_replication_slots
nslots_on_old = count_old_cluster_logical_slots();
~
IMO the 'nsubs_on_old' should be coded the same as above. AFAICT, this
is the only code where you are interested in the number of
subscribers, and furthermore, it seems you only care about that count
in the *old* cluster. This means the current implementation of
get_subscription_count() seems more generic than it needs to be and
that results in more unnecessary patch code. (I will repeat this same
review comment in the other relevant places).
SUGGESTION
nslots_on_old = count_old_cluster_logical_slots();
nsubs_on_old = count_old_cluster_subscriptions();
~~~
5.
+ /*
+ * Quick return if there are no logical slots and subscriptions to be
+ * migrated.
+ */
+ if (nslots_on_old == 0 && nsubs_on_old == 0)
return;
/and subscriptions/and no subscriptions/
~~~
6.
- if (nslots_on_old > max_replication_slots)
+ if (nslots_on_old && nslots_on_old > max_replication_slots)
pg_fatal("max_replication_slots (%d) must be greater than or equal
to the number of "
"logical replication slots (%d) on the old cluster",
max_replication_slots, nslots_on_old);
Neither nslots_on_old nor max_replication_slots can be < 0, so I don't
see why the additional check is needed here.
AFAICT "if (nslots_on_old > max_replication_slots)" acheives the same
thing that you want.
~~~
7.
+ if (nsubs_on_old && nsubs_on_old > max_replication_slots)
+ pg_fatal("max_replication_slots (%d) must be greater than or equal
to the number of "
+ "subscriptions (%d) on the old cluster",
+ max_replication_slots, nsubs_on_old);
Neither nsubs_on_old nor max_replication_slots can be < 0, so I don't
see why the additional check is needed here.
AFAICT "if (nsubs_on_old > max_replication_slots)" achieves the same
thing that you want.
======
src/bin/pg_upgrade/info.c
8. get_db_rel_and_slot_infos
+ if (cluster == &old_cluster)
+ get_subscription_count(cluster);
+
I felt this is unnecessary because you only want to know the
nsubs_on_old in one place and then only for the old cluster, so
calling this to set a generic attribute for the cluster is overkill.
~~~
9.
+/*
+ * Get the number of subscriptions in the old cluster.
+ */
+static void
+get_subscription_count(ClusterInfo *cluster)
+{
+ PGconn *conn;
+ PGresult *res;
+
+ if (GET_MAJOR_VERSION(cluster->major_version) < 1700)
+ return;
+
+ conn = connectToServer(cluster, "template1");
+ res = executeQueryOrDie(conn,
+ "SELECT oid FROM pg_catalog.pg_subscription");
+
+ cluster->subscription_count = PQntuples(res);
+
+ PQclear(res);
+ PQfinish(conn);
+}
9a.
Currently, this is needed only for the old_cluster (like the function
comment implies), so the parameter is not required.
Also, AFAICT this number is only needed in one place
(check_new_cluster_logical_replication_slots) so IMO it would be
better to make lots of changes to simplify this code:
- change the function name to be like the other one. e.g.
count_old_cluster_subscriptions()
- function to return unsigned
SUGGESTION (something like this...)
unsigned
count_old_cluster_subscriptions(void)
{
unsigned nsubs = 0;
if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
{
PGconn *conn = connectToServer(&old_cluster, "template1");
PGresult *res = executeQueryOrDie(conn,
"SELECT oid FROM pg_catalog.pg_subscription");
nsubs = PQntuples(res);
PQclear(res);
PQfinish(conn);
}
return nsubs;
}
~
9b.
This function is returning 0 (aka not assigning
cluster->subscription_count) for clusters before PG17. IIUC this is
effectively the same behaviour as count_old_cluster_logical_slots()
but probably it needs to be mentioned more in this function comment
why it is like this.
======
src/bin/pg_upgrade/pg_upgrade.h
10.
const char *tablespace_suffix; /* directory specification */
+ int subscription_count; /* number of subscriptions */
} ClusterInfo;
I felt this is not needed because you only need to know the
nsubs_on_old in one place, so you can just call the counting function
from there. Making this a generic attribute for the cluster seems
overkill.
======
src/bin/pg_upgrade/t/004_subscription.pl
11. TEST: Check that pg_upgrade is successful when the table is in init state.
+$synced_query =
+ "SELECT count(1) = 1 FROM pg_subscription_rel WHERE srsubstate = 'i'";
+$old_sub1->poll_query_until('postgres', $synced_query)
+ or die "Timed out while waiting for subscriber to synchronize data";
But it doesn't get to "synchronize data", so should that message say
more like "Timed out while waiting for the table to reach INIT state"
~
12.
+command_ok(
+ [
+ 'pg_upgrade', '--no-sync', '-d', $old_sub1->data_dir,
+ '-D', $new_sub1->data_dir, '-b', $bindir,
+ '-B', $bindir, '-s', $new_sub1->host,
+ '-p', $old_sub1->port, '-P', $new_sub1->port,
+ $mode,
+ ],
+ 'run of pg_upgrade --check for old instance when the subscription
tables are in ready state'
+);
Should that message say "init state" instead of "ready state"?
~~~
13. TEST: when the subscription's replication origin does not exist.
+$old_sub2->safe_psql('postgres',
+ "ALTER SUBSCRIPTION regress_sub2 disable");
/disable/DISABLE/
~~~
14.
+my $subid = $old_sub2->safe_psql('postgres',
+ "SELECT oid FROM pg_subscription WHERE subname = 'regress_sub2'");
+my $reporigin = 'pg_'.qq($subid);
+$old_sub2->safe_psql('postgres',
+ "SELECT pg_replication_origin_drop('$reporigin')"
+);
Maybe this part needs a comment to say the reason why the origin does
not exist -- it's because you found and explicitly dropped it.
======
Kind Regards,
Peter Smith.
Fujitsu Australia