Re: pg_upgrade and logical replication

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: pg_upgrade and logical replication
Date: 2023-11-09 06:53:26
Message-ID: ZUyB5oa-jfF2xrZr@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 09, 2023 at 01:14:05PM +1100, Peter Smith wrote:
> Looks like v12 accidentally forgot to update this to the modified
> function name 'binary_upgrade_add_sub_rel_state'

This v12 is overall cleaner than its predecessors. Nice to see.

+my $result = $publisher->safe_psql('postgres', "SELECT count(*) FROM t1");
+is($result, qq(1), "check initial t1 table data on publisher");
+$result = $publisher->safe_psql('postgres', "SELECT count(*) FROM t2");
+is($result, qq(1), "check initial t1 table data on publisher");
+$result = $old_sub->safe_psql('postgres', "SELECT count(*) FROM t1");
+is($result, qq(1), "check initial t1 table data on the old subscriber");
+$result = $old_sub->safe_psql('postgres', "SELECT count(*) FROM t2");

I'd argue that t1 and t2 should have less generic names. t1 is used
to check that the upgrade process works, while t2 is added to the
publication after upgrading the subscriber. Say something like
tab_upgraded or tab_not_upgraded?

+my $synced_query =
+ "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r');";

Perhaps it would be safer to use a query that checks the number of
relations in 'r' state? This query would return true if
pg_subscription_rel has no tuples.

+# Table will be in 'd' (data is being copied) state as table sync will fail
+# because of primary key constraint error.
+my $started_query =
+ "SELECT count(1) = 1 FROM pg_subscription_rel WHERE srsubstate = 'd';";

Relying on a pkey error to enforce an incorrect state is a good trick.
Nice.

+command_fails(
+ [
+ 'pg_upgrade', '--no-sync', '-d', $old_sub->data_dir,
+ '-D', $new_sub->data_dir, '-b', $bindir,
+ '-B', $bindir, '-s', $new_sub->host,
+ '-p', $old_sub->port, '-P', $new_sub->port,
+ $mode, '--check',
+ ],
+ 'run of pg_upgrade --check for old instance with relation in \'d\' datasync(invalid) state');
+rmtree($new_sub->data_dir . "/pg_upgrade_output.d");

Okay by me to not stop the cluster for the --check to shave a few
cycles. It's a bit sad that we don't cross-check the contents of
subscription_state.txt before removing pg_upgrade_output.d. Finding
the file is easy even if the subdir where it is included is not a
constant name. Then it is possible to apply a regexp with the
contents consumed by a slurp_file().

+my $remote_lsn = $old_sub->safe_psql('postgres',
+ "SELECT remote_lsn FROM pg_replication_origin_status");
Perhaps you've not noticed, but this would be 0/0 most of the time.
However the intention is to check after a valid LSN to make sure that
the origin is set, no?

I am wondering whether this should use a bit more data than just one
tuple, say at least two transaction, one of them with a multi-value
INSERT?

+# ------------------------------------------------------
+# Check that pg_upgrade is successful when all tables are in ready state.
+# ------------------------------------------------------
This comment is a bit inconsistent with the state that are accepted,
but why not, at least that's predictible.

+ * relation to pg_subscripion_rel table. This will be used only in

Typo: s/pg_subscripion_rel/pg_subscription_rel/.

This needs some word-smithing to explain the reasons why a state is
not needed:

+ /*
+ * The subscription relation should be in either i (initialize),
+ * r (ready) or s (synchronized) state as either the replication slot
+ * is not created or the replication slot is already dropped and the
+ * required WAL files will be present in the publisher. The other
+ * states are not ok as the worker has dependency on the replication
+ * slot/origin in these case:

A slot not created yet refers to the 'i' state, while 'r' and 's'
refer to a slot created previously but already dropped, right?
Shouldn't this comment tell that rather than mixing the assumptions?

+ * a) SUBREL_STATE_DATASYNC: In this case, the table sync worker will
+ * try to drop the replication slot but as the replication slots will
+ * be created with old subscription id in the publisher and the
+ * upgraded subscriber will not be able to clean the slots in this
+ * case.

Proposal: A relation upgraded while in this state would retain a
replication slot, which could not be dropped by the sync worker
spawned after the upgrade because the subscription ID tracked by the
publisher does not match anymore.

Note: actually, this would be OK if we are able to keep the OIDs of
the subscribers consistent across upgrades? I'm OK to not do nothing
about that in this patch, to keep it simpler. Just asking in passing.

+ * b) SUBREL_STATE_FINISHEDCOPY: In this case, the tablesync worker will
+ * expect the origin to be already existing as the origin is created
+ * with an old subscription id, tablesync worker will not be able to
+ * find the origin in this case.

Proposal: A tablesync worker spawned to work on a relation upgraded
while in this state would expect an origin ID with the OID of the
subscription used before the upgrade, causing it to fail.

+ "A list of problem subscriptions is in the file:\n"

Sounds a bit strange, perhaps use an extra "the", as of "the problem
subscriptions"?

Could it be worth mentioning in the docs that one could also DISABLE
the subscriptions before running the upgrade?

+ The replication origin entry corresponding to each of the subscriptions
+ should exist in the old cluster. This can be found by checking
+ <link linkend="catalog-pg-subscription">pg_subscription</link> and
+ <link linkend="catalog-pg-replication-origin">pg_replication_origin</link>
+ system tables.

Hmm. No need to mention pg_replication_origin_status?

If I may ask, how did you check that the given relation states were
OK or not OK? Did you hardcode some wait points in tablesync.c up to
where a state is updated in pg_subscription_rel, then shutdown the
cluster before the upgrade to maintain the catalog in this state?
Finally, after the upgrade, you've cross-checked the dependencies on
the slots and origins to see that the spawned sync workers turned
crazy because of the inconsistencies. Right?
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Junwang Zhao 2023-11-09 07:08:11 Re: make pg_ctl more friendly
Previous Message Kyotaro Horiguchi 2023-11-09 06:42:30 Re: A recent message added to pg_upgade