Re: pg_upgrade and logical replication

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_upgrade and logical replication
Date: 2023-12-01 05:26:43
Message-ID: CAHut+Pst-wJ-2nw90gAof+TbbfQC1ERsVeWvG_HaNH7xqu4Fdg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are review comments for patch v21-0001

======
src/bin/pg_upgrade/check.c

1. check_old_cluster_subscription_state

+/*
+ * check_old_cluster_subscription_state()
+ *
+ * Verify that each of the subscriptions has all their corresponding tables in
+ * i (initialize) or r (ready).
+ */
+static void
+check_old_cluster_subscription_state(void)

Function comment should also mention it also validates the origin.

~~~

2.
In this function there are a couple of errors written to the
"subs_invalid.txt" file:

+ fprintf(script, "replication origin is missing for database:\"%s\"
subscription:\"%s\"\n",
+ PQgetvalue(res, i, 0),
+ PQgetvalue(res, i, 1));

and

+ fprintf(script, "database:\"%s\" subscription:\"%s\" schema:\"%s\"
relation:\"%s\" state:\"%s\" not in required state\n",
+ active_db->db_name,
+ PQgetvalue(res, i, 0),
+ PQgetvalue(res, i, 1),
+ PQgetvalue(res, i, 2),
+ PQgetvalue(res, i, 3));

The format of those messages is not consistent. It could be improved
in a number of ways to make them more similar. e.g. below.

SUGGESTION #1
the replication origin is missing for database:\"%s\" subscription:\"%s\"\n
the table sync state \"%s\" is not allowed for database:\"%s\"
subscription:\"%s\" schema:\"%s\" relation:\"%s\"\n

SUGGESTION #2
database:\"%s\" subscription:\"%s\" -- replication origin is missing\n
database:\"%s\" subscription:\"%s\" schema:\"%s\" relation:\"%s\" --
upgrade when table sync state is \"%s\" is not supported\n

etc.

======
src/bin/pg_upgrade/t/004_subscription.pl

3.
+# Initial setup
+$publisher->safe_psql('postgres', "CREATE TABLE tab_upgraded1(id int)");
+$publisher->safe_psql('postgres', "CREATE TABLE tab_upgraded2(id int)");
+$old_sub->safe_psql('postgres', "CREATE TABLE tab_upgraded1(id int)");
+$old_sub->safe_psql('postgres', "CREATE TABLE tab_upgraded2(id int)");

IMO it is tidier to combine multiple DDLS whenever you can.

~~~

4.
+# Create a subscription in enabled state before upgrade
+$publisher->safe_psql('postgres', "CREATE PUBLICATION regress_pub1");
+$old_sub->safe_psql('postgres',
+ "CREATE SUBSCRIPTION regress_sub1 CONNECTION '$connstr' PUBLICATION
regress_pub1"
+);
+$old_sub->wait_for_subscription_sync($publisher, 'regress_sub1');

That publication has an empty set of tables. Should there be some
comment to explain why it is OK like this?

~~~

5.
+# Wait till the table tab_upgraded1 reaches 'ready' state
+my $synced_query =
+ "SELECT count(1) = 1 FROM pg_subscription_rel WHERE srsubstate = 'r'";
+$old_sub->poll_query_until('postgres', $synced_query)
+ or die "Timed out while waiting for the table to reach ready state";
+
+$publisher->safe_psql('postgres',
+ "INSERT INTO tab_upgraded1 VALUES (generate_series(1,50))"
+);
+$publisher->wait_for_catchup('regress_sub2');

IMO better without the blank line, so then everything more clearly
belongs to this same comment.

~~~

6.
+# Pre-setup for preparing subscription table in init state. Add tab_upgraded2
+# to the publication.
+$publisher->safe_psql('postgres',
+ "ALTER PUBLICATION regress_pub2 ADD TABLE tab_upgraded2");
+
+$old_sub->safe_psql('postgres',
+ "ALTER SUBSCRIPTION regress_sub2 REFRESH PUBLICATION");

Ditto. IMO better without the blank line, so then everything more
clearly belongs to this same comment.

~~~

7.
+command_ok(
+ [
+ 'pg_upgrade', '--no-sync', '-d', $old_sub->data_dir,
+ '-D', $new_sub->data_dir, '-b', $oldbindir,
+ '-B', $newbindir, '-s', $new_sub->host,
+ '-p', $old_sub->port, '-P', $new_sub->port,
+ $mode
+ ],
+ 'run of pg_upgrade for old instance when the subscription tables are
in init/ready state'
+);

Maybe those 'command_ok' args can be formatted neatly (like you've
done later for the 'command_checks_all').

~~~

8.
+# ------------------------------------------------------
+# Check that the data inserted to the publisher when the subscriber
is down will
+# be replicated to the new subscriber once the new subscriber is started.
+# ------------------------------------------------------

8a.
SUGGESTION
...when the new subscriber is down will be replicated once it is started.

~

8b.
I thought this main comment should also say something like "Also check
that the old subscription states and relations origins are all
preserved."

~~~

9.
+$publisher->safe_psql('postgres', "INSERT INTO tab_upgraded1 VALUES(51)");
+$publisher->safe_psql('postgres', "INSERT INTO tab_upgraded2 VALUES(1)");

IMO it is tidier to combine multiple DDLS whenever you can.

~~~

10.
+# The subscription's running status should be preserved
+$result =
+ $new_sub->safe_psql('postgres',
+ "SELECT subenabled FROM pg_subscription ORDER BY subname");
+is($result, qq(t
+f),
+ "check that the subscription's running status are preserved"
+);

I felt this was a bit too tricky. It might be more readable to do 2
separate SELECTs with explicit subnames. Alternatively, leave the code
as-is but improve the comment to explicitly say something like:

# old subscription regress_sub was enabled
# old subscription regress_sub1 was disabled

~~~

11.
+# ------------------------------------------------------
+# Check that pg_upgrade fails when max_replication_slots configured in the new
+# cluster is less than number of subscriptions in the old cluster.
+# ------------------------------------------------------
+my $new_sub1 = PostgreSQL::Test::Cluster->new('new_sub1');
+$new_sub1->init;
+$new_sub1->append_conf('postgresql.conf', "max_replication_slots = 0");
+
+$old_sub->stop;

/than number/than the number/

Should that old_sub->stop have been part of the previous cleanup steps?

~~~

12.
+$old_sub->start;
+
+# Drop the subscription
+$old_sub->safe_psql('postgres', "DROP SUBSCRIPTION regress_sub2");

Maybe it is tidier puttin that 'start' below the comment.

~~~

13.
+# ------------------------------------------------------
+# Check that pg_upgrade refuses to run in:
+# a) if there's a subscription with tables in a state other than 'r' (ready) or
+# 'i' (init) and/or
+# b) if the subscription has no replication origin.
+# ------------------------------------------------------

13a.
/refuses to run in:/refuses to run if:/

~

13b.
/a) if/a)/

~

13c.
/b) if/b)/

~~~

14.
+# Create another subscription and drop the subscription's replication origin
+$old_sub->safe_psql('postgres',
+ "CREATE SUBSCRIPTION regress_sub4 CONNECTION '$connstr' PUBLICATION
regress_pub3 WITH (enabled=false)"
+);
+
+my $subid = $old_sub->safe_psql('postgres',
+ "SELECT oid FROM pg_subscription WHERE subname = 'regress_sub4'");
+my $reporigin = 'pg_' . qq($subid);
+
+# Drop the subscription's replication origin
+$old_sub->safe_psql('postgres',
+ "SELECT pg_replication_origin_drop('$reporigin')");
+
+$old_sub->stop;

14a.
IMO better to have all this without blank lines, because it all
belongs to the first comment.

~

14b.
That 2nd comment "# Drop the..." is not required because the first
comment already says the same.

======
src/include/catalog/pg_subscription_rel.h

15.
extern void AddSubscriptionRelState(Oid subid, Oid relid, char state,
- XLogRecPtr sublsn);
+ XLogRecPtr sublsn, bool upgrade);

Shouldn't this 'upgrade' really be 'binary_upgrade' so it better
matches the comment you added in that function?

If you agree, then change it here and also in the function definition.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhijie Hou (Fujitsu) 2023-12-01 05:47:29 RE: Synchronizing slots from primary to standby
Previous Message Justin Pryzby 2023-12-01 05:13:25 processes stuck in shutdown following OOM/recovery