Re: [PoC] pg_upgrade: allow to upgrade publisher node

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Subject: Re: [PoC] pg_upgrade: allow to upgrade publisher node
Date: 2023-08-30 05:27:55
Message-ID: CAHut+Pv=O-kn24Jnr4ingv3nesL_=G9jhCcQhv5HG7ktJvi5aQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Kuroda-san.

Here are some review comments for v28-0003.

======
src/bin/pg_upgrade/check.c

1. check_and_dump_old_cluster
+ /*
+ * Logical replication slots can be migrated since PG17. See comments atop
+ * get_old_cluster_logical_slot_infos().
+ */
+ if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+ check_old_cluster_for_valid_slots(live_check);
+

IIUC we are preferring to use the <= 1600 style of version check
instead of >= 1700 where possible.

So this comment and version check ought to be removed from here, and
done inside check_old_cluster_for_valid_slots() instead.

~~~

2. check_old_cluster_for_valid_slots

+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Make sure logical replication slots can be migrated to new cluster.
+ * Following points are checked:
+ *
+ * - All logical replication slots are usable.
+ * - All logical replication slots consumed all WALs, except a
+ * CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)

I suggested in the previous comment above (#1) that the version check
should be moved into this function.

Therefore, this function comment now should also mention slot upgrade
is only allowed for >= PG17

~~~

3.
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+ int i,
+ ntups,
+ i_slotname;
+ PGresult *res;
+ DbInfo *active_db = &old_cluster.dbarr.dbs[0];
+ PGconn *conn;
+
+ /* Quick exit if the cluster does not have logical slots. */
+ if (count_old_cluster_logical_slots() == 0)
+ return;

3a.
See comment #1. At the top of this function body there should be a
version check like:

if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
return;

~

3b.
/Quick exit/Quick return/

~

4.
+ prep_status("Checking for logical replication slots");

I felt that should add the word "valid" like:
"Checking for valid logical replication slots"

~~~

5.
+ /* Check there are no logical replication slots with a 'lost' state. */
+ res = executeQueryOrDie(conn,
+ "SELECT slot_name FROM pg_catalog.pg_replication_slots "
+ "WHERE wal_status = 'lost' AND "
+ "temporary IS FALSE;");

Since the SQL is checking if there *are* lost slots I felt it would be
more natural to reverse that comment.

SUGGESTION
/* Check and reject if there are any logical replication slots with a
'lost' state. */

~~~

6.
+ /*
+ * Do additional checks if a live check is not required. This requires
+ * that confirmed_flush_lsn of all the slots is the same as the latest
+ * checkpoint location, but it would be satisfied only when the server has
+ * been shut down.
+ */
+ if (!live_check)

I think the comment can be rearranged slightly:

SUGGESTION
Do additional checks to ensure that 'confirmed_flush_lsn' of all the
slots is the same as the latest checkpoint location.
Note: This can be satisfied only when the old_cluster has been shut
down, so we skip this for "live" checks.

======
src/bin/pg_upgrade/controldata.c

7.
+ /*
+ * Read the latest checkpoint location if the cluster is PG17
+ * or later. This is used for upgrading logical replication
+ * slots.
+ */
+ if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+ {

Fetching this "Latest checkpoint location:" value is only needed for
the check_old_cluster_for_valid_slots validation check, isn't it? But
AFAICT this code is common for both old_cluster and new_cluster.

I am not sure what is best to do:
- Do only the minimal logic needed?
- Read the value redundantly even for new_cluster just to keep code simpler?

Either way, maybe the comment should say something about this.

======
.../t/003_logical_replication_slots.pl

8. Consider adding one more test

Maybe there should also be some "live check" test performed (e.g.
using --check, and a running old_cluster).

This would demonstrate pg_upgrade working successfully even when the
WAL records are not consumed (because LSN checks would be skipped in
check_old_cluster_for_valid_slots function).

------
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2023-08-30 06:42:00 [17] CREATE SUBSCRIPTION ... SERVER
Previous Message Andy Fan 2023-08-30 04:47:35 Re: Extract numeric filed in JSONB more effectively