RE: [PoC] pg_upgrade: allow to upgrade publisher node

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Dilip Kumar' <dilipbalaut(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: RE: [PoC] pg_upgrade: allow to upgrade publisher node
Date: 2023-09-11 13:21:39
Message-ID: TYAPR01MB586642D33208D190F67CDD7BF5F2A@TYAPR01MB5866.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Dilip,

Thank you for reviewing! PSA new version.

>
> 1.
> Note that slot restoration must be done after the final pg_resetwal command
> during the upgrade because pg_resetwal will remove WALs that are required by
> the slots. Due to this restriction, the timing of restoring replication slots is
> different from other objects.
>
> This comment in the commit message is confusing. I understand the
> reason but from this, it is not very clear that if resetwal removes
> the WAL we needed then why it is good to create after the resetwal. I
> think we should make it clear that creating new slot will set the
> restart lsn to current WAL location and after that resetwal can remove
> those WAL where slot restart lsn is pointing....

Just to confirm - WAL records must not be removed in any time if it is referred
as restart_lsn. The reason why the slot creation is done after pg_restwal is that
required WALs are not removed by the command. See [1].
Moreover, clarified more in the commit message.

> 2.
>
> + <itemizedlist>
> + <listitem>
> + <para>
> + All slots on the old cluster must be usable, i.e., there are no slots
> + whose
> + <link
> linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>
> wal_status</structfield>
> + is <literal>lost</literal>.
> + </para>
> + </listitem>
> + <listitem>
> + <para>
> + <link
> linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>c
> onfirmed_flush_lsn</structfield>
> + of all slots on the old cluster must be the same as the latest
> + checkpoint location.
> + </para>
> + </listitem>
> + <listitem>
> + <para>
> + The output plugins referenced by the slots on the old cluster must be
> + installed in the new PostgreSQL executable directory.
> + </para>
> + </listitem>
> + <listitem>
> + <para>
> + The new cluster must have
> + <link
> linkend="guc-max-replication-slots"><varname>max_replication_slots</varna
> me></link>
> + configured to a value greater than or equal to the number of slots
> + present in the old cluster.
> + </para>
> + </listitem>
> + <listitem>
> + <para>
> + The new cluster must have
> + <link
> linkend="guc-wal-level"><varname>wal_level</varname></link> as
> + <literal>logical</literal>.
> + </para>
> + </listitem>
> + </itemizedlist>
>
> I think we should also add that the new slot should not have any
> permanent existing logical replication slot.

Hmm, I wondered it should be really needed. Tables are required not to be in the
new cluster too, but not documented. It might be a trivial thing. Anyway, added.

FYI - the restriction was not introduced by the patch. I reported independently [2],
but no one has responded since now...

> 3.
> - with the primary.) Replication slots are not copied and must
> - be recreated.
> + with the primary.) Replication slots on the old standby are not copied.
> + Only logical slots on the primary are migrated to the new standby,
> + and other slots must be recreated.
>
> This paragraph should be rephrased. I mean first stating that
> "Replication slots on the old standby are not copied" and then saying
> Only logical slots are migrated doesn't seem like the best way. Maybe
> we can just say "Only logical slots on the primary are migrated to the
> new standby, and other slots must be recreated."

Per discussion on [3], I used another words. Thanks for suggesting.

> 4.
> + /*
> + * Raise an ERROR if the logical replication slot is invalidating. It
> + * would not happen because max_slot_wal_keep_size is set to -1 during
> + * the upgrade, but it stays safe.
> + */
> + if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
> + elog(ERROR, "Replication slots must not be invalidated during the upgrade.");
>
> Rephrase the first line as -> Raise an ERROR if the logical
> replication slot is invalidating during an upgrade.

Per discussion on [3], I used another words. Thanks for suggesting.

> 5.
> + /* Logical slots can be migrated since PG17. */
> + if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
> + return;
>
>
> For readability change this to if
> (GET_MAJOR_VERSION(old_cluster.major_version) < 1700), because in most
> of the checks related to this, we are using 1700 so better be
> consistent in this.

Per discussion on [3], I did not change here.

> 6.
> + if (nslots_on_new)
> + pg_fatal(ngettext("New cluster must not have logical replication
> slots but found %d slot.",
> + "New cluster must not have logical replication slots but found %d slots.",
> + nslots_on_new),
> + nslots_on_new);
> ...
> + if (PQntuples(res) != 1)
> + pg_fatal("could not determine wal_level.");
> +
> + wal_level = PQgetvalue(res, 0, 0);
> +
> + if (strcmp(wal_level, "logical") != 0)
> + pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
> + wal_level);
>
>
> I have noticed that the case of the first letter in the pg_fatal
> message is not consistent.

Actually there are some inconsistency even in the check.c file, so I devised
below rules. How do you think?

* Non-complete sentence starts with the lower case.
(e.g., "could not open", "could not determine")
* proper nouns are always noted with the lower cases
(e.g., "template0 must not allow...", "wal_level must be...").
* Other than above, the sentence starts with the upper case.

> 7.
> +
> + /* Is the slot still usable? */
> + if (slot->invalid)
> + {
>
> Why comment says "Is the slot still usable?" I think it should be "Is
> the slot usable?" otherwise it appears that we have first fetched the
> slots and now we are refetching it and checking whether it is still
> usable.

Changed.

[1]: https://www.postgresql.org/message-id/TYAPR01MB58664C81887B3AF2EB6B16E3F5939%40TYAPR01MB5866.jpnprd01.prod.outlook.com
[2]: https://www.postgresql.org/message-id/TYAPR01MB5866D277F6BEDEA4223B3559F5E6A@TYAPR01MB5866.jpnprd01.prod.outlook.com
[3]: https://www.postgresql.org/message-id/CAFiTN-vs53SqZiZN1GcSuKLmMY%3D0d14wJDDm1aKmoBONwnqaGg%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment Content-Type Size
v35-0001-Persist-logical-slots-to-disk-during-a-shutdown-.patch application/octet-stream 10.2 KB
v35-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patch application/octet-stream 39.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2023-09-11 13:21:47 RE: [PoC] pg_upgrade: allow to upgrade publisher node
Previous Message Himanshu Upadhyaya 2023-09-11 12:15:14 Re: CHECK Constraint Deferrable