From: | Nikolay Samokhvalov <nik(at)postgres(dot)ai> |
---|---|
To: | Michael Banck <mbanck(at)gmx(dot)net> |
Cc: | Stephen Frost <sfrost(at)snowman(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Andrey Borodin <x4mmm(at)yandex-team(dot)ru> |
Subject: | Re: pg_upgrade instructions involving "rsync --size-only" might lead to standby corruption? |
Date: | 2023-07-10 21:37:24 |
Message-ID: | CAM527d8pSQ3pa1u5GocxpGNuY2yteHhTRKuwGPPUJC0V9Qzf5Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Jul 10, 2023 at 2:02 PM Michael Banck <mbanck(at)gmx(dot)net> wrote:
> Thanks for that!
>
Thanks for the fast review.
>
> > I agree with Andrey – without it, we don't have any good way to upgrade
> > large clusters in short time. Default rsync mode (without "--size-only")
> > takes a lot of time too, if the load is heavy.
> >
> > With these adjustments, can "rsync --size-only" remain in the docs as the
> > *fast* and safe method to upgrade standbys, or there are still some
> > concerns related to corruption risks?
>
> I hope somebody can answer that definitively, but I read Stephen's mail
> to indicate that this procedure should be safe in principle (if you know
> what you are doing).
>
right, this is my understanding too
> > +++ b/doc/src/sgml/ref/pgupgrade.sgml
> > @@ -380,22 +380,28 @@ NET STOP postgresql-&majorversion;
> > </para>
> >
> > <para>
> > - Streaming replication and log-shipping standby servers can
> > + Streaming replication and log-shipping standby servers must
> > remain running until a later step.
> > </para>
> > </step>
> >
> > - <step>
> > + <step id="pgupgrade-step-prepare-standbys">
> >
> > <para>
> > - If you are upgrading standby servers using methods outlined in
> section <xref
> > - linkend="pgupgrade-step-replicas"/>, verify that the old standby
> > - servers are caught up by running
> <application>pg_controldata</application>
> > - against the old primary and standby clusters. Verify that the
> > - <quote>Latest checkpoint location</quote> values match in all
> clusters.
> > - (There will be a mismatch if old standby servers were shut down
> > - before the old primary or if the old standby servers are still
> running.)
> > + If you are upgrading standby servers using methods outlined in
> > + <xref linkend="pgupgrade-step-replicas"/>,
>
> You dropped the "section" before the xref, I think that should be kept
> around.
>
Seems to be a problem in discussing source code that looks quite different
than the final result.
In the result – the docs – we currently have "section Step 9", looking
weird. I still think it's good to remove it. We also have "in Step 17
below" (without the extra word "section") in a different place on the same
page.
>
> > + ensure that they were
> running when
> > + you shut down the primaries in the previous step, so all the
> latest changes
>
> You talk of primaries in plural here, that is a bit weird for PostgreSQL
> documentation.
>
The same docs already discuss two primaries ("8. Stop both primaries"), but
I agree it might look confusing if you read only a part of the doc, jumping
into middle of it, like I do all the time when using the docs in "check the
reference" style.
I agree with this comment, but it tells me we need even more improvements
of this doc, beyond my original goal – e.g., I don't like section 8 saying
"Make sure both database servers", it should be "both primaries".
>
> > + and the shutdown checkpoint record were received. You can verify
> this by running
> > + <application>pg_controldata</application> against the old primary
> and standby
> > + clusters. The <quote>Latest checkpoint location</quote> values
> must match in all
> > + nodes. A mismatch might occur if old standby servers were shut
> down before
> > + the old primary. To fix a mismatch, start all old servers and
> return to the
> > + previous step; proceeding with mismatched
> > + <quote>Latest checkpoint location</quote> may lead to standby
> corruption.
> > + </para>
> > +
> > + <para>
> > Also, make sure <varname>wal_level</varname> is not set to
> > <literal>minimal</literal> in the
> <filename>postgresql.conf</filename> file on the
> > new primary cluster.
> > @@ -497,7 +503,6 @@ pg_upgrade.exe
> > linkend="warm-standby"/>) standby servers, you can follow these
> steps to
> > quickly upgrade them. You will not be running
> <application>pg_upgrade</application> on
> > the standby servers, but rather <application>rsync</application>
> on the primary.
> > - Do not start any servers yet.
> > </para>
> >
> > <para>
> > @@ -508,6 +513,15 @@ pg_upgrade.exe
> > is running.
> > </para>
> >
> > + <para>
> > + Before running rsync, to avoid standby corruption, it is absolutely
> > + critical to ensure that both primaries are shut down and standbys
> > + have received the last changes (see <xref
> linkend="pgupgrade-step-prepare-standbys"/>).
>
> I think this should be something like "ensure both that the primary is
> shut down and that the standbys have received all the changes".
>
Well, we have two primary servers – old and new. I tried to clarify it in
the new version.
>
> > + Standbys can be running at this point or fully stopped.
>
> "or be fully stopped." I think.
>
> > + If they
> > + are still running, you can stop, upgrade, and start them one by
> one; this
> > + can be useful to keep the cluster open for read-only transactions.
> > + </para>
>
> Maybe this is clear from the context, but "upgrade" in the above should
> maybe more explicitly refer to the rsync method or else people might
> think one can run pg_upgrade on them after all?
>
Maybe. It will require changes in other parts of this doc.
Thinking (here:
https://gitlab.com/postgres/postgres/-/merge_requests/18/diffs)
Meanwhile, attached is v2
thanks for the comments
Attachment | Content-Type | Size |
---|---|---|
pg-upgrade-docs-clarify-rsync-size-only-002.patch | application/x-patch | 4.1 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Thomas Munro | 2023-07-10 21:54:23 | Re: check_strxfrm_bug() |
Previous Message | Tom Lane | 2023-07-10 21:36:43 | Re: Allowing parallel-safe initplans |