From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Bruce Momjian <bruce(at)momjian(dot)us>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com> |
Subject: | Re: [PoC] pg_upgrade: allow to upgrade publisher node |
Date: | 2023-08-17 12:36:30 |
Message-ID: | CAD21AoBT3BCzafjHvH+=SN23j1DTdNJ=LE_iKDtKwkrGuXF0Sg@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Aug 15, 2023 at 12:06 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Aug 15, 2023 at 7:51 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Mon, Aug 14, 2023 at 2:07 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Mon, Aug 14, 2023 at 7:57 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > Another idea (which might have already been discussed, though) is that we check if the latest shutdown checkpoint LSN in the control file matches the confirmed_flush_lsn in the pg_replication_slots view. That way, we can ensure that the slot has consumed all WAL records before the last shutdown. We don't need to worry about WAL records generated after starting the old cluster during the upgrade, at least for logical replication slots.
> > > >
> > >
> > > Right, this is somewhat closer to what the patch is already doing.
> > > But note that in this case we need to record and use the latest
> > > checkpoint from the control file before the old cluster is started,
> > > because otherwise the latest checkpoint location could even be
> > > updated during the upgrade. So, instead of reading from WAL, we need
> > > to change it so that we rely on the control file's latest LSN.
> >
> > Yes, I was thinking the same idea.
> >
> > But it works for only replication slots for logical replication. Do we
> > want to check if no meaningful WAL records are generated after the
> > latest shutdown checkpoint, for manually created slots (or non-logical
> > replication slots)? If so, we would need to have something reading WAL
> > records in the end.
> >
>
> This feature only targets logical replication slots. I don't see a
> reason to be different for manually created logical replication slots.
> Is there something particular that you think we could be missing?
Sorry, I was not clear. I meant the logical replication slots that are
*not* used by logical replication, i.e., slots that are created
manually and used by third-party tools that periodically consume
decoded changes. As we discussed before, these slots will never be
able to pass that confirmed_flush_lsn check. After some thought, one
thing we might need to consider is that in practice, the upgrade is
performed during a maintenance window with a backup plan that reverts
the upgrade in case something bad happens. If we require users to
drop such logical replication slots, they cannot resume using the old
cluster in that case, since they would need to create new slots and
would miss some changes. Other checks in pg_upgrade seem to be
compatibility checks that would eventually be required for the upgrade
anyway. Do we need to consider this case? For example, we could do the
confirmed_flush_lsn check only for the slots that use the pgoutput
plugin.
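
To make the idea concrete, here is a rough sketch of what such a
relaxed check could look like. It is not taken from the patch: the
SlotInfo struct and the functions parse_lsn() and slot_passes_check()
are hypothetical names used only for illustration, and the exact
equality comparison against the shutdown checkpoint LSN is an
assumption about how "matches" would be implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical slot description; fields mirror pg_replication_slots columns. */
typedef struct SlotInfo
{
    const char *slot_name;
    const char *plugin;              /* e.g. "pgoutput" or "test_decoding" */
    const char *confirmed_flush_lsn; /* textual LSN, "hi/lo" in hexadecimal */
} SlotInfo;

/* Parse a textual LSN such as "0/3000028" into a 64-bit value. */
bool
parse_lsn(const char *str, uint64_t *out)
{
    unsigned int hi;
    unsigned int lo;
    int          consumed = 0;

    assert(str != NULL && out != NULL);

    /* Require both halves and reject any trailing characters. */
    if (sscanf(str, "%X/%X%n", &hi, &lo, &consumed) != 2 ||
        str[consumed] != '\0')
        return false;

    *out = ((uint64_t) hi << 32) | lo;
    return true;
}

/*
 * Return true if the slot would be accepted for upgrade.
 *
 * Assumption: slots that do not use pgoutput (e.g. manually created ones
 * consumed by third-party tools) skip the LSN check entirely, while
 * pgoutput slots must have confirmed_flush_lsn equal to the shutdown
 * checkpoint LSN remembered from the control file.
 */
bool
slot_passes_check(const SlotInfo *slot, uint64_t shutdown_checkpoint_lsn)
{
    uint64_t confirmed;

    if (strcmp(slot->plugin, "pgoutput") != 0)
        return true;

    if (!parse_lsn(slot->confirmed_flush_lsn, &confirmed))
        return false;

    return confirmed == shutdown_checkpoint_lsn;
}
```

With this shape, a lagging pgoutput slot would still block the upgrade,
whereas a manually created slot would be left alone so that the old
cluster remains usable if the upgrade has to be reverted.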
> > >
> > > Yet another thing I am trying to consider is whether we can allow
> > > upgrading slots from 16 or 15 to later versions. As of now, the patch
> > > has the following check:
> > > getLogicalReplicationSlots()
> > > {
> > > ...
> > > + /* Check whether we should dump or not */
> > > + if (fout->remoteVersion < 170000)
> > > + return;
> > > ...
> > > }
> > >
> > > If we decide to use the existing view pg_replication_slots then can we
> > > consider upgrading slots from the prior version to 17? Now, if we want
> > > to invent any new API similar to pg_replslotdata then we can't do this
> > > because it won't exist in prior versions but OTOH using existing view
> > > pg_replication_slots can allow us to fetch slot info from older
> > > versions as well. So, I think it is worth considering.
> >
> > I think that without the 0001 patch the replication slots will not be
> > able to pass the confirmed_flush_lsn check.
> >
>
> Right, but we can think of backpatching the same. Anyway, we can do
> that as a separate work by starting a new thread to see if there is a
> broader agreement for backpatching such a change. For now, we can
> focus on >=v17.
>
Agreed.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com