Re: [PoC] pg_upgrade: allow to upgrade publisher node

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: [PoC] pg_upgrade: allow to upgrade publisher node
Date: 2023-09-14 04:30:00
Message-ID: CAA4eK1LS3OycoTQMhDgoonpPasG+FE4v1GwKh4+gKuZf1tA-Og@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 14, 2023 at 9:21 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Thu, Sep 14, 2023 at 8:40 AM Zhijie Hou (Fujitsu)
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> >
> > Here are some more ideas about the issue for reference.
> >
> > 1) Extending the controlfile.
> >
> > We can dd a new field (e.g. non_upgrade_checkPoint) to record the last check point
> > ptr happened in non-upgrade mode. The new field won't be updated due to
> > "pg_upgrade --check", so pg_upgrade can use this LSN to compare with the slot's
> > confirmed_flush_lsn.
> >
> > Pros: User can smoothly upgrade the cluster even if they run "pg_upgrade
> > --check" in advance.
> >
> > Cons: Not sure if this is a enough reason to introduce new field in
> > controlfile.
>
> Yeah, this could be an option but I am not sure either that adding a
> new option for this purpose is the best way.
>

I also think so. It seems this could work but adding upgrade-specific
information to other data structures doesn't sound like a clean
solution.

> > -----------
> >
> > 2) Advance the slot's confirmed_flush_lsn in pg_upgrade if the check passes.
> >
> > Introducing an upgrade support SQL function
> > (binary_upgrade_advance_logical_slot_lsn()) to set a
> > flag(catch_confirmed_lsn_up) on server side. On server side, when trying to
> > flush the slot in shutdown checkpoint(CheckPointReplicationSlots), we update
> > the slot's confirmed_flush_lsn to the lsn of the current checkpoint if
> > catch_confirmed_lsn_up is set.
> >
> > Pros: User can smoothly upgrade the cluster even if they run "pg_upgrade
> > --check" in advance.
> >
> > Cons: Although we have some examples for using functions
> > (binary_upgrade_set_next_pg_enum_oid ...) to set some variables during upgrade
> > , but not sure if it's a standard behavior to change the slot's lsn during
> > upgrade.
>
> I feel this seems like a good option.
>

In this idea, if the user decides not to proceed after the upgrade
--check, then we would have incremented the confirmed_flush location
of all slots without the subscriber's acknowledgment. It may not be
the usual scenario but in theory, it may violate our basic principle
of incrementing confirmed_flush location. Another thing to consider is
we have to do this for all logical slots under the assumption that all
are already caught up as pg_upgrade would have ensured that. So,
ideally, the server should have some knowledge that the slots are
already caught up to the latest location which again doesn't seem like
a clean idea.

> > -----------
> >
> > 3) Introduce a new pg_upgrade option(e.g. skip_slot_check), and suggest if user
> > already did the upgrade check for stopped server, they can use this option
> > when trying to upgrade later.
> >
> > Pros: Can save some efforts for user to advance each slot's lsn.
> >
> > Cons: I didn't see similar options in pg_upgrade, might need some agreement.
>
> Yeah right, in fact during the --check command we can give that
> suggestion as well.
>

Hmm, we can't mandate users to skip checking slots because that is the
whole point of --check slots.

> I feel option 2 looks best to me unless there is some design issue to
> that, as of now I do not see any issue with that though. Let's see
> what others think.
>

By the way, did you consider the previous approach this patch was
using? Basically, instead of getting the last checkpoint location from
the control file, we will read the WAL file starting from the
confirmed_flush location of a slot and if we find any WAL other than
expected WALs like shutdown checkpoint, running_xacts, etc. then we
will error out.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2023-09-14 04:33:39 Re: Have better wording for snapshot file reading failure
Previous Message vignesh C 2023-09-14 04:27:15 Re: CHECK Constraint Deferrable