From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>
Subject: Re: Impact of checkpointer during pg_upgrade
Date: 2023-09-04 08:11:48
Message-ID: CAFiTN-tzqwtBsdaCTDdj5kkEUbsho_FiRmLT7x-G_cSxrKwJ+Q@mail.gmail.com
Lists: pgsql-hackers
On Mon, Sep 4, 2023 at 11:18 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Sep 4, 2023 at 10:33 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Mon, Sep 4, 2023 at 8:41 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Sat, Sep 2, 2023 at 6:12 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > >
> > > > On Sat, Sep 2, 2023 at 10:09 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > >
> > > > > The other possibilities apart from not allowing an upgrade in such a
> > > > > case could be (a) Before starting the old cluster, we fetch the slots
> > > > > directly from the disk using some tool like [2] and make the decisions
> > > > > based on that state;
> > > >
> > > > Okay, so IIUC, along with dumping the slot data we also need to dump
> > > > the latest checkpoint LSN, because during the upgrade we check that
> > > > the confirmed_flush_lsn of all the slots is the same as the latest
> > > > checkpoint LSN. Yeah, but I think we could work this out.
> > > >
> > > We already have the latest checkpoint LSN information from
> > > pg_controldata. I think we can use that as the patch proposed in the
> > > thread [1] is doing now. Do you have something else in mind?
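
For illustration, pulling that value out of pg_controldata output could
look something like the below (the helper and its name are made up for
the example, not taken from the patch):

#include <stdio.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* matches the server's definition */

/* Hypothetical helper: run pg_controldata on the old data directory
 * and extract the "Latest checkpoint location" line. */
static XLogRecPtr
get_latest_checkpoint_lsn(const char *datadir)
{
    char        cmd[1024];
    char        line[1024];
    unsigned int hi = 0, lo = 0;
    FILE       *out;

    snprintf(cmd, sizeof(cmd), "pg_controldata \"%s\"", datadir);
    out = popen(cmd, "r");
    if (out == NULL)
        return 0;
    while (fgets(line, sizeof(line), out) != NULL)
    {
        /* the line looks like: Latest checkpoint location:  0/1560A58 */
        if (sscanf(line, "Latest checkpoint location: %X/%X", &hi, &lo) == 2)
            break;
    }
    pclose(out);
    return ((XLogRecPtr) hi << 32) | lo;
}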
> >
> > I think I did not understand the complete proposal. What I meant is
> > that if we dump the slots before we start the cluster, that's fine.
> > But if, after starting the old cluster, we then run some query as we
> > do in check_old_cluster_for_valid_slots(), that's not right, because
> > there is a gap between the state of the slots we dumped before
> > starting the cluster and what we check after starting it, so there is
> > no point in that check, right?
> >
>
> That's right, but if we do read the slots from disk, we preserve those
> in memory and use that information instead of querying it again in
> check_old_cluster_for_valid_slots().
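
Something like the below is what I imagine (all names here are made up
for illustration; the patch may structure this differently):

#include <stdbool.h>

/* Hypothetical cache of per-slot state, as it was on disk before the
 * old cluster was started for the upgrade. */
typedef struct LogicalSlotInfo
{
    char        slotname[64];       /* slot name */
    XLogRecPtr  confirmed_flush;    /* confirmed_flush_lsn from the slot file */
    bool        invalid;            /* already invalidated on disk? */
} LogicalSlotInfo;

static LogicalSlotInfo *old_slot_cache; /* filled before cluster start */
static int  num_old_slots;

/*
 * check_old_cluster_for_valid_slots() would then loop over
 * old_slot_cache instead of issuing a query against the restarted
 * cluster, so there is no window between the state we dumped and the
 * state we check.
 */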
>
> > > > > (b) During the upgrade, we don't allow WAL to be
> > > > > removed if it can invalidate slots; (c) Copy/Migrate the invalid slots
> > > > > as well but for that, we need to expose an API to invalidate the
> > > > > slots;
> > > >
> > > > > (d) somehow distinguish the slots that are invalidated during
> > > > > an upgrade and then simply copy such slots because anyway we ensure
> > > > > that all the WAL required by slot is sent before shutdown.
> > > >
> > > > Yeah, this could also be an option, although we need to make sure
> > > > that the mechanism for distinguishing those slots is clean and fits
> > > > well with the rest of the architecture.
> > > >
> > >
> > > If we want to do this we probably need to maintain a flag in the slot
> > > indicating that it was invalidated during an upgrade and then use the
> > > same flag in the upgrade to check the validity of slots. I think such
> > > a flag needs to be maintained at the same level as
> > > ReplicationSlotInvalidationCause to avoid any inconsistency among
> > > those.
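
For reference, the enum in slot.h currently looks like the below (as of
v16), and one way to do this would be a dedicated invalidation cause;
the last member is hypothetical, purely to sketch the idea:

typedef enum ReplicationSlotInvalidationCause
{
    RS_INVAL_NONE,
    RS_INVAL_WAL_REMOVED,       /* required WAL has been removed */
    RS_INVAL_HORIZON,           /* required rows have been removed */
    RS_INVAL_WAL_LEVEL,         /* wal_level insufficient for slot */
    RS_INVAL_DURING_UPGRADE,    /* hypothetical: slot invalidated while
                                 * the cluster was running in
                                 * binary-upgrade mode */
} ReplicationSlotInvalidationCause;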
> >
> > I think we can do better: we can just read the latest checkpoint's
> > LSN before starting the old cluster. Then, while checking a slot, if
> > the slot is invalidated, can't we check whether its
> > confirmed_flush_lsn >= the latest_checkpoint_lsn we preserved before
> > starting the cluster? Because if so, that slot could only have been
> > invalidated during the upgrade, no?
> >
>
> Isn't that possible only if we update the confirmed_flush LSN while
> invalidating? Otherwise, how can the check you are proposing succeed?
I am not suggesting comparing the confirmed_flush_lsn to the latest
checkpoint LSN. Instead, I am suggesting that before starting the
cluster we record the location of the latest checkpoint LSN, which
should be the shutdown checkpoint LSN. As in [1], we still check that
the confirmed_flush_lsn is equal to the latest checkpoint LSN. The
only problem is that after we restart the cluster during the upgrade,
we might invalidate some slots that are perfectly fine to migrate, and
we want to identify those slots. So, if we know the LSN of the
shutdown checkpoint taken before the cluster was started, we can
perform an additional check on all the invalidated slots: if their
confirmed_flush_lsn >= the shutdown checkpoint LSN we preserved before
restarting the cluster (not the latest checkpoint LSN), then those
slots got invalidated only after we started the cluster for the
upgrade.

Is there any loophole in this theory? It is based on the assumption
that the confirmed_flush_lsn does not move forward for
already-invalidated slots. That means a slot that was invalidated
before the shutdown for the upgrade will have a confirmed_flush_lsn
value < the shutdown checkpoint LSN, and a slot that was invalidated
during the upgrade will have a confirmed_flush_lsn at least equal to
the shutdown checkpoint LSN.
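
To make that rule concrete, here is a rough sketch (reusing the made-up
LogicalSlotInfo above; shutdown_ckpt_lsn is the value captured from
pg_controldata before the old cluster was started):

/* Decide whether a slot from the old cluster is safe to migrate. */
static bool
slot_is_migratable(const LogicalSlotInfo *slot, XLogRecPtr shutdown_ckpt_lsn)
{
    /* A valid slot must be fully caught up to the shutdown checkpoint. */
    if (!slot->invalid)
        return slot->confirmed_flush == shutdown_ckpt_lsn;

    /*
     * An invalidated slot whose confirmed_flush_lsn is at or past the
     * pre-start shutdown checkpoint can only have been invalidated
     * after the cluster was restarted for the upgrade; all the WAL it
     * needed was already confirmed before shutdown, so it is still
     * safe to migrate.
     */
    return slot->confirmed_flush >= shutdown_ckpt_lsn;
}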
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com