Re: Impact of checkpointer during pg_upgrade

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>
Subject: Re: Impact of checkpointer during pg_upgrade
Date: 2023-09-07 10:12:29
Message-ID: CAFiTN-ti7GL9v7ZVqjJfTV4QOqR0pZFfh_XPK6bxC4_tX5Z1dg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 7, 2023 at 3:34 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Sep 7, 2023 at 12:55 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> >
> > (Just dumping what I have in mind while reading the thread.)
> >
> > On Sat, Sep 02, 2023 at 10:08:51AM +0530, Amit Kapila wrote:
> > > During pg_upgrade, we start the server for the old cluster which can
> > > allow the checkpointer to remove the WAL files. It has been noticed
> > > that we do generate certain types of WAL records (e.g
> > > XLOG_RUNNING_XACTS, XLOG_CHECKPOINT_ONLINE, and XLOG_FPI_FOR_HINT)
> > > even during pg_upgrade for old cluster, so additional WAL records
> > > could let checkpointer decide that certain WAL segments can be removed
> > > (e.g. say wal size crosses max_slot_wal_keep_size_mb) and invalidate
> > > the slots. Currently, I can't see any problem with this but for future
> > > work where we want to migrate logical slots during an upgrade[1], we
> > > need to decide what to do for such cases. The initial idea we had was
> > > that if the old cluster has some invalid slots, we won't allow an
> > > upgrade unless the user removes such slots or uses some option like
> > > --exclude-slots. It is quite possible that slots got invalidated
> > > during pg_upgrade due to no user activity. Now, even though the
> > > possibility of the same is less I think it is worth considering what
> > > should be the behavior.
> > >
> > > The other possibilities apart from not allowing an upgrade in such a
> > > case could be (a) Before starting the old cluster, we fetch the slots
> > > directly from the disk using some tool like [2] and make the decisions
> > > based on that state; (b) During the upgrade, we don't allow WAL to be
> > > removed if it can invalidate slots; (c) Copy/Migrate the invalid slots
> > > as well but for that, we need to expose an API to invalidate the
> > > slots; (d) somehow distinguish the slots that are invalidated during
> > > an upgrade and then simply copy such slots because anyway we ensure
> > > that all the WAL required by slot is sent before shutdown.
> >
> > Checking for any invalid slots at the beginning of the upgrade and
> > complain sounds like a good thing to do, because these are not
> > expected to begin with, no? That's similar to a pre-check requirement
> > that the slots should have fed the subscribers with all the data
> > available up to the shutdown checkpoint when the publisher was stopped
> > before running pg_upgrade. So (a) is a good idea to prevent an
> > upgrade based on a state we don't expect from the start, as something
> > in check.c, I assume.
> >
> > On top of that, (b) sounds like a good idea to me anyway to be more
> > defensive. But couldn't we do that just like we do for autovacuum and
> > force the GUCs that could remove the slot's WAL to not do their work
> > instead?
> >
>
> I think if we just make max_slot_wal_keep_size to -1 that should be
> sufficient to not let any slots get invalidated during upgrade. Do you
> have anything else in mind?

This seems like a good solution to the problem.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2023-09-07 10:41:09 Re: persist logical slots to disk during shutdown checkpoint
Previous Message Ashutosh Bapat 2023-09-07 10:08:07 Re: persist logical slots to disk during shutdown checkpoint