Re: Introduce XID age and inactive timeout based replication slot invalidation

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Date: 2024-03-20 02:28:20
Message-ID: CAA4eK1Ly8mj_fvsVr=i5yzpiVDKcjCik3nPNhSYfy6DJqUm4Ew@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 19, 2024 at 6:12 PM Bertrand Drouvot
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> On Tue, Mar 19, 2024 at 04:20:35PM +0530, Amit Kapila wrote:
> > On Tue, Mar 19, 2024 at 3:11 PM Bertrand Drouvot
> > <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Mar 19, 2024 at 10:56:25AM +0530, Amit Kapila wrote:
> > > > On Mon, Mar 18, 2024 at 8:19 PM Bertrand Drouvot
> > > > <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> > > > > Agree. While it makes sense to invalidate slots for wal removal in
> > > > > CreateCheckPoint() (because this is the place where wal is removed), I'm not
> > > > > sure this is the right place for the 2 new cases.
> > > > >
> > > > > Let's focus on the timeout one as proposed above (as probably the simplest one):
> > > > > as this one is purely related to time and activity, what about invalidating them
> > > > > when:
> > > > >
> > > > > - their usage resumes
> > > > > - in pg_get_replication_slots()
> > > > >
> > > > > The idea is to invalidate the slot when one resumes activity on it or wants to
> > > > > get information about it (and among other things wants to know if the slot is
> > > > > valid or not).
> > > > >
> > > >
> > > > Trying to invalidate at those two places makes sense to me but we
> > > > still need to cover the cases where it takes very long to resume the
> > > > slot activity and the dangling slot cases where the activity is never
> > > > resumed.
> > >
> > > I understand it's better to have the slot reflecting its real status internally
> > > but is it a real issue if that's not the case until the activity on it is resumed?
> > > (just asking, not saying we should not)
> > >
> >
> > Sorry, I didn't understand your point. Can you try to explain by example?
>
> Sorry if that was not clear, let me try to rephrase it first: what issue do you
> see if the invalidation of such a slot occurs only when its usage resumes or
> when pg_get_replication_slots() is triggered? I understand that this could lead
> to the slot not being invalidated (maybe forever) but is that an issue for an
> inactive slot?
>

It has the risk of preventing WAL and row removal. I think this is the
primary reason we are planning to have such a parameter in the first
place. So, we should have some way to invalidate it even when the
walsender/backend process doesn't use it again.
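
To illustrate the point, here is a rough standalone sketch (not taken
from any patch) of a sweep that some periodically running process could
perform, so that a dangling slot gets invalidated even if nobody ever
resumes it or queries its status. ToySlot and toy_invalidate_idle_slots
are made-up names for illustration; they are not PostgreSQL's
ReplicationSlot or any existing function, and which process would run
such a sweep is deliberately left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Toy stand-in for a replication slot (hypothetical, not ReplicationSlot). */
typedef struct ToySlot
{
    bool   active;         /* is a walsender/backend currently using it? */
    bool   invalidated;    /* set once the slot has been invalidated */
    time_t inactive_since; /* when the slot was last released */
} ToySlot;

/*
 * Hypothetical periodic sweep: invalidate every slot that has been
 * inactive for longer than timeout_secs. It does not depend on anyone
 * resuming the slot. Returns the number of slots invalidated by this call.
 */
int
toy_invalidate_idle_slots(ToySlot *slots, size_t nslots,
                          time_t now, double timeout_secs)
{
    int ninvalidated = 0;

    assert(slots != NULL || nslots == 0);

    for (size_t i = 0; i < nslots; i++)
    {
        ToySlot *s = &slots[i];

        /* Slots in use, or already invalidated, are left alone. */
        if (s->active || s->invalidated)
            continue;

        if (difftime(now, s->inactive_since) > timeout_secs)
        {
            s->invalidated = true;
            ninvalidated++;
        }
    }

    return ninvalidated;
}
```

With only the "on resume" and "on status query" triggers, a slot that is
never touched again would never reach a check like this, and so would
keep holding back WAL and row removal.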

--
With Regards,
Amit Kapila.
