Re: Introduce XID age and inactive timeout based replication slot invalidation

From: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
To: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Date: 2025-01-17 13:20:41
Message-ID: CANhcyEVknNXL6B=bFWjvo6Z9g4z3WrGELmPbmdKGw7UjTWqKfg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 16 Jan 2025 at 12:35, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> wrote:
>
> On Wed, Jan 15, 2025 at 11:37 AM Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> wrote:
> >
> > On Thu, 2 Jan 2025 at 15:57, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> wrote:
> > >
> > > On Thu, Jan 2, 2025 at 8:16 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > > >
> > > > Hi Nisha,
> > > >
> > > > Here are some minor review comments for patch v58-0002.
> > > >
> > >
> > > Thank you for your feedback! Please find the v59 patch set addressing
> > > all the comments.
> > > Note: There are no new changes in patch-0001.
> > >
> >
> > Hi Nisha,
> > I reviewed the v59-0001 patch. I have few comments:
> >
> > 1.I think we should update the comment for function
> > 'InvalidatePossiblyObsoleteSlot'
> > Currently the comment is like:
> >
> > /*
> > * Helper for InvalidateObsoleteReplicationSlots
> > *
> > * Acquires the given slot and mark it invalid, if necessary and possible.
> > *
> > * Returns whether ReplicationSlotControlLock was released in the interim (and
> > * in that case we're not holding the lock at return, otherwise we are).
> > *
> > * Sets *invalidated true if the slot was invalidated. (Untouched otherwise.)
> > *
> > * This is inherently racy, because we release the LWLock
> > * for syscalls, so caller must restart if we return true.
> > */
> >
> > I think we should add a comment for the case 'when slot is already ours'.
>
> Done.
>
> > 2. Similarly we should update comment here:
> > /*
> > * Check if the slot needs to be invalidated. If it needs to be
> > * invalidated, and is not currently acquired, acquire it and mark it
> > * as having been invalidated. We do this with the spinlock held to
> > * avoid race conditions -- for example the restart_lsn could move
> > * forward, or the slot could be dropped.
> > */
> > SpinLockAcquire(&s->mutex);
> >
> > Before we release the lock, we are marking the slot as invalidated for
> > the case when the slot is already acquired by our process. So we
> > should update it in comment.
> >
>
> Clarified the comments as per the mentioned case.
>
> > 3. I think we should also update the following 'if condition':
> >
> > if (active_pid != 0)
> > {
> > /*
> > * Prepare the sleep on the slot's condition variable before
> > * releasing the lock, to close a possible race condition if the
> > * slot is released before the sleep below.
> > */
> > We should not enter the if condition for the case when the slot was
> > already acquired by our process.
> >
>
> Thank you for pointing that out. I've included the fix and also
> reorganized this section of the code in patch-0001 to improve the
> readability of the logic.
>
> Attached the v60 patch set addressing above comments and all other
> comments at [1].
>
> [1] https://www.postgresql.org/message-id/CALDaNm2r969ZZPDaAZQEtxcfL-sGUW8AGdbdwC8AcMn1V8w%2Bhw%40mail.gmail.com
>
Hi Nisha,

Thanks for providing an updated patch. I have tested the patch and ran
some tests. The patch works fine. I have few comments:

v60-0001 patch:
1) There is extra line before 'SpinLockRelease(&s->mutex)' :

+ else /* Some other process owns the slot */
{
+
+ SpinLockRelease(&s->mutex);

v60-0002 patch:
1) In the comment:

/*
* Invalidate slots that require resources about to be removed.
*
* Returns true when any slot have got invalidated.
*
* Whether a slot needs to be invalidated depends on the cause. A slot is
* removed if it:
* - RS_INVAL_WAL_REMOVED: requires a LSN older than the given segment
* - RS_INVAL_HORIZON: requires a snapshot <= the given horizon in the given
* db; dboid may be InvalidOid for shared relations
* - RS_INVAL_WAL_LEVEL: is logical
* - RS_INVAL_IDLE_TIMEOUT: idle slot timeout has occurred
*
* NB - this runs as part of checkpoint, so avoid raising errors if possible.
*/

It is mentioned that 'A slot is removed if it:'. I think instead of
'removed' it should be 'invalidated'.

Thanks and regards,
Shlok Kyal

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2025-01-17 13:41:51 Re: Re: proposal: schema variables
Previous Message Amul Sul 2025-01-17 12:50:15 Re: NOT ENFORCED constraint feature