Re: Introduce XID age and inactive timeout based replication slot invalidation

From: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Date: 2024-11-19 07:20:51
Message-ID: CABdArM5wR1ea8NpNU+tG0bB1m6G=Lep3bd1uthtSGj64JyqJsw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 14, 2024 at 9:14 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Wed, 13 Nov 2024 at 15:00, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> wrote:
> >
> > Please find the v48 patch attached.
> >
> > On Thu, Sep 19, 2024 at 9:40 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > >
> > > When we promote hot standby with synced logical slots to become new
> > > primary, the logical slots are never invalidated with
> > > 'inactive_timeout' on new primary. It seems the check in
> > > SlotInactiveTimeoutCheckAllowed() is wrong. We should allow
> > > invalidation of slots on primary even if they are marked as 'synced'.
> >
> > fixed.
> >
> > > I have raised 4 issues so far on v46, the first 3 are in [1],[2],[3].
> > > Once all these are addressed, I can continue reviewing further.
> > >
> >
> > Fixed issues reported in [1], [2].
>
> Few comments:

Thanks for the review.

>
> 2) Currently it allows a minimum value of less than 1 second like in
> milliseconds, I feel we can have some minimum value at least something
> like checkpoint_timeout:
> diff --git a/src/backend/utils/misc/guc_tables.c
> b/src/backend/utils/misc/guc_tables.c
> index 8a67f01200..367f510118 100644
> --- a/src/backend/utils/misc/guc_tables.c
> +++ b/src/backend/utils/misc/guc_tables.c
> @@ -3028,6 +3028,18 @@ struct config_int ConfigureNamesInt[] =
> NULL, NULL, NULL
> },
>
> + {
> + {"replication_slot_inactive_timeout", PGC_SIGHUP,
> REPLICATION_SENDING,
> + gettext_noop("Sets the amount of time a
> replication slot can remain inactive before "
> + "it will be invalidated."),
> + NULL,
> + GUC_UNIT_S
> + },
> + &replication_slot_inactive_timeout,
> + 0, 0, INT_MAX,
> + NULL, NULL, NULL
> + },
>

Currently, the feature is disabled by default when
replication_slot_inactive_timeout = 0. However, if we set a minimum
value, the default_val cannot be less than min_val, making it
impossible to use 0 to disable the feature.
Thoughts or any suggestions?

>
> 4) I'm not sure if this change required by this patch or is it a
> general optimization, if it is required for this patch we can detail
> the comments:
> @@ -2208,6 +2328,7 @@ RestoreSlotFromDisk(const char *name)
> bool restored = false;
> int readBytes;
> pg_crc32c checksum;
> + TimestampTz now;
>
> /* no need to lock here, no concurrent access allowed yet */
>
> @@ -2368,6 +2489,9 @@ RestoreSlotFromDisk(const char *name)
> NameStr(cp.slotdata.name)),
> errhint("Change \"wal_level\" to be
> \"replica\" or higher.")));
>
> + /* Use same inactive_since time for all slots */
> + now = GetCurrentTimestamp();
> +
> /* nothing can be active yet, don't lock anything */
> for (i = 0; i < max_replication_slots; i++)
> {
> @@ -2400,7 +2524,7 @@ RestoreSlotFromDisk(const char *name)
> * slot from the disk into memory. Whoever acquires
> the slot i.e.
> * makes the slot active will reset it.
> */
> - slot->inactive_since = GetCurrentTimestamp();
> + slot->inactive_since = now;
>

After removing the "ReplicationSlotSetInactiveSince" from here, it
became irrelevant to this patch. Now, it is a general optimization to
set the same timestamp for all slots while restoring from disk. I have
added a few comments as per Peter's suggestion.

> 5) Why should the slot invalidation be updated during shutdown,
> shouldn't the inactive_since value be intact during shutdown?
> - <literal>NULL</literal> if the slot is currently being used.
> - Note that for slots on the standby that are being synced from a
> + <literal>NULL</literal> if the slot is currently being used. Once the
> + slot is invalidated, this value will remain unchanged until we shutdown
> + the server. Note that for slots on the standby that are being
> synced from a
>

The "inactive_since" data of a slot is not stored on disk, so the
older value cannot be restored after a restart.

--
Thanks,
Nisha

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiro Ikeda 2024-11-19 08:30:56 Re: Adding skip scan (including MDAM style range skip scan) to nbtree
Previous Message Nisha Moond 2024-11-19 07:17:37 Re: Introduce XID age and inactive timeout based replication slot invalidation