Re: Introduce XID age and inactive timeout based replication slot invalidation

From: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Date: 2024-03-27 05:48:46
Message-ID: ZgOzPiP7NLgBuGL7@ip-10-97-1-34.eu-west-3.compute.internal
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Wed, Mar 27, 2024 at 10:08:33AM +0530, Bharath Rupireddy wrote:
> On Tue, Mar 26, 2024 at 11:22 PM Bertrand Drouvot
> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> >
> > - if (!(RecoveryInProgress() && slot->data.synced))
> > + if (!(InRecovery && slot->data.synced))
> > slot->inactive_since = GetCurrentTimestamp();
> > else
> > slot->inactive_since = 0;
> >
> > Not related to this change but more the way RestoreSlotFromDisk() behaves here:
> >
> > For a sync slot on standby it will be set to zero and then later will be
> > synchronized with the one coming from the primary. I think that's fine to have
> > it to zero for this window of time.
>
> Right.
>
> > Now, if the standby is down and one sets sync_replication_slots to off,
> > then inactive_since will be set to zero on the standby at startup and not
> > synchronized (unless one triggers a manual sync). I also think that's fine but
> > it might be worth to document this behavior (that after a standby startup
> > inactive_since is zero until the next sync...).
>
> Isn't this behaviour applicable for other slot parameters that the
> slot syncs from the remote slot on the primary?

No they are persisted on disk. If not, we'd not know where to resume the decoding
from on the standby in case primary is down and/or sync is off.

> I've added the following note in the comments when we update
> inactive_since in RestoreSlotFromDisk.
>
> * Note that for synced slots after the standby starts up (i.e. after
> * the slots are loaded from the disk), the inactive_since will remain
> * zero until the next slot sync cycle.
> */
> if (!(InRecovery && slot->data.synced))
> slot->inactive_since = GetCurrentTimestamp();
> else
> slot->inactive_since = 0;

I think we should add some words in the doc too and also about what the meaning
of inactive_since on the standby is (as suggested by Shveta in [1]).

[1]: https://www.postgresql.org/message-id/CAJpy0uDkTW%2Bt1k3oPkaipFBzZePfFNB5DmiA%3D%3DpxRGcAdpF%3DPg%40mail.gmail.com

> > 7 ===
> >
> > +# Capture and validate inactive_since of a given slot.
> > +sub capture_and_validate_slot_inactive_since
> > +{
> > + my ($node, $slot_name, $slot_creation_time) = @_;
> > + my $name = $node->name;
> >
> > We know have capture_and_validate_slot_inactive_since at 2 places:
> > 040_standby_failover_slots_sync.pl and 019_replslot_limit.pl.
> >
> > Worth to create a sub in Cluster.pm?
>
> I'd second that thought for now. We might have to debate first if it's
> useful for all the nodes even without replication, and if yes, the
> naming stuff and all that. Historically, we've had such duplicated
> functions until recently, for instance advance_wal and log_contains.
> We
> moved them over to a common perl library Cluster.pm very recently. I'm
> sure we can come back later to move it to Cluster.pm.

I thought that would be the right time not to introduce duplicated code.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message shveta malik 2024-03-27 06:09:04 Re: Introduce XID age and inactive timeout based replication slot invalidation
Previous Message Bharath Rupireddy 2024-03-27 05:35:04 Re: Introduce XID age and inactive timeout based replication slot invalidation