Re: Introduce XID age and inactive timeout based replication slot invalidation

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Date: 2024-09-16 10:01:11
Message-ID: CALj2ACXjkM4nEE5dXeV=NA2NmbE3AmonKEgudYBMqU3qO9-9fw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Thanks for reviewing.

On Mon, Sep 9, 2024 at 1:11 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> 1.
> + Note that the inactive timeout invalidation mechanism is not
> + applicable for slots on the standby server that are being synced
> + from primary server (i.e., standby slots having
>
> nit - /from primary server/from the primary server/

+1

> 2. ReplicationSlotAcquire
>
> + errmsg("can no longer get changes from replication slot \"%s\"",
> + NameStr(s->data.name)),
> + errdetail("This slot has been invalidated because it was inactive
> for longer than the amount of time specified by \"%s\".",
> + "replication_slot_inactive_timeout.")));
>
> nit - "replication_slot_inactive_timeout." - should be no period
> inside that GUC name literal

Typo. Fixed.

> 3. ReportSlotInvalidation
>
> I didn't understand why there was a hint for:
> "You might need to increase \"%s\".", "max_slot_wal_keep_size"
>
> Why aren't these similar cases consistent?

It looks misleading and not very useful. What happens if the removed
WAL (that's needed for the slot) is put back into pg_wal somehow (by
manually copying from archive or by some tool/script)? Can the slot
invalidated due to wal_removed start sending WAL to its clients?

> But you don't have an equivalent hint for timeout invalidation:
> "You might need to increase \"%s\".", "replication_slot_inactive_timeout"

I removed this per review comments upthread.

> 4. RestoreSlotFromDisk
>
> + /* Use the same inactive_since time for all the slots. */
> + if (now == 0)
> + now = GetCurrentTimestamp();
> +
>
> Is the deferred assignment really necessary? Why not just
> unconditionally assign the 'now' just before the for-loop? Or even at
> the declaration? e.g. The 'replication_slot_inactive_timeout' is
> measured in seconds so I don't think 'inactive_since' being wrong by a
> millisecond here will make any difference.

Moved it before the for-loop.

> 5. ReplicationSlotSetInactiveSince
>
> +/*
> + * Set slot's inactive_since property unless it was previously invalidated due
> + * to inactive timeout.
> + */
> +static inline void
> +ReplicationSlotSetInactiveSince(ReplicationSlot *s, TimestampTz *now,
> + bool acquire_lock)
> +{
> + if (acquire_lock)
> + SpinLockAcquire(&s->mutex);
> +
> + if (s->data.invalidated != RS_INVAL_INACTIVE_TIMEOUT)
> + s->inactive_since = *now;
> +
> + if (acquire_lock)
> + SpinLockRelease(&s->mutex);
> +}
>
> Is the logic correct? What if the slot was already invalid due to some
> reason other than RS_INVAL_INACTIVE_TIMEOUT? Is an Assert needed?

Hm. Since invalidated slots can't be acquired and made active, not
modifying inactive_since irrespective of invalidation reason looks
good to me.

Please find the attached v46 patch having changes for the above review
comments and your test review comments and Shveta's review comments.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v46-0001-Introduce-inactive_timeout-based-replication-slo.patch application/octet-stream 33.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message shveta malik 2024-09-16 10:32:51 Re: Add contrib/pg_logicalsnapinspect
Previous Message Bharath Rupireddy 2024-09-16 09:47:47 Re: Introduce XID age and inactive timeout based replication slot invalidation