Re: Introduce XID age and inactive timeout based replication slot invalidation

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Date: 2024-09-09 07:40:50
Message-ID: CAHut+PtJxxaVhjDCFPqE_V63VaAN_bL+PHRMak=dkghoGrZc0Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, here are some review comments for v45-0001 (excluding the test code)

======
doc/src/sgml/config.sgml

1.
+ Note that the inactive timeout invalidation mechanism is not
+ applicable for slots on the standby server that are being synced
+ from primary server (i.e., standby slots having

nit - /from primary server/from the primary server/

======
src/backend/replication/slot.c

2. ReplicationSlotAcquire

+ errmsg("can no longer get changes from replication slot \"%s\"",
+ NameStr(s->data.name)),
+ errdetail("This slot has been invalidated because it was inactive
for longer than the amount of time specified by \"%s\".",
+ "replication_slot_inactive_timeout.")));

nit - "replication_slot_inactive_timeout." - should be no period
inside that GUC name literal

~~~

3. ReportSlotInvalidation

I didn't understand why there was a hint for:
"You might need to increase \"%s\".", "max_slot_wal_keep_size"

But you don't have an equivalent hint for timeout invalidation:
"You might need to increase \"%s\".", "replication_slot_inactive_timeout"

Why aren't these similar cases consistent?

~~~

4. RestoreSlotFromDisk

+ /* Use the same inactive_since time for all the slots. */
+ if (now == 0)
+ now = GetCurrentTimestamp();
+

Is the deferred assignment really necessary? Why not just
unconditionally assign the 'now' just before the for-loop? Or even at
the declaration? e.g. The 'replication_slot_inactive_timeout' is
measured in seconds so I don't think 'inactive_since' being wrong by a
millisecond here will make any difference.

======
src/include/replication/slot.h

5. ReplicationSlotSetInactiveSince

+/*
+ * Set slot's inactive_since property unless it was previously invalidated due
+ * to inactive timeout.
+ */
+static inline void
+ReplicationSlotSetInactiveSince(ReplicationSlot *s, TimestampTz *now,
+ bool acquire_lock)
+{
+ if (acquire_lock)
+ SpinLockAcquire(&s->mutex);
+
+ if (s->data.invalidated != RS_INVAL_INACTIVE_TIMEOUT)
+ s->inactive_since = *now;
+
+ if (acquire_lock)
+ SpinLockRelease(&s->mutex);
+}

Is the logic correct? What if the slot was already invalid due to some
reason other than RS_INVAL_INACTIVE_TIMEOUT? Is an Assert needed?

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment Content-Type Size
PS_NITPICKS_20240909_TIMEOUT_V450001.txt text/plain 1.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2024-09-09 07:42:41 Re: Invalid Assert while validating REPLICA IDENTITY?
Previous Message Michael Paquier 2024-09-09 06:56:14 Re: Partitioned tables and [un]loggedness