Re: Introduce XID age and inactive timeout based replication slot invalidation

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Ajin Cherian <itsajin(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Date: 2024-08-26 11:05:02
Message-ID: CAA4eK1KVbicrQQbMLEQiHEi9=VJpxrErrK7CiMamwaMPfff3WA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Aug 26, 2024 at 11:44 AM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>

Few comments on 0001:
1.
@@ -651,6 +651,13 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid
remote_dbid)
" name slot \"%s\" already exists on the standby",
remote_slot->name));

+ /*
+ * Skip the sync if the local slot is already invalidated. We do this
+ * beforehand to avoid slot acquire and release.
+ */
+ if (slot->data.invalidated != RS_INVAL_NONE)
+ return false;
+
/*
* The slot has been synchronized before.

I was wondering why you have added this new check as part of this
patch. If you see the following comments in the related code, you will
know why we haven't done this previously.

/*
* The slot has been synchronized before.
*
* It is important to acquire the slot here before checking
* invalidation. If we don't acquire the slot first, there could be a
* race condition that the local slot could be invalidated just after
* checking the 'invalidated' flag here and we could end up
* overwriting 'invalidated' flag to remote_slot's value. See
* InvalidatePossiblyObsoleteSlot() where it invalidates slot directly
* if the slot is not acquired by other processes.
*
* XXX: If it ever turns out that slot acquire/release is costly for
* cases when none of the slot properties is changed then we can do a
* pre-check to ensure that at least one of the slot properties is
* changed before acquiring the slot.
*/
ReplicationSlotAcquire(remote_slot->name, true);

We need some modifications in these comments if you want to add a
pre-check here.

2.
@@ -1907,6 +2033,31 @@ CheckPointReplicationSlots(bool is_shutdown)
SaveSlotToPath(s, path, LOG);
}
LWLockRelease(ReplicationSlotAllocationLock);
+
+ elog(DEBUG1, "performing replication slot invalidation checks");
+
+ /*
+ * Note that we will make another pass over replication slots for
+ * invalidations to keep the code simple. The assumption here is that the
+ * traversal over replication slots isn't that costly even with hundreds
+ * of replication slots. If it ever turns out that this assumption is
+ * wrong, we might have to put the invalidation check logic in the above
+ * loop, for that we might have to do the following:
+ *
+ * - Acqure ControlLock lock once before the loop.
+ *
+ * - Call InvalidatePossiblyObsoleteSlot for each slot.
+ *
+ * - Handle the cases in which ControlLock gets released just like
+ * InvalidateObsoleteReplicationSlots does.
+ *
+ * - Avoid saving slot info to disk two times for each invalidated slot.
+ *
+ * XXX: Should we move inactive_timeout inavalidation check closer to
+ * wal_removed in CreateCheckPoint and CreateRestartPoint?
+ */
+ InvalidateObsoleteReplicationSlots(RS_INVAL_INACTIVE_TIMEOUT,
+ 0, InvalidOid, InvalidTransactionId);

Why do we want to call this for shutdown case (when is_shutdown is
true)? I understand trying to invalidate slots during regular
checkpoint but not sure if we need it at the time of shutdown. The
other point is can we try to check the performance impact with 100s of
slots as mentioned in the code comments?

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jakub Wartak 2024-08-26 11:09:58 Re: elog/ereport VS misleading backtrace_function function address
Previous Message Ashutosh Bapat 2024-08-26 11:04:49 Re: Trim the heap free memory