Re: Introduce XID age and inactive timeout based replication slot invalidation

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Ajin Cherian <itsajin(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Date: 2024-08-29 06:01:09
Message-ID: CALj2ACXe8+xSNdMXTMaSRWUwX7v61Ad4iddUwnn=djSwx3GLLg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Thanks for looking into this.

On Mon, Aug 26, 2024 at 4:35 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> Few comments on 0001:
> 1.
> @@ -651,6 +651,13 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid
>
> + /*
> + * Skip the sync if the local slot is already invalidated. We do this
> + * beforehand to avoid slot acquire and release.
> + */
>
> I was wondering why you have added this new check as part of this
> patch. If you see the following comments in the related code, you will
> know why we haven't done this previously.

Removed. Can deal with optimization separately.

> 2.
> + */
> + InvalidateObsoleteReplicationSlots(RS_INVAL_INACTIVE_TIMEOUT,
> + 0, InvalidOid, InvalidTransactionId);
>
> Why do we want to call this for shutdown case (when is_shutdown is
> true)? I understand trying to invalidate slots during regular
> checkpoint but not sure if we need it at the time of shutdown.

Changed it to invalidate only for non-shutdown checkpoints.
inactive_timeout invalidation isn't critical for shutdown unlike
wal_removed which can help shutdown by freeing up some disk space.

> The
> other point is can we try to check the performance impact with 100s of
> slots as mentioned in the code comments?

I first checked how much does the wal_removed invalidation check add to the
checkpoint (see 2nd and 3rd column). I then checked how much
inactive_timeout invalidation check adds to the checkpoint (see 4th
column), it is not more than wal_remove invalidation check. I then checked
how much the wal_removed invalidation check adds for replication slots that
have already been invalidated due to inactive_timeout (see 5th column),
looks like not much.

| # of slots | HEAD (no invalidation) ms | HEAD (wal_removed) ms | PATCHED
(inactive_timeout) ms | PATCHED (inactive_timeout+wal_removed) ms |
|------------|----------------------------|-----------------------|-------------------------------|------------------------------------------|
| 100 | 18.591 | 370.586 | 359.299
| 373.882 |
| 1000 | 15.722 | 4834.901 |
5081.751 | 5072.128 |
| 10000 | 19.261 | 59801.062 |
61270.406 | 60270.099 |

Having said that, I'm okay to implement the optimization specified.
Thoughts?

+ /*
+ * NB: We will make another pass over replication slots for
+ * invalidation checks to keep the code simple. Testing shows that
+ * there is no noticeable overhead (when compared with wal_removed
+ * invalidation) even if we were to do inactive_timeout invalidation
+ * of thousands of replication slots here. If it is ever proven that
+ * this assumption is wrong, we will have to perform the invalidation
+ * checks in the above for loop with the following changes:
+ *
+ * - Acquire ControlLock lock once before the loop.
+ *
+ * - Call InvalidatePossiblyObsoleteSlot for each slot.
+ *
+ * - Handle the cases in which ControlLock gets released just like
+ * InvalidateObsoleteReplicationSlots does.
+ *
+ * - Avoid saving slot info to disk two times for each invalidated
+ * slot.

Please see the attached v43 patches addressing the above review comments.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v43-0002-Add-XID-age-based-replication-slot-invalidation.patch application/octet-stream 33.0 KB
v43-0001-Add-inactive_timeout-based-replication-slot-inva.patch application/octet-stream 33.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2024-08-29 06:16:35 Re: Pgoutput not capturing the generated columns
Previous Message Tom Lane 2024-08-29 05:55:27 Re: 039_end_of_wal: error in "xl_tot_len zero" test