Re: Introduce XID age and inactive timeout based replication slot invalidation

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Date: 2025-02-05 04:59:52
Message-ID: CALDaNm0X_vgAxKPT+c14yqKcgE5-x4XBdXsCAVqD6_aa-QYUvg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 4 Feb 2025 at 19:56, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> wrote:
>
> Here is v69 patch set addressing above and Kuroda-san's comments in [1].

Few minor suggestions:
1) In the slot invalidation reporting below:
+ case RS_INVAL_IDLE_TIMEOUT:
+ Assert(inactive_since > 0);
+
+ /* translator: second %s is a GUC variable name */
+ appendStringInfo(&err_detail, _("The slot's
idle time %s exceeds the configured \"%s\" duration."),
+
timestamptz_to_str(inactive_since),
+
"idle_replication_slot_timeout");
+ /* translator: %s is a GUC variable name */
+ appendStringInfo(&err_hint, _("You might need
to increase \"%s\"."),
+
"idle_replication_slot_timeout");

It is logged like:
2025-02-05 10:04:11.616 IST [330567] DETAIL: The slot's idle time
2025-02-05 10:02:49.131631+05:30 exceeds the configured
"idle_replication_slot_timeout" duration.

Here even though we tell idle time, we are logging the inactive_since
value which kind of gives a wrong meaning.

How about we change it to:
The slot has been inactive since 2025-02-05 10:02:49.131631+05:30,
which exceeds the configured "idle_replication_slot_timeout" duration.

2) Here we have mentioned about invalidation happens only for a)
released slots b) inactive slots replication slots c) slot where
communication between pub and sub is down
+ * XXX: Slot invalidation due to 'idle_timeout' applies only to
+ * released slots, and is based on the
'idle_replication_slot_timeout'
+ * GUC. Active slots currently in use for replication
are excluded to
+ * prevent accidental invalidation. Slots where
communication between
+ * the publisher and subscriber is down are also
excluded, as they are
+ * managed by the 'wal_sender_timeout'.
+ */
+ InvalidateObsoleteReplicationSlots(RS_INVAL_IDLE_TIMEOUT,
+
0,
+
InvalidOid,
+
InvalidTransactionId);
a) Can we include about slots which does not reserve WAL are also not
considered.
b) Could we present this in a bullet-point format like the following:
+ * XXX: Slot invalidation due to 'idle_timeout' applies only to:
+ * 1) released slots, and is based on the
'idle_replication_slot_timeout'
+ * GUC. 2) Active slots currently in use for
replication are excluded to
+ * prevent accidental invalidation. 3) Slots where
communication between
+ * the publisher and subscriber is down are also
excluded, as they are
+ * managed by the 'wal_sender_timeout'.
+ */
c) While I was initially reviewing the patch I also had the similar
thoughts on my mind, if we could mention the one like "Slots where
communication between the publisher and subscriber is down are also
excluded, as they are managed by the 'wal_sender_timeout'" in the
documentation it might be good.

Regards,
Vignesh

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2025-02-05 05:10:15 Re: Make COPY format extendable: Extract COPY TO format implementations
Previous Message Amit Kapila 2025-02-05 04:56:22 Re: Separate GUC for replication origins