From: | Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Date: | 2025-02-06 02:32:29 |
Message-ID: | CABdArM6oKV8248tSgbewzoFnTNkDonaCroO9_J_+78AKA4W7Sw@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Feb 5, 2025 at 2:42 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Feb 5, 2025 at 10:30 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Tue, 4 Feb 2025 at 19:56, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> wrote:
> > >
> > > Here is v69 patch set addressing above and Kuroda-san's comments in [1].
> >
> > Few minor suggestions:
> > 1) In the slot invalidation reporting below:
> > + case RS_INVAL_IDLE_TIMEOUT:
> > + Assert(inactive_since > 0);
> > +
> > + /* translator: second %s is a GUC variable name */
> > + appendStringInfo(&err_detail, _("The slot's idle time %s exceeds the configured \"%s\" duration."),
> > + timestamptz_to_str(inactive_since),
> > + "idle_replication_slot_timeout");
> > + /* translator: %s is a GUC variable name */
> > + appendStringInfo(&err_hint, _("You might need to increase \"%s\"."),
> > + "idle_replication_slot_timeout");
> >
> > It is logged like:
> > 2025-02-05 10:04:11.616 IST [330567] DETAIL: The slot's idle time
> > 2025-02-05 10:02:49.131631+05:30 exceeds the configured
> > "idle_replication_slot_timeout" duration.
> >
> > Here even though we tell idle time, we are logging the inactive_since
> > value which kind of gives a wrong meaning.
> >
> > How about we change it to:
> > The slot has been inactive since 2025-02-05 10:02:49.131631+05:30,
> > which exceeds the configured "idle_replication_slot_timeout" duration.
> >
>
> Would it address your concern if we write the actual idle duration
> (now - inactive_since) instead of directly using inactive_since in the
> above message?
>
Simply using the raw timestamp difference (now - inactive_since) would
look odd. We should convert it into a user-friendly format. Since
idle_replication_slot_timeout is in minutes, we can express the
difference in minutes and seconds in the log.
For example:
DETAIL: The slot's idle time of 1 minute and 7 seconds exceeds the
configured "idle_replication_slot_timeout" duration.
This has been implemented in v70.
Thoughts?
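To illustrate the idea, here is a minimal standalone sketch of the conversion. It is not the code from the v70 patch: the helper name format_idle_time is hypothetical, and it assumes the idle duration (now - inactive_since) is already available as a count of microseconds, which is how a TimestampTz difference is represented. Real backend code would also need translation-friendly plural handling, which this sketch ignores.

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): render an idle duration given
 * in microseconds as "<M> minute(s) and <S> second(s)".
 * Sub-second precision is deliberately dropped.
 */
static void
format_idle_time(int64_t idle_usecs, char *buf, size_t buflen)
{
	int64_t		total_secs = idle_usecs / 1000000;	/* usecs -> whole seconds */
	int64_t		minutes = total_secs / 60;
	int			seconds = (int) (total_secs % 60);

	snprintf(buf, buflen, "%lld minute%s and %d second%s",
			 (long long) minutes, (minutes == 1) ? "" : "s",
			 seconds, (seconds == 1) ? "" : "s");
}
```

The caller would pass the resulting string to the DETAIL message in place of the raw inactive_since timestamp.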
> A few other comments:
> 1.
> + * 4. The slot is not being synced from the primary while the server
> + * is in recovery
> + *
> + * Note that the idle timeout invalidation mechanism is not
> + * applicable for slots on the standby server that are being synced
> + * from the primary server (i.e., standby slots having 'synced' field 'true').
> + * Synced slots are always considered to be inactive because they don't
> + * perform logical decoding to produce changes.
>
> The 4th point in the above comment and the rest of the comment is
> mostly saying the same thing.
>
Done. I've merged the additional info into the 4th point.
> 2.
> + * Flush all replication slots to disk. Also, invalidate obsolete slots during
> + * non-shutdown checkpoint.
> *
> * It is convenient to flush dirty replication slots at the time of checkpoint.
> * Additionally, in case of a shutdown checkpoint, we also identify the slots
> @@ -1924,6 +2007,45 @@ CheckPointReplicationSlots(bool is_shutdown)
>
> Can we try and see how the patch looks if we try to invalidate the
> slot due to idle time at the same time when we are trying to
> invalidate due to WAL?
>
I'll consider the suggested change in the next version.
~~~~
Here are the v70 patches, which address the comments above as well as
those in [1], [2] and [3].
[1] https://www.postgresql.org/message-id/CAHut%2BPvW3pr3P3hXwBskXrDmJYKedmqRaPZcL4iLRQ51%3DXxOBw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CALDaNm0X_vgAxKPT%2Bc14yqKcgE5-x4XBdXsCAVqD6_aa-QYUvg%40mail.gmail.com
[3] https://www.postgresql.org/message-id/CAHut%2BPtCpOnifF9wnhJ%3Djo7KLmtT%3DMikuYnM9GGPTVA80rq7OA%40mail.gmail.com
--
Thanks,
Nisha
Attachment | Content-Type | Size |
---|---|---|
v70-0001-Introduce-inactive_timeout-based-replication-slo.patch | application/octet-stream | 22.7 KB |
v70-0002-Add-TAP-test-for-slot-invalidation-based-on-inac.patch | application/octet-stream | 6.6 KB |