Re: Introduce XID age and inactive timeout based replication slot invalidation

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Date: 2024-09-02 08:06:44
Message-ID: CAHut+PuC7nwuT3+36zsLxtBAcUHrSaysrHb0_dxTXqMnFBKZBA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi. Thanks for addressing my previous review comments.

Here are some review comments for v44-0001.

======
Commit message.

1.
Because such synced slots are typically considered not
active (for them to be later considered as inactive) as they don't
perform logical decoding to produce the changes.

~

This sentence is bad grammar. The docs have the same wording, so
please see my doc review comment #4 suggestion below.

======
doc/src/sgml/config.sgml

2.
+ <para>
+ Invalidates replication slots that are inactive for longer than
+ specified amount of time. If this value is specified without units,
+ it is taken as seconds. A value of zero (which is default) disables
+ the timeout mechanism. This parameter can only be set in
+ the <filename>postgresql.conf</filename> file or on the server
+ command line.
+ </para>
+

nit - This is OK as-is, but OTOH why not make the wording consistent
with the previous GUC description? (e.g. see my v43 [1] #2 review
comment)

~~~

3.
+ <para>
+ This invalidation check happens either when the slot is acquired
+ for use or during checkpoint. The time since the slot has become
+ inactive is known from its
+ <structfield>inactive_since</structfield> value using which the
+ timeout is measured.
+ </para>
+

I felt this is slightly misleading because slot acquiring has nothing
to do with setting the slot invalidation anymore. Furthermore, the 2nd
sentence is bad grammar.

nit - IMO something simple like the following rewording can address
both of those points:

Slot invalidation due to inactivity timeout occurs during checkpoint.
The duration of slot inactivity is calculated using the slot's
<structfield>inactive_since</structfield> field value.

~

4.
+ Because such synced slots are typically considered not active
+ (for them to be later considered as inactive) as they don't perform
+ logical decoding to produce the changes.

That sentence has bad grammar.

nit – suggest a much simpler replacement:
Synced slots are always considered to be inactive because they don't
perform logical decoding to produce changes.

======
src/backend/replication/slot.c

5.
+#define IsInactiveTimeoutSlotInvalidationApplicable(s) \
+ (replication_slot_inactive_timeout > 0 && \
+ s->inactive_since > 0 && \
+ !RecoveryInProgress() && \
+ !s->data.synced)
+

5a.
I felt this would be better implemented as an inline function. Then it
can be commented on properly to explain the parts of the condition.
e.g. the large comment currently in InvalidatePossiblyObsoleteSlot()
would be more appropriate in this function.

~

5b.
The name is very long. Can't it be something shorter/simpler like:
'IsSlotATimeoutCandidate()'

~~~

6. ReplicationSlotAcquire

-ReplicationSlotAcquire(const char *name, bool nowait)
+ReplicationSlotAcquire(const char *name, bool nowait,
+ bool check_for_invalidation)

nit - Previously this new parameter really did mean to "check" for
[and set the slot] invalidation. But now I suggest renaming it to
'error_if_invalid' to properly reflect the new usage. And also in the
slot.h.

~

7.
+ /*
+ * Error out if the slot has been invalidated previously. Because there's
+ * no use in acquiring the invalidated slot.
+ */

nit - The comment is contrary to the code. If there was no reason to
skip this error, then you would not have the new parameter allowing
you to skip this error. I suggest just repeating the same comment as
in the function header.

~~~

8. ReportSlotInvalidation

nit - Added some blank lines for consistency.

~~~

9. InvalidatePossiblyObsoleteSlot

+ /*
+ * Quick exit if inactive timeout invalidation mechanism
+ * is disabled or slot is currently being used or the
+ * server is in recovery mode or the slot on standby is
+ * currently being synced from the primary.
+ *
+ * Note that the inactive timeout invalidation mechanism
+ * is not applicable for slots on the standby server that
+ * are being synced from primary server. Because such
+ * synced slots are typically considered not active (for
+ * them to be later considered as inactive) as they don't
+ * perform logical decoding to produce the changes.
+ */
+ if (!IsInactiveTimeoutSlotInvalidationApplicable(s))
+ break;

9a.
Consistency is good (commit message, docs and code comments for this),
but the added sentence has bad grammar. Please see the docs review
comment #4 above for some alternate phrasing.

~

9b.
Now that this logic is moved into a macro (I suggested it should be an
inline function) IMO this comment does not belong here anymore because
it is commenting code that you cannot see. Instead, this comment (or
something like it) should be as comments within the new function.

======
src/include/replication/slot.h

10.
+extern void ReplicationSlotAcquire(const char *name, bool nowait,
+ bool check_for_invalidation);

Change the new param name as described in the earlier review comment.

======
src/test/recovery/t/050_invalidate_slots.pl

~~~

Please refer to the attached file which implements some of the nits
mentioned above.

======
[1] v43 review -
https://www.postgresql.org/message-id/CAHut%2BPuFzCHPCiZbpoQX59kgZbebuWT0gR0O7rOe4t_sdYu%3DOA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment Content-Type Size
PS_NITPICKS_v440001.txt text/plain 4.9 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2024-09-02 08:17:52 Re: pgsql: Add more SQL/JSON constructor functions
Previous Message Daniel Gustafsson 2024-09-02 08:03:14 Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?