Re: Introduce XID age and inactive timeout based replication slot invalidation

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Date: 2024-09-16 09:47:47
Message-ID: CALj2ACXKKohg7+35Tv2wrynat+TdzVU-dqp4qQAMjgTh03yRGA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Thanks for reviewing.

On Mon, Sep 9, 2024 at 10:54 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> 2)
> src/sgml/config.sgml:
>
> + disables the inactive timeout invalidation mechanism
>
> + Slot invalidation due to inactivity timeout occurs during checkpoint.
>
> Either have 'inactive' at both the places or 'inactivity'.

Used "inactive timeout".

> 3)
> slot.c:
> +static bool InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause
> cause,
> + ReplicationSlot *s,
> + XLogRecPtr oldestLSN,
> + Oid dboid,
> + TransactionId snapshotConflictHorizon,
> + bool *invalidated);
> +static inline bool SlotInactiveTimeoutCheckAllowed(ReplicationSlot *s);
>
> I think, we do not need above 2 declarations. The code compile fine
> without these as the usage is later than the definition.

Hm, it's a usual practice that I follow irrespective of the placement
of function declarations. Since it was brought up, I removed the
declarations.

> 4)
> + /*
> + * An error is raised if error_if_invalid is true and the slot has been
> + * invalidated previously.
> + */
> + if (error_if_invalid && s->data.invalidated == RS_INVAL_INACTIVE_TIMEOUT)
>
> The comment is generic while the 'if condition' is specific to one
> invalidation cause. Even though I feel it can be made generic test for
> all invalidation causes but that is not under scope of this thread and
> needs more testing/analysis.

Right.

> For the time being, we can make comment
> specific to the concerned invalidation cause. The header of function
> will also need the same change.

Adjusted the comment, but left the variable name error_if_invalid as
is. Didn't want to make it long, one can look at the code to
understand what it is used for.

> 5)
> SlotInactiveTimeoutCheckAllowed():
>
> + * Check if inactive timeout invalidation mechanism is disabled or slot is
> + * currently being used or server is in recovery mode or slot on standby is
> + * currently being synced from the primary.
> + *
>
> These comments say exact opposite of what we are checking in code.
> Since the function name has 'Allowed' in it, we should be putting
> comments which say what allows it instead of what disallows it.

Modified.

> 1)
> src/sgml/config.sgml:
>
> + Synced slots are always considered to be inactive because they
> don't perform logical decoding to produce changes.
>
> It is better we avoid such a statement, as internally we use logical
> decoding to advance restart-lsn, see
> 'LogicalSlotAdvanceAndCheckSnapState' called form slotsync.c.
> <Also see related comment 6 below>
>
> 6)
>
> + * Synced slots are always considered to be inactive because they don't
> + * perform logical decoding to produce changes.
> + */
> +static inline bool
> +SlotInactiveTimeoutCheckAllowed(ReplicationSlot *s)
>
> Perhaps we should avoid mentioning logical decoding here. When slots
> are synced, they are performing decoding and their inactive_since is
> changing continuously. A better way to make this statement will be:
>
> We want to ensure that the slots being synchronized are not
> invalidated, as they need to be preserved for future use when the
> standby server is promoted to the primary. This is necessary for
> resuming logical replication from the new primary server.
> <Rephrase if needed>

They are performing logical decoding, but not producing the changes
for the clients to consume. So, IMO, the accompanying "to produce
changes" next to the "logical decoding" is good here.

> 7)
>
> InvalidatePossiblyObsoleteSlot()
>
> we are calling SlotInactiveTimeoutCheckAllowed() twice in this
> function. We shall optimize.
>
> At the first usage place, shall we simply get timestamp when cause is
> RS_INVAL_INACTIVE_TIMEOUT without checking
> SlotInactiveTimeoutCheckAllowed() as IMO it does not seem a
> performance critical section. Or if we retain check at first place,
> then at the second place we can avoid calling it again based on
> whether 'now' is NULL or not.

Getting a current timestamp can get costlier on platforms that use
various clock sources, so assigning 'now' unconditionally isn't the
way IMO. Using the inline function in two places improves the
readability. Can optimize it if there's any performance impact of
calling the inline function in two places.

Will post the new patch version soon.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2024-09-16 10:01:11 Re: Introduce XID age and inactive timeout based replication slot invalidation
Previous Message Tomas Vondra 2024-09-16 09:27:37 Re: A starter task