Re: Introduce XID age and inactive timeout based replication slot invalidation

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Date: 2024-04-01 03:17:59
Message-ID: CALj2ACWOz0tEY2vi3OneittGj9+yrQ9A6+6YYb1Cj7L_QrdMVg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Mar 29, 2024 at 9:39 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> Commit message states: "why we can't just update inactive_since for
> synced slots on the standby with the value received from remote slot
> on the primary. This is consistent with any other slot parameter i.e.
> all of them are synced from the primary."
>
> The inactive_since is not consistent with other slot parameters which
> we copy. We don't perform anything related to those other parameters
> like say two_phase phase which can change that property. However, we
> do acquire the slot, advance the slot (as per recent discussion [1]),
> and release it. Since these operations can impact inactive_since, it
> seems to me that inactive_since is not the same as other parameters.
> It can have a different value than the primary. Why would anyone want
> to know the value of inactive_since from primary after the standby is
> promoted?

After thinking about it for a while now, it feels to me that the
synced slots (slots on the standby that are being synced from the
primary) can have their own inactive_since value. Fundamentally,
inactive_since is set to 0 when a slot is acquired and to the current
time when the slot is released, no matter who acquires and releases it -
be it a walsender for replication, a backend advancing the slot, a
backend syncing it via pg_sync_replication_slots, a backend running
other slot functions, or the background sync worker. Remember that the
earlier patch updated inactive_since just for walsenders, but the
suggestion then was to update it unconditionally -
https://www.postgresql.org/message-id/CAJpy0uD64X%3D2ENmbHaRiWTKeQawr-rbGoy_GdhQQLVXzUSKTMg%40mail.gmail.com.
Whoever syncs the slot *actually* acquires the slot i.e. makes it
theirs, syncs it from the primary, and releases it. IMO, no
differentiation is to be made for synced slots.
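
To make that rule concrete, here is a minimal, self-contained sketch
(the struct and function names are invented for illustration; this is
not the slot.c code) of the semantics described above: clear
inactive_since on acquire, stamp the current time on release,
regardless of who the acquirer is.

#include <stdio.h>
#include <time.h>

/* Toy stand-in for a replication slot; only the field relevant here. */
typedef struct ToySlot
{
    time_t      inactive_since;     /* 0 while the slot is acquired */
} ToySlot;

/*
 * Whoever acquires the slot (a walsender, a backend running a slot
 * function, the slot sync worker, ...) clears inactive_since: the slot
 * is in use.
 */
static void
toy_slot_acquire(ToySlot *slot)
{
    slot->inactive_since = 0;
}

/* On release, stamp the current time: the slot is inactive since now. */
static void
toy_slot_release(ToySlot *slot)
{
    slot->inactive_since = time(NULL);
}

int
main(void)
{
    ToySlot     slot = {0};

    toy_slot_acquire(&slot);    /* e.g. the sync worker takes the slot */
    /* ... sync the slot contents from the primary ... */
    toy_slot_release(&slot);    /* inactive_since now reflects this release */

    printf("inactive_since = %ld\n", (long) slot.inactive_since);
    return 0;
}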

There was a suggestion to use inactive_since of the synced slot on
the standby to know the inactivity of the slot on the primary. If one
wants that, they had better look at/monitor the primary's slot
info/logs/pg_replication_slots/whatever. I really don't see a point in
having two different meanings for a single property of a replication
slot - inactive_since for a regular slot telling since when that slot
has been inactive, and for a synced slot since when the corresponding
remote slot has been inactive. I think this will surely confuse users.

Also, inactive_since changes on the primary very frequently even when
none of the other parameters change. If we copy inactive_since to the
synced slots, the standby will be doing *sync* work (marking the slots
dirty and saving them to disk) solely to update inactive_since. I think
that is clearly unnecessary; a sketch of the kind of guard that would
be needed to avoid it follows.
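
For completeness, if we did copy inactive_since, avoiding that overhead
would take a guard like the following (a purely illustrative sketch
with invented names, not the slotsync.c logic), which effectively
amounts to not copying the field at all:

#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Toy local/remote slot snapshots; fields are invented for illustration. */
typedef struct ToySlotData
{
    unsigned long   confirmed_lsn;
    unsigned int    catalog_xmin;
    time_t          inactive_since;
} ToySlotData;

/*
 * Return true only if a persisted field differs; a change in
 * inactive_since alone would not justify marking the slot dirty and
 * flushing it to disk on every sync cycle.
 */
static bool
toy_sync_needs_write(const ToySlotData *local, const ToySlotData *remote)
{
    return local->confirmed_lsn != remote->confirmed_lsn ||
           local->catalog_xmin != remote->catalog_xmin;
}

int
main(void)
{
    ToySlotData local = {100, 740, 0};
    ToySlotData remote = {100, 740, 1711929600};   /* only inactive_since differs */

    printf("needs write: %d\n", toy_sync_needs_write(&local, &remote));
    return 0;
}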

Coming to a future patch for inactive-timeout based slot invalidation,
we can either allow invalidation without any differentiation for
synced slots or restrict invalidation to avoid more sync work. For
instance, if the inactive timeout is kept low on the standby, the sync
worker will do more work, dropping and recreating a slot repeatedly if
it keeps getting invalidated. Another point is that the standby already
takes independent invalidation decisions for synced slots. AFAICS,
invalidation due to wal_removal is the sole reason (out of all
available invalidation reasons) for a synced slot to get invalidated
independently of the primary. See
https://www.postgresql.org/message-id/CAA4eK1JXBwTaDRD_%3D8t6UB1fhRNjC1C%2BgH4YdDxj_9U6djLnXw%40mail.gmail.com
for the suggestion that we better not differentiate invalidation
decisions for synced slots. A rough sketch of the timeout check itself
is below.
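
For reference, the timeout check would be roughly of this shape (a
standalone sketch with invented names, not the proposed patch): if a
slot has been inactive for longer than the configured timeout, mark it
invalidated. Whether synced slots on the standby are subject to this
check is exactly the open question above.

#include <stdbool.h>
#include <stdio.h>
#include <time.h>

typedef struct ToySlot
{
    time_t      inactive_since;     /* 0 while acquired, else last release time */
    bool        invalidated;
} ToySlot;

/*
 * Invalidate the slot if it has been idle for longer than the configured
 * timeout; a timeout of 0 disables the check.
 */
static void
toy_check_inactive_timeout(ToySlot *slot, int timeout_secs, time_t now)
{
    if (timeout_secs <= 0 || slot->inactive_since == 0)
        return;

    if (now - slot->inactive_since > timeout_secs)
        slot->invalidated = true;
}

int
main(void)
{
    ToySlot     slot;

    slot.inactive_since = time(NULL) - 3600;    /* idle for an hour */
    slot.invalidated = false;

    toy_check_inactive_timeout(&slot, 600, time(NULL));
    printf("invalidated = %d\n", slot.invalidated);
    return 0;
}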

Letting synced slots have their own inactive_since not only simplifies
the code, but also looks less confusing and more meaningful to the
user. The only code we add on top of the committed code is to use
InRecovery in place of RecoveryInProgress() in RestoreSlotFromDisk(),
to fix the issue raised by Shveta upthread.

> Now, the other concern is that calling GetCurrentTimestamp()
> could be costly when the values for the slot are not going to be
> updated but if that happens we can optimize such that before acquiring
> the slot we can have some minimal pre-checks to ensure whether we need
> to update the slot or not.
>
> [1] - https://www.postgresql.org/message-id/OS0PR01MB571615D35F486080616CA841943A2%40OS0PR01MB5716.jpnprd01.prod.outlook.com

A quick test with a function that measures the cost of
GetCurrentTimestamp() [1] on my Ubuntu dev system (an AWS EC2 c5.4xlarge
instance) gives me [2]: it took 0.388 ms, 2.269 ms, 21.144 ms,
209.333 ms, 2091.174 ms and 20908.942 ms for 10K, 100K, 1 million,
10 million, 100 million and 1 billion calls respectively. Costs may
differ on other systems and operating systems, but this gives us a
rough idea.

If we are too concerned about the cost of GetCurrentTimestamp(),
a possible approach is simply not to set inactive_since for slots being
synced on the standby, and to let the first acquisition and release
after promotion do that job. We can always call this out in the
docs, saying something like "replication slots on streaming standbys
which are being synced from the primary are not inactive in practice,
so inactive_since is always NULL for them unless the standby is
promoted".

[1]
Datum
pg_get_current_timestamp(PG_FUNCTION_ARGS)
{
    int         loops = PG_GETARG_INT32(0);
    TimestampTz ctime = 0;

    /* Call GetCurrentTimestamp() the requested number of times ... */
    for (int i = 0; i < loops; i++)
        ctime = GetCurrentTimestamp();

    /* ... and return the last timestamp obtained. */
    PG_RETURN_TIMESTAMPTZ(ctime);
}
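
For anyone wanting to reproduce [2]: the function above additionally
needs the usual fmgr boilerplate and a CREATE FUNCTION declaration,
roughly as below (shown as comments; the shared library name depends on
how you build the module, e.g. with PGXS).

#include "postgres.h"
#include "fmgr.h"
#include "utils/timestamp.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(pg_get_current_timestamp);

/*
 * After building this into a shared library, register the SQL-callable
 * function with something like (MODULE_PATHNAME works in an extension
 * script; otherwise use the library's path):
 *
 *   CREATE FUNCTION pg_get_current_timestamp(integer)
 *   RETURNS timestamptz
 *   AS 'MODULE_PATHNAME', 'pg_get_current_timestamp'
 *   LANGUAGE C STRICT;
 */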

[2]
postgres=# \timing
Timing is on.
postgres=# SELECT pg_get_current_timestamp(1000000000);
pg_get_current_timestamp
-------------------------------
2024-03-30 19:07:57.374797+00
(1 row)

Time: 20908.942 ms (00:20.909)
postgres=# SELECT pg_get_current_timestamp(100000000);
pg_get_current_timestamp
-------------------------------
2024-03-30 19:08:21.038064+00
(1 row)

Time: 2091.174 ms (00:02.091)
postgres=# SELECT pg_get_current_timestamp(10000000);
pg_get_current_timestamp
-------------------------------
2024-03-30 19:08:24.329949+00
(1 row)

Time: 209.333 ms
postgres=# SELECT pg_get_current_timestamp(1000000);
pg_get_current_timestamp
-------------------------------
2024-03-30 19:08:26.978016+00
(1 row)

Time: 21.144 ms
postgres=# SELECT pg_get_current_timestamp(100000);
pg_get_current_timestamp
-------------------------------
2024-03-30 19:08:29.142248+00
(1 row)

Time: 2.269 ms
postgres=# SELECT pg_get_current_timestamp(10000);
pg_get_current_timestamp
------------------------------
2024-03-30 19:08:31.34621+00
(1 row)

Time: 0.388 ms

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
