Re: Introduce XID age and inactive timeout based replication slot invalidation

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Date: 2024-11-20 07:59:29
Message-ID: CALDaNm0g86wD2=bQdFOy0smsP0MZWyz0CUqXej=Qi-hCEeqkag@mail.gmail.com
Lists: pgsql-hackers

On Tue, 19 Nov 2024 at 12:43, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> wrote:
>
> Attached is the v49 patch set:
> - Fixed the bug reported in [1].
> - Addressed comments in [2] and [3].
>
> I've split the patch into two, implementing the suggested idea in
> comment #5 of [2] separately in 001:
>
> Patch-001: Adds additional error reports (for all invalidation types)
> in ReplicationSlotAcquire() for invalid slots when error_if_invalid =
> true.
> Patch-002: The original patch with comments addressed.

This Assert can fail:
+    /*
+     * Check if the slot needs to be invalidated due to
+     * replication_slot_inactive_timeout GUC.
+     */
+    if (now &&
+        TimestampDifferenceExceeds(s->inactive_since, now,
+                                   replication_slot_inactive_timeout_sec * 1000))
+    {
+        invalidation_cause = cause;
+        inactive_since = s->inactive_since;
+
+        /*
+         * Invalidation due to inactive timeout implies that
+         * no one is using the slot.
+         */
+        Assert(s->active_pid == 0);

With the following scenario:
Set replication_slot_inactive_timeout to 10 seconds
-- Create a slot
postgres=# select pg_create_logical_replication_slot ('test', 'pgoutput', true, true);
pg_create_logical_replication_slot
------------------------------------
(test,0/1748068)
(1 row)

-- Wait for 10 seconds and execute checkpoint
postgres=# checkpoint;
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly

The assert fails:
#5 0x00005b074f0c922f in ExceptionalCondition
    (conditionName=0x5b074f2f0b4c "s->active_pid == 0",
    fileName=0x5b074f2f0010 "slot.c", lineNumber=1762) at assert.c:66
#6 0x00005b074ee26ead in InvalidatePossiblyObsoleteSlot
    (cause=RS_INVAL_INACTIVE_TIMEOUT, s=0x740925361780, oldestLSN=0,
    dboid=0, snapshotConflictHorizon=0, invalidated=0x7fffaee87e63)
    at slot.c:1762
#7 0x00005b074ee273b2 in InvalidateObsoleteReplicationSlots
    (cause=RS_INVAL_INACTIVE_TIMEOUT, oldestSegno=0, dboid=0,
    snapshotConflictHorizon=0) at slot.c:1952
#8 0x00005b074ee27678 in CheckPointReplicationSlots
    (is_shutdown=false) at slot.c:2061
#9 0x00005b074e9dfda7 in CheckPointGuts (checkPointRedo=24412528,
    flags=108) at xlog.c:7513
#10 0x00005b074e9df4ad in CreateCheckPoint (flags=108) at xlog.c:7179
#11 0x00005b074edc6bfc in CheckpointerMain (startup_data=0x0,
    startup_data_len=0) at checkpointer.c:463
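
For illustration only, the condition behind the failure can be modelled
outside the server. The sketch below is a toy model in plain C, not code
from the patch: ToySlot, toy_timed_out_unguarded() and
toy_timed_out_guarded() are hypothetical names, timestamps are plain
milliseconds rather than TimestampTz, and the guarded variant is only one
possible way to avoid asserting on a slot that still has an owner. It is
not necessarily how the patch should be fixed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two slot fields the quoted check reads. */
typedef struct ToySlot
{
    int64_t inactive_since_ms;  /* time the slot was last marked inactive */
    int     active_pid;         /* 0 means no process owns the slot */
} ToySlot;

/*
 * Timestamp-only check, mirroring the shape of the quoted condition:
 * "now" must be set and the elapsed time must reach the timeout.
 */
bool
toy_timed_out_unguarded(const ToySlot *s, int64_t now_ms, int64_t timeout_ms)
{
    return now_ms != 0 &&
           (now_ms - s->inactive_since_ms) >= timeout_ms;
}

/*
 * Assumed alternative: a slot that still has an owner is never treated
 * as timed out, so a later "active_pid == 0" assertion cannot trip.
 */
bool
toy_timed_out_guarded(const ToySlot *s, int64_t now_ms, int64_t timeout_ms)
{
    return s->active_pid == 0 &&
           toy_timed_out_unguarded(s, now_ms, timeout_ms);
}
```

With inactive_since_ms = 1000, now_ms = 12000, timeout_ms = 10000 and
active_pid = 4242, the unguarded check returns true, which is the state in
which the Assert would trip, while the guarded check returns false.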

Regards,
Vignesh
