Re: Introduce XID age and inactive timeout based replication slot invalidation

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Date: 2024-11-19 10:23:35
Message-ID: CALDaNm2UUTfJczjR-rEQwKgmx=iFnuMnR1cXv7ccB+O9P15mYg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 19 Nov 2024 at 12:43, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> wrote:
>
> Attached is the v49 patch set:
> - Fixed the bug reported in [1].
> - Addressed comments in [2] and [3].
>
> I've split the patch into two, implementing the suggested idea in
> comment #5 of [2] separately in 001:
>
> Patch-001: Adds additional error reports (for all invalidation types)
> in ReplicationSlotAcquire() for invalid slots when error_if_invalid =
> true.
> Patch-002: The original patch with comments addressed.

Few comments:
1) I felt this check in wait_for_slot_invalidation is not required as
there is a call to trigger_slot_invalidation which sleeps for
inactive_timeout seconds and ensures checkpoint is triggered, also the
test passes without this:
+ # Wait for slot to become inactive
+ $node->poll_query_until(
+ 'postgres', qq[
+ SELECT COUNT(slot_name) = 1 FROM pg_replication_slots
+ WHERE slot_name = '$slot' AND active = 'f' AND
+ inactive_since IS NOT NULL;
+ ])
+ or die
+ "Timed out while waiting for slot $slot to become inactive
on node $node_name";

2) Instead of calling this in a loop, won't it be enough to call
checkpoint only once explicitly:
+ for (my $i = 0; $i < 10 *
$PostgreSQL::Test::Utils::timeout_default; $i++)
+ {
+ $node->safe_psql('postgres', "CHECKPOINT");
+ if ($node->log_contains(
+ "invalidating obsolete replication
slot \"$slot\"", $offset))
+ {
+ $invalidated = 1;
+ last;
+ }
+ usleep(100_000);
+ }
+ ok($invalidated,
+ "check that slot $slot invalidation has been logged on
node $node_name"
+ );

3) Since pg_sync_replication_slots is a sync call, we can directly use
"is( $standby1->safe_psql('postgres', SELECT COUNT(slot_name) = 1 FROM
pg_replication_slots..." instead of poll_query_until:
+$standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
+$standby1->poll_query_until(
+ 'postgres', qq[
+ SELECT COUNT(slot_name) = 1 FROM pg_replication_slots
+ WHERE slot_name = 'sync_slot1' AND
+ invalidation_reason = 'inactive_timeout';
+])
+ or die
+ "Timed out while waiting for sync_slot1 invalidation to be synced
on standby";

4) Since this variable is being referred to at many places, how about
changing it to inactive_timeout_1s so that it is easier while
reviewing across many places:
# Set timeout GUC on the standby to verify that the next checkpoint will not
# invalidate synced slots.
my $inactive_timeout = 1;

5) Since we have already tested invalidation of logical replication
slot 'sync_slot1' above, this test might not be required:
+# =============================================================================
+# Testcase start
+# Invalidate logical subscriber slot due to inactive timeout.
+
+my $publisher = $primary;
+
+# Prepare for test
+$publisher->safe_psql(
+ 'postgres', qq[
+ ALTER SYSTEM SET replication_slot_inactive_timeout TO '0';
+]);
+$publisher->reload;

Regards,
Vignesh

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Yugo NAGATA 2024-11-19 10:31:21 Re: Doc: typo in config.sgml
Previous Message Masahiro Ikeda 2024-11-19 10:20:32 Re: Allow default \watch interval in psql to be configured