From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
---|---|
To: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Date: | 2024-09-10 01:34:24 |
Message-ID: | CAHut+PsHmfuDQt_gN2udjb9Ln00EvWw-eiRag5prB7JOgn9eEA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi, here is the remainder of my v45-0001 review. These comments are
for the test code only.
======
Testcase #1
1.
+# Testcase start
+#
+# Invalidate streaming standby slot and logical failover slot on primary due to
+# inactive timeout. Also, check logical failover slot synced to standby from
+# primary doesn't invalidate on its own, but gets the invalidated
state from the
+# primary.
nit - s/primary/the primary/ (in a couple of places)
nit - s/standby/the standby/
nit - other trivial tweaks.
~~~
2.
+# Create sync slot on primary
+$primary->psql('postgres',
+ q{SELECT pg_create_logical_replication_slot('sync_slot1',
'test_decoding', false, false, true);}
+);
nit - s/primary/the primary/
~~~
3.
+$primary->safe_psql(
+ 'postgres', qq[
+ SELECT pg_create_physical_replication_slot(slot_name :=
'sb_slot1', immediately_reserve := true);
+]);
Should this have a comment?
~~~
4.
+# Wait until standby has replayed enough data
+$primary->wait_for_catchup($standby1);
nit - s/standby/the standby/
~~~
5.
+# Sync primary slot to standby
+$standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
nit - /Sync primary slot to standby/Sync the primary slots to the standby/
~~~
6.
+# Confirm that logical failover slot is created on standby
nit - s/Confirm that logical failover slot is created on
standby/Confirm that the logical failover slot is created on the
standby/
~~~
7.
+is( $standby1->safe_psql(
+ 'postgres',
+ q{SELECT count(*) = 1 FROM pg_replication_slots
+ WHERE slot_name = 'sync_slot1' AND synced AND NOT temporary;}
+ ),
+ "t",
+ 'logical slot sync_slot1 has synced as true on standby');
IMO here you should also be checking that the sync slot state is NOT
invalidated, just as a counterpoint for the test part later that
checks that it IS invalidated.
~~~
8.
+my $inactive_timeout = 1;
+
+# Set timeout so that next checkpoint will invalidate inactive slot
+$primary->safe_psql(
+ 'postgres', qq[
+ ALTER SYSTEM SET replication_slot_inactive_timeout TO
'${inactive_timeout}s';
+]);
+$primary->reload;
8a.
nit - I think that $inactive_timeout assignment belongs below your comment.
~
8b.
nit - s/Set timeout so that next checkpoint will invalidate inactive
slot/Set timeout GUC so that the next checkpoint will invalidate
inactive slots/
~~~
9.
+# Check for logical failover slot to become inactive on primary. Note that
+# nobody has acquired slot yet, so it must get invalidated due to
+# inactive timeout.
nit - /Check for logical failover slot to become inactive on
primary./Wait for logical failover slot to become inactive on the
primary./
nit - /has acquired slot/has acquired the slot/
~~~
10.
+# Sync primary slot to standby. Note that primary slot has already been
+# invalidated due to inactive timeout. Standby must just sync inavalidated
+# state.
nit - minor, add "the". fix typo "inavalidated", etc. suggestion:
Re-sync the primary slots to the standby. Note that the primary slot was already
invalidated (above) due to inactive timeout. The standby must just
sync the invalidated
state.
~~~
11.
+# Make standby slot on primary inactive and check for invalidation
+$standby1->stop;
nit - /standby slot/the standby slot/
nit - /on primary/on the primary/
======
Testcase #2
12.
I'm not sure it is necessary to do all this extra work. IIUC, there
was already almost everything you needed in the previous Testcase #1.
So, I thought you could just combine this extra standby timeout test
in Testcase #1.
Indeed, your Testcase #1 comment still says it is doing this: ("Also,
check logical failover slot synced to standby from primary doesn't
invalidate on its own,...")
e.g.
- NEW: set the GUC timeout on the standby
- sync the sync_slot (already doing in test #1)
- ensure the synced slot is NOT invalid (already suggested above for test #1)
- NEW: then do a standby sleep > timeout duration
- NEW: then do a standby CHECKPOINT...
- NEW: then ensure the sync slot invalidation did NOT happen
- then proceed with the rest of test #1...
======
Testcase #3
13.
nit - remove a few blank lines to group associated statements together.
~~~
14.
+$publisher->safe_psql(
+ 'postgres', qq[
+ ALTER SYSTEM SET replication_slot_inactive_timeout TO '
${inactive_timeout}s';
+]);
+$publisher->reload;
nit - this deserves a comment, the same as in Testcase #1
======
sub wait_for_slot_invalidation
15.
+# Check for slot to first become inactive and then get invalidated
+sub check_for_slot_invalidation
nit - IMO the previous name was better (e.g. "wait_for.." instead of
"check_for...") because that describes exactly what the subroutine is
doing.
suggestion:
# Wait for the slot to first become inactive and then get invalidated
sub wait_for_slot_invalidation
~~~
16.
+{
+ my ($node, $slot, $offset, $inactive_timeout) = @_;
+ my $name = $node->name;
The variable $name seems too vague. How about $node_name?
~~~
17.
+ # Wait for invalidation reason to be set
+ $node->poll_query_until(
+ 'postgres', qq[
+ SELECT COUNT(slot_name) = 1 FROM pg_replication_slots
+ WHERE slot_name = '$slot' AND
+ invalidation_reason = 'inactive_timeout';
+ ])
+ or die
+ "Timed out while waiting for invalidation reason of slot $slot to
be set on node $name";
17a.
nit - /# Wait for invalidation reason to be set/# Check that the
invalidation reason is 'inactive_timeout'/
IIUC, the 'trigger_slot_invalidation' function has already invalidated
the slot at this point, so we are not really "Waiting..."; we are
"Checking..." that the reason was correctly set.
~
17b.
I think this code fragment maybe would be better put inside the
'trigger_slot_invalidation' function. (I've done this in the nitpicks
attachment)
~~~
18.
+ # Check that invalidated slot cannot be acquired
+ my ($result, $stdout, $stderr);
+
+ ($result, $stdout, $stderr) = $node->psql(
+ 'postgres', qq[
+ SELECT pg_replication_slot_advance('$slot', '0/1');
+ ]);
18a.
s/Check that invalidated slot/Check that an invalidated slot/
~
18b.
nit - Remove some blank lines, because the comment applies to all below it.
======
sub trigger_slot_invalidation
19.
+# Trigger slot invalidation and confirm it in server log
+sub trigger_slot_invalidation
nit - s/confirm it in server log/confirm it in the server log/
~
20.
+{
+ my ($node, $slot, $offset, $inactive_timeout) = @_;
+ my $name = $node->name;
+ my $invalidated = 0;
(same as the other subroutine)
nit - The variable $name seems too vague. How about $node_name?
======
Please refer to the attached nitpicks top-up patch which implements
most of the above nits.
======
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachment | Content-Type | Size |
---|---|---|
PS_NITPICKS_20240910_SLOT_V450001_TESTS.txt | text/plain | 7.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Sami Imseih | 2024-09-10 02:27:00 | Re: query_id, pg_stat_activity, extended query protocol |
Previous Message | Sami Imseih | 2024-09-10 01:20:01 | Re: query_id, pg_stat_activity, extended query protocol |