From: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, exclusion(at)gmail(dot)com |
Subject: | Re: Fix race condition in InvalidatePossiblyObsoleteSlot() |
Date: | 2024-03-05 10:17:03 |
Message-ID: | ZebxH6YJ8mfB5rJ6@ip-10-97-1-34.eu-west-3.compute.internal |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On Tue, Mar 05, 2024 at 09:42:20AM +0900, Michael Paquier wrote:
> On Mon, Feb 26, 2024 at 02:01:45PM +0000, Bertrand Drouvot wrote:
> > Though [1] mentioned up-thread is not pushed yet, I'm Sharing the POC patch now
> > (see the attached).
>
> I have looked at what you have here.
Thanks!
> First, in a build where 818fefd8fd is included, this makes the test
> script a lot slower. Most of the logic is quick, but we're spending
> 10s or so checking that catalog_xmin has advanced. Could it be
> possible to make that faster?
Yeah, v2 attached changes this. It moves the injection point after the process
has been killed so that another process can decode at wish (without the need
to wait for a walsender timeout) to reach LogicalConfirmReceivedLocation().
> A second issue is the failure mode when 818fefd8fd is reverted. The
> test is getting stuck when we are waiting on the standby to catch up,
> until a timeout decides to kick in to fail the test, and all the
> previous tests pass. Could it be possible to make that more
> responsive? I assume that in the failure mode we would get an
> incorrect conflict_reason for injection_inactiveslot, succeeding in
> checking the failure.
I try to simulate a revert of 818fefd8fd (replacing "!terminated" by "1 == 1"
before the initial_* assignements). The issue is that then the new ASSERT is
triggered leading to the standby shutdown. So, I'm not sure how to improve this
case.
> + my $terminated = 0;
> + for (my $i = 0; $i < 10 * $PostgreSQL::Test::Utils::timeout_default; $i++)
> + {
> + if ($node_standby->log_contains(
> + 'terminating process .* to release replication slot \"injection_activeslot\"', $logstart))
> + {
> + $terminated = 1;
> + last;
> + }
> + usleep(100_000);
> + }
> + ok($terminated, 'terminating process holding the active slot is logged with injection point');
>
> The LOG exists when we are sure that the startup process is waiting
> in the injection point, so this loop could be replaced with something
> like:
> + $node_standby->wait_for_event('startup', 'TerminateProcessHoldingSlot');
> + ok( $node_standby->log_contains('terminating process .* .. ', 'termin .. ';)
>
Yeah, now that wait_for_event() is there, let's use it: done in v2.
> Nit: the name of the injection point should be
> terminate-process-holding-slot rather than
> TerminateProcessHoldingSlot, to be consistent with the other ones.
Done in v2.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachment | Content-Type | Size |
---|---|---|
v2-0001-Add-regression-test-for-818fefd8fd.patch | text/x-diff | 6.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | John Naylor | 2024-03-05 10:30:16 | Re: Change GUC hashtable to use simplehash? |
Previous Message | zwj | 2024-03-05 10:04:33 | 回复: bug report: some issues about pg_15_stable(8fa4a1ac61189efffb8b851ee77e1bc87360c445) |