From: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, exclusion(at)gmail(dot)com |
Subject: | Re: Fix race condition in InvalidatePossiblyObsoleteSlot() |
Date: | 2024-02-19 08:15:16 |
Message-ID: | CALj2ACUMPTvuk0_GuPiFbUb=PMTQq0WMi5Kvj8Mz0JZ_ico1Lw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Feb 19, 2024 at 11:44 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> > Yeah, comments added in v3.
>
> The contents look rather OK, I may do some word-smithing for both.
Here are some comments on v3:
1.
+ XLogRecPtr initial_effective_xmin = InvalidXLogRecPtr;
+ XLogRecPtr initial_catalog_effective_xmin = InvalidXLogRecPtr;
+ XLogRecPtr initial_restart_lsn = InvalidXLogRecPtr;
Prefix 'initial_' makes the variable names a bit longer, I think we
can just use effective_xmin, catalog_effective_xmin and restart_lsn,
the code updating then only when if (!terminated) tells one that they
aren't updated every time.
2.
+ /*
+ * We'll release the slot's mutex soon, so it's possible that
+ * those values change since the process holding the slot has been
+ * terminated (if any), so record them here to ensure we would
+ * report the slot as obsolete correctly.
+ */
This needs a bit more info as to why and how effective_xmin,
catalog_effective_xmin and restart_lsn can move ahead after signaling
a backend and before the signalled backend reports back.
3.
+ /*
+ * Assert that the conflict cause recorded previously before we
+ * terminate the process did not change now for any reason.
+ */
+ Assert(!(conflict_prev != RS_INVAL_NONE && terminated &&
+ conflict_prev != conflict));
It took a while for me to understand the above condition, can we
simplify it like below using De Morgan's laws for better readability?
Assert((conflict_prev == RS_INVAL_NONE) || !terminated ||
(conflict_prev == conflict));
> > Are we going to fix this on master and 16 stable first and then later on add a
> > test on master with the injection points?
>
> Still, the other patch is likely going to take a couple of weeks
> before getting into the tree, so I have no objection to fix the bug
> first and backpatch, then introduce a test. Things have proved that
> failures could show up in the buildfarm in this area, so more time
> running this code across two branches is not a bad concept, either.
While I couldn't agree more on getting this fix in, it's worth pulling
in the required injection points patch here and writing the test to
reproduce this race issue.
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From | Date | Subject | |
---|---|---|---|
Next Message | Bertrand Drouvot | 2024-02-19 08:25:14 | Re: System username in pg_stat_activity |
Previous Message | Quan Zongliang | 2024-02-19 08:05:03 | Re: Improvement discussion of custom and generic plans |