pgsql: Fix race leading to incorrect conflict cause in InvalidatePossib

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: pgsql: Fix race leading to incorrect conflict cause in InvalidatePossib
Date: 2024-02-20 04:44:21
Message-ID: E1rcHzR-00772d-NB@gemulon.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers

Fix race leading to incorrect conflict cause in InvalidatePossiblyObsoleteSlot()

The invalidation of an active slot is done in two steps:
- Termination of the backend holding it, if any.
- Report that the slot is obsolete, with a conflict cause depending on
the slot's data.

This can be racy because between these two steps the slot mutex would be
released while doing system calls, which means that the effective_xmin
and effective_catalog_xmin could advance during that time, detecting a
conflict cause different than the one originally wanted before the
process owning a slot is terminated.

Holding the mutex longer is not an option, so this commit changes the
code to record the LSNs stored in the slot during the termination of the
process owning the slot.

Bonus thanks to Alexander Lakhin for the various tests and the analysis.

Author: Bertrand Drouvot
Reviewed-by: Michael Paquier, Bharath Rupireddy
Discussion: https://postgr.es/m/ZaTjW2Xh+TQUCOH0@ip-10-97-1-34.eu-west-3.compute.internal
Backpatch-through: 16

Branch
------
REL_16_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/59cea09f03a56a40bce70a7461226c4d45740d02

Modified Files
--------------
src/backend/replication/slot.c | 39 +++++++++++++++++++++++++++++++++------
1 file changed, 33 insertions(+), 6 deletions(-)

Browse pgsql-committers by date

  From Date Subject
Next Message David Rowley 2024-02-20 05:34:57 pgsql: Minor corrections for partition pruning
Previous Message Michael Paquier 2024-02-20 02:59:29 pgsql: doc: Use system-username instead of system-user