Fix 035_standby_logical_decoding.pl race conditions

From: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Fix 035_standby_logical_decoding.pl race conditions
Date: 2025-02-10 14:42:37
Message-ID: Z6oQXc8LmiTLfwLA@ip-10-97-1-34.eu-west-3.compute.internal
Lists: pgsql-hackers

Hi hackers,

Please find attached a patch to $SUBJECT.

In rare circumstances (and on slow machines) it is possible that an xl_running_xacts
record is emitted and that the catalog_xmin of a logical slot on the standby advances
past the conflict point. In that case, no conflict is reported and the test
fails. This has been observed several times, and the most recent discussion can
be found in [1].

To prevent this race condition, this commit adds an injection point that stops
the catalog_xmin of a logical slot from advancing past the conflict point.

While working on this patch, some adjustments to injection points were needed
(they are proposed in 0001):

- Add the ability to wakeup() and detach() while ensuring that no process can
start waiting in between. This is done by a new injection_points_wakeup_detach()
function that holds the spinlock for the whole duration.

- If the walsender is waiting on the injection point and the logical slot
is conflicting, the walsender process is killed and therefore cannot
"empty" its injection slot. The next injection_wait() should then reuse this
slot (instead of using an empty one); 0001 modifies injection_wait()
accordingly.

With 0001 in place, we can use an injection point in
LogicalConfirmReceivedLocation() and update 035_standby_logical_decoding.pl to
prevent the catalog_xmin of a logical slot from advancing past the conflict point.

Remarks:

R1. The issue still remains in v16, though, as injection points are only
available since v17.
R2. 0001 should probably bump the injection point module to 1.1, but shouldn't
that already have been the case in d28cd3e7b21c?

[1]: https://www.postgresql.org/message-id/flat/386386.1737736935%40sss.pgh.pa.us

Looking forward to your feedback,

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v1-0001-Add-injection_points_wakeup_detach-and-modify-inj.patch text/x-diff 5.5 KB
v1-0002-Fix-race-conditions-in-035_standby_logical_decodi.patch text/x-diff 3.5 KB
