Re: Weird test mixup

From: Noah Misch <noah(at)leadboat(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Weird test mixup
Date: 2024-05-09 03:15:53
Message-ID: 20240509031553.47@rfd.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, May 09, 2024 at 09:37:54AM +0900, Michael Paquier wrote:
> On Tue, May 07, 2024 at 11:53:10AM -0700, Noah Misch wrote:
> > The problem I'm trying to tackle in this thread is to make
> > src/test/modules/gin installcheck-safe. $SUBJECT's commit 5105c90 started
> > that work, having seen the intarray test suite break when run concurrently
> > with the injection_points test suite. That combination still does break at
> > the exit-time race condition. To reproduce, apply this attachment to add
> > sleeps, and run:
> >
> > make -C src/test/modules/gin installcheck USE_MODULE_DB=1 & sleep 2; make -C contrib/intarray installcheck USE_MODULE_DB=1
>
> Thanks for that. I am not really sure how to protect that without a
> small cost in flexibility for the cases of detach vs run paths. This
> comes down to the fact that a custom routine could be run while it
> could be detached concurrently, removing any stuff a callback could
> depend on in the module.
>
> It was mentioned upthread to add to InjectionPointCacheEntry a fixed
> area of memory that modules could use to store some "status" data, but
> it would not close the run/detach race because a backend could still
> hold a pointer to a callback, with concurrent backends playing with
> the contents of InjectionPointCacheEntry (concurrent detaches and
> attaches that would cause the same entries to be reused).

> One way to close entirely the window would be to hold
> InjectionPointLock longer in InjectionPointRun() until the callback
> finishes or until it triggers an ERROR. This would mean that the case
> you've mentioned in [1] would change, by blocking the detach() of s3
> until the callback of s2 finishes.

> [1] https://www.postgresql.org/message-id/20240501231214.40@rfd.leadboat.com

Yes, that would be a loss for test readability. Also, I wouldn't be surprised
if some test will want to attach POINT-B while POINT-A is in injection_wait().
Various options avoiding those limitations:

1. The data area formerly called a "status" area is immutable after attach.
The core code copies it into a stack buffer to pass in a const callback
argument.

2. InjectionPointAttach() returns an attachment serial number, and the
callback receives that as an argument. injection_points.c would maintain
an InjectionPointCondition for every still-attached serial number,
including global attachments. Finding no condition matching the serial
number argument means detach already happened and callback should do
nothing.

3. Move the PID check into core code.

4. Separate the concept of "make ineligible to fire" from "detach", with
stronger locking for detach. v1 pointed in this direction, though not
using that terminology.

5. Injection point has multiple callbacks. At least one runs with the lock
held and can mutate the "status" data. At least one runs without the lock.

(1) is, I think, simple and sufficient. How about that?

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2024-05-09 03:16:30 Re: Support tid range scan in parallel?
Previous Message Richard Guo 2024-05-09 02:43:31 Re: Revert: Remove useless self-joins *and* -DREALLOCATE_BITMAPSETS make server crash, regress test fail.