Re: Weird test mixup

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Weird test mixup
Date: 2024-05-06 01:03:37
Message-ID: ZjgsaYs44ici4QMW@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, May 02, 2024 at 12:35:55PM -0700, Noah Misch wrote:
> I should have given a simpler example:
>
> s1: local-attach to POINT
> s2: enter InjectionPointRun(POINT), yield CPU just before injection_callback()
> s1: exit
> s2: wake up and run POINT as though it had been non-local

Hmm. Even if you were to emulate that in a controlled manner, you
would need a second injection point that does a wait in s2, which is
something that would happen before injection_callback() and before
scanning the local entry. This relies on the fact that we're holding
CPU in s2 between the backend shmem hash table lookup and the callback
being called.

>> Fun. One thing I would ask is why it makes sense to be able to detach
>> a local point from a different session than the one who defined it as
>> local. Shouldn't the operation of s3 be restricted rather than
>> authorized as a safety measure, instead?
>
> (That's orthogonal to the race condition.) When s1 would wait at the
> injection point multiple times in one SQL statement, I like issuing the detach
> from s3 so s1 waits at just the first encounter with the injection point.
> This mimics setting a gdb breakpoint and deleting that breakpoint before
> "continue". The alternative, waking s1 repeatedly until it finishes the SQL
> statement, is less convenient. (I also patched _detach() to wake the waiter,
> and I plan to propose that.)

Okay.

>> Indeed. That's a brain fade. This one could be fixed by collecting
>> the point names when cleaning up the conditions and detach after
>> releasing the spinlock. This opens a race condition between the
>> moment when the spinlock is released and the detach, where another
>> backend could come in and detach a point before the shmem_exit
>> callback has the time to do its cleanup, even if detach() is
>> restricted for local points. So we could do the callback cleanup in
>
> That race condition seems fine. The test can be expected to control the
> timing of backend exits vs. detach calls. Unlike the InjectionPointRun()
> race, it wouldn't affect backends unrelated to the test.

Sure. The fact that there are two spinlocks in the backend code and
the module opens concurrency issues. Making that more robust if there
is a case for it is OK by me, but I'd rather avoid making the
backend-side more complicated than need be.

>> three steps in the shmem exit callback:
>> - Collect the names of the points to detach, while holding the
>> spinlock.
>> - Do the Detach.
>> - Take again the spinlock, clean up the conditions.
>>
>> Please see the attached.
>
> The injection_points_cleanup() parts look good. Thanks.

Thanks for the check.

>> @@ -403,6 +430,10 @@ injection_points_detach(PG_FUNCTION_ARGS)
>> {
>> char *name = text_to_cstring(PG_GETARG_TEXT_PP(0));
>>
>> + if (!injection_point_allowed(name))
>> + elog(ERROR, "cannot detach injection point \"%s\" not allowed to run",
>> + name);
>> +
>
> As above, I disagree with the injection_points_detach() part.

Okay, noted. Fine by me to expand this stuff as you feel, the code
has been written to be extended depending on what people want to
support. There should be tests in the tree that rely on any
new behavior, though.

I've applied the patch to fix the spinlock logic in the exit callback
for now.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2024-05-06 02:51:25 Re: 039_end_of_wal: error in "xl_tot_len zero" test
Previous Message Michail Nikolaev 2024-05-05 23:37:20 Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements