Re: Weird test mixup

From: Noah Misch <noah(at)leadboat(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Weird test mixup
Date: 2024-05-07 18:53:10
Message-ID: 20240507185310.d3@rfd.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, May 07, 2024 at 10:17:49AM +0900, Michael Paquier wrote:
> On Mon, May 06, 2024 at 02:23:24PM -0700, Noah Misch wrote:
> > Here's how I've patched it locally. It does avoid changing the backend-side,
> > which has some attraction. Shall I just push this?
>
> It looks like you did not rebase on top of HEAD

Yes, the base was 713cfaf (Sunday).

> A side effect is that this causes the conditions to pile
> up on a running server when running installcheck, and assuming that
> many test suites are run on a server left running this could cause
> spurious failures when failing to find a new slot.

Yes, we'd be raising INJ_MAX_CONDITION more often under this approach.

> Always resetting
> condition->name when detaching a point is a simpler flow and saner
> IMO.
>
> Overall, this switches from one detach behavior to a different one,

Can you say more about that? The only behavior change known to me is that a
given injection point workload uses more of INJ_MAX_CONDITION. If there's
another behavior change, it was likely unintended.

> which may or may not be intuitive depending on what one is looking
> for. FWIW, I see InjectionPointCondition as something that should be
> around as long as its injection point exists, with the condition
> entirely gone once the point is detached because it should not exist
> anymore on the server running, with no information left in shmem.
>
> Through your patch, you make conditions have a different meaning, with
> a mix of "local" definition, but it is kind of permanent as it keeps a
> trace of the point's name in shmem. I find the behavior of the patch
> less intuitive. Perhaps it would be interesting to see first the bug
> and/or problem you are trying to tackle with this different behavior
> as I feel like we could do something even with the module as-is. As
> far as I understand, the implementation of the module on HEAD allows
> one to emulate a breakpoint with a wait/wake, which can avoid the
> window mentioned in step 2. Even if a wait point is detached
> concurrently, it can be awaken with its traces in shmem removed.

The problem I'm trying to tackle in this thread is to make
src/test/modules/gin installcheck-safe. $SUBJECT's commit 5105c90 started
that work, having seen the intarray test suite break when run concurrently
with the injection_points test suite. That combination still does break at
the exit-time race condition. To reproduce, apply this attachment to add
sleeps, and run:

make -C src/test/modules/gin installcheck USE_MODULE_DB=1 & sleep 2; make -C contrib/intarray installcheck USE_MODULE_DB=1

Separately, I see injection_points_attach() populates InjectionPointCondition
after InjectionPointAttach(). Shouldn't InjectionPointAttach() come last, to
avoid the same sort of race? I've not tried to reproduce that one.

Attachment Content-Type Size
repro-inj-exit-race-v1.patch text/plain 2.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2024-05-07 18:54:16 Re: backend stuck in DataFileExtend
Previous Message Nathan Bossart 2024-05-07 18:40:51 Re: pg_sequence_last_value() for unlogged sequences on standbys