From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Atomic operations within spinlocks |
Date: | 2020-06-09 06:08:47 |
Message-ID: | 20200609060847.vd2c67ih4bz6vyww@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2020-06-05 17:19:26 -0700, Andres Freund wrote:
> Hi,
>
> On 2020-06-04 19:33:02 -0700, Andres Freund wrote:
> > But it looks like that code is currently buggy (and looks like it always
> > has been), because we don't look at the nested argument when
> > initializing the semaphore. So we currently allocate too many
> > semaphores, without benefiting from them :(.
>
> I wrote a patch for this, and when I got around to to testing it, I
> found that our tests currently don't pass when using both
> --disable-spinlocks and --disable-atomics. Turns out to not be related
> to the issue above, but the global barrier support added in 13.
>
> That *reads* two 64 bit atomics in a signal handler. Which is normally
> fine, but not at all cool when atomics (or just 64 bit atomics) are
> backed by spinlocks. Because we can "self interrupt" while already
> holding the spinlock.
>
> It looks to me that that's a danger whenever 64bit atomics are backed by
> spinlocks, not just when both --disable-spinlocks and --disable-atomics
> are used. But I suspect that it's really hard to hit the tiny window of
> danger when those options aren't used. While we have buildfarm animals
> testing each of those separately, we don't have one that tests both
> together...
>
> I'm not really sure what to do about that issue. The easisest thing
> would probably be to change the barrier generation to 32bit (which
> doesn't have to use locks for reads in any situation). I tested doing
> that, and it fixes the hangs for me.
>
>
> Randomly noticed while looking at the code:
> uint64 flagbit = UINT64CONST(1) << (uint64) type;
>
> that shouldn't be 64bit, right?
Attached is a series of patches addressing these issues, of varying
quality:
1) This fixes the above mentioned issue in the global barrier code by
using 32bit atomics. That might be fine, or it might not. I just
included it here because otherwise the tests cannot be run fully.
2) Fixes spinlock emulation when more than INT_MAX spinlocks are
initialized in the lifetime of a single backend
3) Add spinlock tests to normal regression tests.
- Currently as part of test_atomic_ops. Probably not worth having a
separate SQL function?
- Currently contains a test for 1) that's run when the spinlock
emulation is used. Probably too slow to actually indclude? Takes 15s
on my computer... OTOH, it's just with --disable-spinlocks...
- Could probably remove the current spinlock tests after this. The
only thing they additionally test is a stuck spinlock. Since
they're not run currently, they don't seem worth much?
4) Fix the potential for deadlocks when using atomics while holding a
spinlock, add tests for that.
Any comments?
Greetings,
Andres Freund
Attachment | Content-Type | Size |
---|---|---|
v1-0001-WORKAROUND-Avoid-spinlock-use-in-signal-handler-b.patch | text/x-diff | 7.2 KB |
v1-0002-spinlock-emulation-Fix-bug-when-more-than-INT_MAX.patch | text/x-diff | 1.0 KB |
v1-0003-Add-basic-spinlock-tests-to-regression-tests.patch | text/x-diff | 3.9 KB |
v1-0004-Fix-deadlock-danger-when-atomic-ops-are-done-unde.patch | text/x-diff | 7.1 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Kyotaro Horiguchi | 2020-06-09 06:09:21 | Re: BUG #16481: Stored Procedure Triggered by Logical Replication is Unable to use Notification Events |
Previous Message | Tom Lane | 2020-06-09 06:03:09 | Re: Add -Wold-style-definition to CFLAGS? |