From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | global barrier & atomics in signal handlers (Re: Atomic operations within spinlocks) |
Date: | 2020-06-09 19:37:23 |
Message-ID: | 20200609193723.eu5ilsjxwdpyxhgz@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2020-06-08 23:08:47 -0700, Andres Freund wrote:
> On 2020-06-05 17:19:26 -0700, Andres Freund wrote:
> > I wrote a patch for this, and when I got around to to testing it, I
> > found that our tests currently don't pass when using both
> > --disable-spinlocks and --disable-atomics. Turns out to not be related
> > to the issue above, but the global barrier support added in 13.
> >
> > That *reads* two 64 bit atomics in a signal handler. Which is normally
> > fine, but not at all cool when atomics (or just 64 bit atomics) are
> > backed by spinlocks. Because we can "self interrupt" while already
> > holding the spinlock.
> >
> > It looks to me that that's a danger whenever 64bit atomics are backed by
> > spinlocks, not just when both --disable-spinlocks and --disable-atomics
> > are used. But I suspect that it's really hard to hit the tiny window of
> > danger when those options aren't used. While we have buildfarm animals
> > testing each of those separately, we don't have one that tests both
> > together...
> >
> > I'm not really sure what to do about that issue. The easisest thing
> > would probably be to change the barrier generation to 32bit (which
> > doesn't have to use locks for reads in any situation). I tested doing
> > that, and it fixes the hangs for me.
> >
> >
> > Randomly noticed while looking at the code:
> > uint64 flagbit = UINT64CONST(1) << (uint64) type;
> >
> > that shouldn't be 64bit, right?
>
> Attached is a series of patches addressing these issues, of varying
> quality:
>
> 1) This fixes the above mentioned issue in the global barrier code by
> using 32bit atomics. That might be fine, or it might not. I just
> included it here because otherwise the tests cannot be run fully.
Hm. Looking at this again, perhaps the better fix would be to simply not
look at the concrete values of the barrier inside the signal handler?
E.g. we could have a new PROCSIG_GLOBAL_BARRIER, which just triggers
ProcSignalBarrierPending to be set. And then have
ProcessProcSignalBarrier do the check that's currently in
CheckProcSignalBarrier()?
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Ranier Vilela | 2020-06-09 19:51:58 | Re: Speedup usages of pg_*toa() functions |
Previous Message | Andres Freund | 2020-06-09 19:23:45 | Re: Why is pq_begintypsend so slow? |