Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)
Date: 2023-01-30 08:43:01
Message-ID: CA+hUKGK3iuXde4N1qHY0z+ZBd8+c0AOMk6g-e0cSjSfBiUEkNg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 30, 2023 at 6:36 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2023-01-30 15:22:34 +1300, Thomas Munro wrote:
> > On Mon, Jan 30, 2023 at 6:26 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> > > out-of-order hazard
> >
> > I've been trying to understand how that could happen, but my CPU-fu is
> > weak. Let me try to write an argument for why it can't happen, so
> > that later I can look back at how stupid and naive I was. We have A
> > B, and if the CPU sees no dependency and decides to execute B A
> > (pipelined), shouldn't an interrupt either wait for the whole
> > schemozzle to commit first (if not in a hurry), or nuke it, handle the
> > IPI and restart, or something?
>
> In a core local view, yes, I think so. But I don't think that's how it can
> work on multi-core, and even more so, multi-socket machines. Imagine how it'd
> influence latency if every interrupt on any CPU would prevent all out-of-order
> execution on any CPU.

Good. Yeah, I was talking only about a single thread/core.

> > After an hour of reviewing random
> > slides from classes on out-of-order execution and reorder buffers and
> > the like, I think the term for making sure that interrupts run with
> > the illusion of in-order execution maintained is called "precise
> > interrupts", and it is expected in all modern architectures, after the
> > early OoO pioneers lost their minds trying to program without it. I
> > guess generally you want that because it would otherwise run your
> > interrupt handler in a completely uncertain environment, and
> > specifically in this case it would reach our signal handler which
> > reads A's output (waiting) and writes to B's input (is_set), so B IPI
> > A surely shouldn't be allowed?
>
> Userspace signals aren't delivered synchronously during hardware interrupts
> afaik - and I don't think they even possibly could be (after all the process
> possibly isn't scheduled).

Yeah, they're not synchronous and the target might not even be
running. BUT if a suitable thread is running then AFAICT an IPI is
delivered to that sucker to get it running the handler ASAP, at least
on the three OSes I looked at. (See breadcrumbs below).

> I think what you're talking about with precise interrupts above is purely
> about the single-core view, and mostly about hardware interrupts for faults
> etc. The CPU will unwind state from speculatively executed code etc on
> interrupt, sure - but I think that's separate from guaranteeing that you can't
> have stale cache contents *due to work by another CPU*.

Yeah. I get the cache problem, a separate issue that does indeed look
pretty dodgy. I guess I wrote my email out-of-order: at the end I
speculated that cache coherency probably can't explain this failure at
least in THAT bit of the source, because of that funky extra
self-SetLatch(). I just got spooked by the mention of out-of-order
execution and I wanted to chase it down and straighten out my
understanding.
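
To make the ordering question concrete, here is a minimal sketch of the store-then-load handshake between a flag like "waiting" and a flag like is_set, written with C11 atomics. The names demo_latch, demo_set() and demo_prepare_to_sleep() are hypothetical and this is not PostgreSQL's latch code; it only illustrates why each side needs a full barrier between its store and its load, so that at least one side notices the other.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a latch; NOT PostgreSQL's Latch struct. */
typedef struct
{
    atomic_bool is_set;          /* written by the setter, read by the waiter */
    atomic_bool maybe_sleeping;  /* written by the waiter, read by the setter */
} demo_latch;

static void
demo_latch_init(demo_latch *l)
{
    atomic_init(&l->is_set, false);
    atomic_init(&l->maybe_sleeping, false);
}

/*
 * Setter side: publish is_set, then look at maybe_sleeping.
 * Returns true if the waiter may be asleep and needs a wakeup.
 */
static bool
demo_set(demo_latch *l)
{
    atomic_store_explicit(&l->is_set, true, memory_order_relaxed);
    /* Full barrier: the store above must not be reordered after the load below. */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&l->maybe_sleeping, memory_order_relaxed);
}

/*
 * Waiter side: announce the intent to sleep, then re-check is_set.
 * Returns true if it is safe to block (is_set was not observed).
 */
static bool
demo_prepare_to_sleep(demo_latch *l)
{
    atomic_store_explicit(&l->maybe_sleeping, true, memory_order_relaxed);
    /* Full barrier: the store above must not be reordered after the load below. */
    atomic_thread_fence(memory_order_seq_cst);
    return !atomic_load_explicit(&l->is_set, memory_order_relaxed);
}
```

With both fences in place, it should not be possible for demo_set() to return false (no wakeup sent) while demo_prepare_to_sleep() returns true (waiter goes to sleep) for the same pair of calls. Drop either fence and that lost-wakeup outcome becomes permitted.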

> I'm not even sure that userspace signals are generally delivered via an
> immediate hardware interrupt, or whether they're processed at the next
> scheduler tick. After all, we know that multiple signals are coalesced, which
> certainly isn't compatible with synchronous execution. But it could be that
> that just happens when the target of a signal is not currently scheduled.

FreeBSD: By default, they are when possible, eg if the process is
currently running a suitable thread. You can set sysctl
kern.smp.forward_signal_enabled=0 to turn that off, and then it works
more like the way you imagined (checking for pending signals at
various arbitrary times, not sure). See tdsigwakeup() ->
forward_signal() -> ipi_cpu().

Linux: Well it certainly smells approximately similar. See
signal_wake_up_state() -> kick_process() -> smp_send_reschedule() ->
smp_cross_call() -> __ipi_send_mask(). The comment for kick_process()
explains that it's using the scheduler IPI to get signals handled
ASAP.

Darwin: ... -> cpu_signal() -> something that talks about IPIs

Coalescing happens not only at the pending signal level (an
invention of the OS); for the inter-processor wakeups there is also
interrupt coalescing. It's latches all the way down.
