
Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)

From:	Alexander Lakhin <exclusion(at)gmail(dot)com>
To:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)
Date:	2023-09-01 08:00:00
Message-ID:	60bb34ad-a696-c43d-3f7c-1696796e86ce@gmail.com
Lists:	pgsql-hackers

Hello Thomas,

31.08.2023 14:15, Thomas Munro wrote:

> We have a signal that is pending and not blocked, so I don't
> immediately know why poll() hasn't returned control.

When I worked at Postgres Pro, we observed a similar lockup
under rather specific conditions (we used an Elbrus CPU and the
Elbrus-specific compiler (lcc), which is based on EDG).
I managed to reproduce that lockup, and Anton Voloshin investigated it.
The issue was caused by a compiler optimization in WaitEventSetWait():
    waiting = true;
...
    while (returned_events == 0)
    {
...
        if (set->latch && set->latch->is_set)
        {
...
            break;
        }

In that case, the compiler decided that it could place the read of
"set->latch->is_set" before the write "waiting = true".
(Placing "pg_compiler_barrier();" just after "waiting = true;" fixed the
issue for us.)
I can't provide more details for now, but maybe you could look at the binary
code generated on the target platform to confirm or reject my guess.
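
To make the suspected reordering concrete, here is a minimal sketch of the pattern described above. It is not the PostgreSQL source: DemoLatch, DemoWaitSet, demo_compiler_barrier() and demo_check_latch_before_sleep() are hypothetical names invented for illustration, the barrier macro assumes a GCC- or Clang-compatible compiler, and the actual blocking step is left out. It only shows where a compiler barrier would sit relative to the store and the load.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for pg_compiler_barrier(); assumes GCC or Clang. */
#define demo_compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Simplified, hypothetical stand-ins for the latch and wait-set structures. */
typedef struct DemoLatch
{
    sig_atomic_t is_set;
} DemoLatch;

typedef struct DemoWaitSet
{
    DemoLatch  *latch;
} DemoWaitSet;

/* Flag a signal handler would inspect to decide whether to wake the waiter. */
static volatile sig_atomic_t waiting = false;

/*
 * Returns true if the latch was already set when checked, false if the
 * caller would have to block.  Sketch only: a real wait loop would call
 * poll() or similar at the point where this function returns false.
 */
static bool
demo_check_latch_before_sleep(DemoWaitSet *set)
{
    bool        latch_was_set;

    assert(set != NULL);

    waiting = true;

    /*
     * Without this barrier the compiler is free to move the read of
     * is_set above the store to waiting.  The barrier constrains the
     * compiler only; it emits no CPU fence instruction.
     */
    demo_compiler_barrier();

    latch_was_set = (set->latch != NULL && set->latch->is_set);

    waiting = false;
    return latch_was_set;
}
```

If the read were hoisted above the store, a latch set between the two could be missed by both sides: the waiter would have read is_set as 0 already, and the setter would have seen waiting still false and skipped the wakeup. Whether that is what happens on dikkop is only a guess until the generated code is inspected.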

Best regards,
Alexander

In response to

Re: lockup in parallel hash join on dikkop (freebsd 14.0-current) at 2023-08-31 11:15:20 from Thomas Munro

Responses

Re: lockup in parallel hash join on dikkop (freebsd 14.0-current) at 2023-09-01 13:00:29 from Tomas Vondra
Re: lockup in parallel hash join on dikkop (freebsd 14.0-current) at 2023-09-01 20:21:42 from Robert Haas
