From: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: lockup in parallel hash join on dikkop (freebsd 14.0-current) |
Date: | 2023-09-01 08:00:00 |
Message-ID: | 60bb34ad-a696-c43d-3f7c-1696796e86ce@gmail.com |
Lists: | pgsql-hackers |
Hello Thomas,
31.08.2023 14:15, Thomas Munro wrote:
> We have a signal that is pending and not blocked, so I don't
> immediately know why poll() hasn't returned control.
When I worked at Postgres Pro, we observed a similar lockup under rather
specific conditions: we were running on an Elbrus CPU with the Elbrus-specific
compiler (lcc), which is based on EDG.
I managed to reproduce that lockup, and Anton Voloshin investigated it.
The issue was caused by a compiler optimization in WaitEventSetWait():
    waiting = true;
    ...
    while (returned_events == 0)
    {
        ...
        if (set->latch && set->latch->is_set)
        {
            ...
            break;
        }
In that case, the compiler decided that it could place the read of
"set->latch->is_set" before the write "waiting = true".
(Placing "pg_compiler_barrier();" just after "waiting = true;" fixed the
issue for us.)
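
To make the idea concrete, here is a minimal standalone sketch of the pattern, not the actual latch.c code. DemoLatch, compiler_barrier() and demo_check_latch_before_sleep() are hypothetical names invented for this illustration, and the barrier definition assumes a GCC/Clang-style compiler with inline asm. It only constrains the compiler's ordering of the store and the load; it is not a CPU memory barrier.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: GCC/Clang-style inline asm; a compiler-only barrier. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical stand-in for the real latch structure. */
typedef struct DemoLatch
{
    sig_atomic_t is_set;
} DemoLatch;

/* Flag meant to be read by a signal handler while we are about to sleep. */
static volatile sig_atomic_t waiting = false;

/*
 * Returns true if the latch is already set, so the caller must not block.
 * The barrier keeps the store to "waiting" ahead of the load of
 * latch->is_set in the generated code.
 */
static bool
demo_check_latch_before_sleep(DemoLatch *latch)
{
    assert(latch != NULL);

    waiting = true;
    compiler_barrier();     /* the is_set read may not be hoisted above this */

    if (latch->is_set)
    {
        waiting = false;    /* no sleep will happen */
        return true;
    }
    return false;           /* caller would go on to poll() here */
}
```

Without the barrier, a compiler is allowed to load the non-volatile latch->is_set before storing to "waiting", which is the reordering described above.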
I can't provide more details for now, but maybe you could look at the machine
code generated on the target platform to confirm or rule out my guess.
Best regards,
Alexander