From: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Subject: | lockup in parallel hash join on dikkop (freebsd 14.0-current) |
Date: | 2023-01-26 20:36:06 |
Message-ID: | b2bc5c16-899e-ca99-26ed-e623b4259ec7@enterprisedb.com |
Lists: | pgsql-hackers |
Hi,
I received an alert that dikkop (my rpi4 buildfarm animal running freebsd 14)
had not reported any results for a couple of days, and it seems it got into
an infinite loop in REL_11_STABLE when building the hash table in a parallel
hash join, or something like that.
It seems to be progressing now, probably because I attached gdb to the
workers to get backtraces, which sends signals to the processes etc.
Anyway, in 'ps ax' I saw this:
94545 - Ss 0:03.39 postgres: buildfarm regression [local] SELECT
94627 - Is 0:00.03 postgres: parallel worker for PID 94545
94628 - Is 0:00.02 postgres: parallel worker for PID 94545
and the backend was stuck waiting on this query:
select final > 1 as multibatch
  from hash_join_batches(
$$
  select count(*) from join_foo
    left join (select b1.id, b1.t from join_bar b1 join join_bar b2 using (id)) ss
    on join_foo.id < ss.id + 1 and join_foo.id > ss.id - 1;
$$);
This started on 2023-01-20 23:23:18.125, and the next log entry (after I did
the gdb stuff) is from 2023-01-26 20:05:16.751. Quite a bit of time.
It seems all three processes are doing WaitEventSetWait, either through
a ConditionVariable, or WaitLatch. But I don't have any good idea of
what might have broken - and as it got "unstuck" I can't investigate
more. But I see there's nodeHash and parallelism involved, and I recall there
are a lot of gotchas due to how the backends cooperate when building the hash
table, etc. Thomas, any idea what might be wrong?
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment | Content-Type | Size |
---|---|---|
94628.bt.txt | text/plain | 10.7 KB |
94627.bt.txt | text/plain | 9.6 KB |
94545.bt.txt | text/plain | 22.2 KB |
query.log | text/x-log | 1.9 KB |