From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: lockup in parallel hash join on dikkop (freebsd 14.0-current) |
Date: | 2023-09-02 23:06:20 |
Message-ID: | CA+hUKGLL=v=f+Fv=cx=qieCyXbdC7DLgyV=+VdKSLJhOPu5nhA@mail.gmail.com |
Lists: | pgsql-hackers |
I agree that the code lacks barriers. I haven't been able to figure
out how any reordering could cause this hang, though, because in these
old branches procsignal_sigusr1_handler is used for latch wakeups, and
it also calls SetLatch(MyLatch) itself, right at the end. That is,
SetLatch() gets called twice, first in the waker process and then
again in the awoken process, so it should be impossible for the latter
not to see MyLatch->is_set == true after procsignal_sigusr1_handler
completes.
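
To make that argument concrete, here is a minimal single-process sketch of the "handler sets the latch itself" pattern. It is not PostgreSQL code: `toy_latch_is_set`, `toy_set_latch` and `toy_sigusr1_handler` are hypothetical stand-ins for `MyLatch->is_set`, `SetLatch(MyLatch)` and `procsignal_sigusr1_handler`, and the real functions do considerably more. It only illustrates that once the handler has actually run in the awoken process, that process cannot fail to see the flag.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical stand-in for MyLatch->is_set (not PostgreSQL's Latch struct). */
static volatile sig_atomic_t toy_latch_is_set = 0;

/* Stand-in for SetLatch(MyLatch); the real function does more than this. */
static void
toy_set_latch(void)
{
    toy_latch_is_set = 1;
}

/* Stand-in for procsignal_sigusr1_handler: sets the latch as its last step. */
static void
toy_sigusr1_handler(int signo)
{
    (void) signo;
    toy_set_latch();
}

/*
 * Install the handler, send SIGUSR1 to ourselves, and report what we see.
 * Returns 1 if the flag is set once the handler has run, 0 if it is not,
 * and -1 if a system call failed.
 */
static int
toy_latch_set_after_signal(void)
{
    struct sigaction sa;
    sigset_t    unblock;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = toy_sigusr1_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    /* Make sure SIGUSR1 is deliverable, whatever mask we inherited. */
    sigemptyset(&unblock);
    sigaddset(&unblock, SIGUSR1);
    if (sigprocmask(SIG_UNBLOCK, &unblock, NULL) != 0)
        return -1;

    toy_latch_is_set = 0;
    if (raise(SIGUSR1) != 0)    /* the handler runs before raise() returns */
        return -1;

    return toy_latch_is_set == 1;
}
```

Calling `toy_latch_set_after_signal()` from a single-threaded `main()` should return 1. The handler's own store happens in the same process that later reads the flag, so no cross-process memory ordering is involved in that step.
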
That made me think the handler didn't run, which is consistent with
procstat -i showing the signal as pending ('P'). That in turn made me
start to suspect a kernel bug, unless we can explain what we did to
block it...
But... perhaps I am confused about that and did something wrong when
looking into it. It's hard to investigate when you aren't allowed to
take core files or connect a debugger (both will reliably trigger
EINTR).
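
For reference, the ordinary way a signal ends up pending with its handler not run is that the process has it blocked. The sketch below shows only that textbook behaviour using POSIX calls; it makes no claim about what happened on dikkop. The names `toy_handler_ran`, `toy_handler` and `toy_blocked_signal_stays_pending` are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler so the caller can tell whether it has run. */
static volatile sig_atomic_t toy_handler_ran = 0;

static void
toy_handler(int signo)
{
    (void) signo;
    toy_handler_ran = 1;
}

/*
 * Block SIGUSR1, send it to ourselves, and check that it stays pending
 * without the handler running; then unblock it and check that the handler
 * runs. Returns 1 if all of that held, 0 if not, -1 on system call failure.
 */
static int
toy_blocked_signal_stays_pending(void)
{
    struct sigaction sa;
    sigset_t    block;
    sigset_t    pending;
    int         was_pending;
    int         ran_while_blocked;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = toy_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, NULL) != 0)
        return -1;

    toy_handler_ran = 0;
    if (raise(SIGUSR1) != 0)
        return -1;

    /* While blocked, the signal is pending and the handler has not run. */
    if (sigpending(&pending) != 0)
        return -1;
    was_pending = sigismember(&pending, SIGUSR1);
    ran_while_blocked = toy_handler_ran;

    /* Unblocking delivers the pending signal before sigprocmask() returns. */
    if (sigprocmask(SIG_UNBLOCK, &block, NULL) != 0)
        return -1;

    return was_pending == 1 && !ran_while_blocked && toy_handler_ran == 1;
}
```

In a single-threaded program `toy_blocked_signal_stays_pending()` should return 1. A signal that shows as pending while not blocked in the process is the case this sketch cannot reproduce, which is why it would point away from user-space code.
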