From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)
Date: 2023-01-29 17:53:36
Message-ID: 20230129175336.dvplrhkoba3pjxpk@awork3.anarazel.de
Lists: pgsql-hackers
Hi,
On 2023-01-29 18:39:05 +0100, Tomas Vondra wrote:
> Will do, but I'll wait for another lockup to see how frequent it
> actually is. I'm now at ~90 runs total, and it didn't happen again yet.
> So hitting it after 15 runs might have been a bit of luck.
Was there a difference in how much load there was on the machine between
"reproduced in 15 runs" and "not reproduced in 90"? If a lack of barriers
is indeed related to the issue, a change in the rate of context switches
could substantially change the behaviour (in both directions). More
intra-process context switches can act as "probabilistic barriers", since
a context switch implies a full memory barrier. At the same time, more
switches can make it more likely that the relatively narrow window in
WaitEventSetWait() is hit, or lead to larger delays in processing signals.
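
To make the class of bug concrete, here's a minimal sketch of the
lost-wakeup pattern I have in mind. This is not the actual latch /
WaitEventSetWait() code; all names (work_ready, waiter_sleeping,
send_wakeup(), sleep_until_wakeup()) are made up for illustration:

#include <stdbool.h>

static volatile bool work_ready;
static volatile bool waiter_sleeping;

extern void send_wakeup(void);          /* hypothetical wakeup primitive */
extern void sleep_until_wakeup(void);   /* hypothetical sleep primitive */

void
signaller(void)
{
    work_ready = true;
    /* pg_memory_barrier();  <- without this, the store above may be
     *                          reordered past the read below */
    if (waiter_sleeping)
        send_wakeup();
}

void
waiter(void)
{
    waiter_sleeping = true;
    /* pg_memory_barrier();  <- matching barrier needed on this side */
    if (!work_ready)
        sleep_until_wakeup();   /* narrow window: if the barriers are
                                 * missing, the wakeup can be skipped
                                 * and this sleeps forever */
    waiter_sleeping = false;
}

An involuntary context switch between the store and the read implies a
full barrier anyway, which is why scheduling pressure can probabilistically
mask, or expose, this kind of race.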
Greetings,
Andres Freund