From: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: lockup in parallel hash join on dikkop (freebsd 14.0-current) |
Date: | 2023-09-02 21:00:00 |
Message-ID: | 2132c88f-7e32-6dba-1057-2ecc5ce66509@gmail.com |
Lists: | pgsql-hackers |
Hello Robert,
01.09.2023 23:21, Robert Haas wrote:
> On Fri, Sep 1, 2023 at 6:13 AM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
>> (Placing "pg_compiler_barrier();" just after "waiting = true;" fixed the
>> issue for us.)
> Maybe it'd be worth trying something stronger, like
> pg_memory_barrier(). A compiler barrier doesn't prevent the CPU from
> reordering loads and stores as it goes, and ARM64 has weak memory
> ordering.
Indeed, thank you for the tip!
So maybe what we are dealing with here is not a compiler optimization, but a CPU one.
The wider code fragment is:
805c48: 52800028 mov w8, #1 // true
805c4c: 52800319 mov w25, #24
805c50: 5280073a mov w26, #57
805c54: fd446128 ldr d8, [x9, #2240]
805c58: 90000d7b adrp x27, 0x9b1000 <ModifyWaitEvent+0xb0>
805c5c: fd415949 ldr d9, [x10, #688]
805c60: f9071d68 str x8, [x11, #3640] // waiting = true (x8 = w8)
805c64: f90003f3 str x19, [sp]
805c68: 14000010 b 0x805ca8 <WaitEventSetWait+0x108>
805ca8: f9400a88 ldr x8, [x20, #16] // if (set->latch && set->latch->is_set)
805cac: b4000068 cbz x8, 0x805cb8 <WaitEventSetWait+0x118>
805cb0: f9400108 ldr x8, [x8]
805cb4: b5001248 cbnz x8, 0x805efc <WaitEventSetWait+0x35c>
805cb8: f9401280 ldr x0, [x20, #32]
If that CPU can delay the write to the variable waiting
(str x8, [x11, #3640]), held in its internal form as something like
"store 1 to [address]", until 805cb0 or a later instruction, then we can get
the behavior discussed. Something like that is shown in the ARM documentation:
https://developer.arm.com/documentation/102336/0100/Memory-ordering?lang=en
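To make the guess concrete, here is a minimal C11 sketch of the generic "store my flag, then load the other side's flag" pattern, with a full fence between the two accesses. It is not the PostgreSQL latch code: the names waiting_flag, latch_flag, waiter_should_sleep() and setter_must_signal() are hypothetical, and atomic_thread_fence(memory_order_seq_cst) is only assumed to play the role that pg_memory_barrier() would play here.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for "waiting" and "latch->is_set". */
static atomic_bool waiting_flag;
static atomic_bool latch_flag;

/*
 * Waiter side: announce that we are about to wait, then check the latch.
 * Without the fence, a weakly ordered CPU may perform the load of
 * latch_flag before the store to waiting_flag becomes visible to others.
 * Returns true if the latch was not set, i.e. the caller would go to sleep.
 */
static bool
waiter_should_sleep(void)
{
    atomic_store_explicit(&waiting_flag, true, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* full barrier: store before load */
    return !atomic_load_explicit(&latch_flag, memory_order_relaxed);
}

/*
 * Setter side: set the latch, then check whether somebody is waiting.
 * Returns true if a wakeup has to be delivered.
 */
static bool
setter_must_signal(void)
{
    atomic_store_explicit(&latch_flag, true, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* full barrier: store before load */
    return atomic_load_explicit(&waiting_flag, memory_order_relaxed);
}
```

With both fences in place, it cannot happen that the waiter decides to sleep and the setter at the same time decides that nobody needs a wakeup. A compiler barrier alone gives no such guarantee on ARM64, because the hardware itself may still let the load pass the earlier store.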
I'll try to test this guess on the target machine...
Best regards,
Alexander