use a non-locking initial test in TAS_SPIN on AArch64

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: dipiets(at)amazon(dot)com
Subject: use a non-locking initial test in TAS_SPIN on AArch64
Date: 2024-10-22 19:54:57
Message-ID: ZxgDEb_VpWyNZKB_@nathan
Lists: pgsql-hackers

My colleague Salvatore Dipietro (CC'd) sent me a couple of profiles that
showed an enormous amount of s_lock() time going to the
__sync_lock_test_and_set() call in the AArch64 implementation of tas().
Upon closer inspection, I noticed that we don't implement a custom
TAS_SPIN() for this architecture, so I quickly hacked together the attached
patch and ran a couple of benchmarks that stressed the spinlock code. I
found no discussion about TAS_SPIN() on ARM in the archives, but I did
notice that the initial AArch64 support was added [0] before x86_64 started
using a non-locking test [1].
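For readers unfamiliar with the trick, here is a minimal sketch that uses the GCC/Clang `__sync` builtins. It contrasts a plain tas() with a TAS_SPIN() that first does a non-locking test, using the same `*(lock) ? 1 : TAS(lock)` shape that x86_64 uses in s_lock.h. The `demo_*` names, `swaps_while_held()`, and the swap counter are hypothetical and exist only for illustration. This is not the attached patch.

```c
/* Hypothetical stand-in for slock_t; not PostgreSQL's definitions. */
typedef int demo_slock_t;

/* Counts atomic swaps issued, purely for illustration. */
static int demo_swap_count = 0;

/*
 * Plain TAS: every call performs an atomic read-modify-write
 * (the swpa instruction seen in the profiles).
 */
static inline int
demo_tas(volatile demo_slock_t *lock)
{
	demo_swap_count++;
	return __sync_lock_test_and_set(lock, 1);
}

/*
 * Non-locking initial test: do an ordinary load first, and attempt the
 * atomic swap only if the lock currently looks free.
 */
#define DEMO_TAS_SPIN(lock)	(*(lock) ? 1 : demo_tas(lock))

/*
 * Poll a lock that is already held `spins` times and return how many
 * atomic swaps were issued while doing so.
 */
int
swaps_while_held(int use_initial_test, int spins)
{
	volatile demo_slock_t lock = 1;	/* held by "someone else" */

	demo_swap_count = 0;
	for (int i = 0; i < spins; i++)
	{
		int busy = use_initial_test ? DEMO_TAS_SPIN(&lock) : demo_tas(&lock);

		(void) busy;			/* always 1: the lock is never released */
	}
	return demo_swap_count;
}
```

Polling a held lock 1000 times without the initial test issues 1000 swaps. With the initial test it issues none, because the plain load already shows that the lock is busy. The usual rationale for this is that waiters then stop repeatedly pulling the lock's cache line into exclusive state while they spin.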

These benchmarks were run on an AWS c8g.24xlarge instance using a
select-only pgbench workload with 256 clients and
pg_stat_statements.track_planning enabled.

without the patch:

90.04% postgres [.] s_lock
1.07% pg_stat_statements.so [.] pgss_store
0.59% postgres [.] LWLockRelease
0.56% postgres [.] perform_spin_delay
0.31% [kernel] [k] arch_local_irq_enable

| while (TAS_SPIN(lock))
| {
| perform_spin_delay(&delayStatus);
0.12 |2c: -> bl perform_spin_delay
| tas():
| return __sync_lock_test_and_set(lock, 1);
0.01 |30: swpa w20, w1, [x19]
| s_lock():
99.87 | add x0, sp, #0x28
| while (TAS_SPIN(lock))
0.00 | ^ cbnz w1, 2c

tps = 74135.100891 (without initial connection time)

with the patch:

30.46% postgres [.] s_lock
5.88% postgres [.] perform_spin_delay
4.61% [kernel] [k] arch_local_irq_enable
3.31% [kernel] [k] next_uptodate_page
2.50% postgres [.] hash_search_with_hash_value

| while (TAS_SPIN(lock))
| {
| perform_spin_delay(&delayStatus);
0.63 |2c:+-->add x0, sp, #0x28
0.07 | |-> bl perform_spin_delay
| |while (TAS_SPIN(lock))
0.25 |34:| ldr w0, [x19]
65.19 | +--cbnz w0, 2c
| tas():
| return __sync_lock_test_and_set(lock, 1);
0.00 | swpa w20, w0, [x19]
| s_lock():
33.82 | ^ cbnz w0, 2c
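In the "with the patch" annotation, the loop now does a plain load (ldr) and branch (cbnz) back to the delay path while the lock is held. It only reaches the atomic swpa when the lock looks free. Below is a rough, self-contained sketch of that loop shape, assuming GCC/Clang `__sync` builtins. The `sketch_*` names, the `max_delays` bound, and the busy-wait back-off are hypothetical stand-ins for s_lock() and perform_spin_delay(), not their real code.

```c
/* Hypothetical names; not PostgreSQL's s_lock.c. */
typedef int sketch_slock_t;

#define SKETCH_TAS(lock)		__sync_lock_test_and_set((lock), 1)
/* Plain load first (ldr + cbnz); atomic swap (swpa) only if it looks free. */
#define SKETCH_TAS_SPIN(lock)	(*(lock) ? 1 : SKETCH_TAS(lock))

/*
 * Acquire `lock`, backing off between failed attempts.  Returns the number
 * of delays performed, or -1 once `max_delays` is exceeded (a stand-in for
 * reporting a stuck spinlock).
 */
int
sketch_s_lock(volatile sketch_slock_t *lock, int max_delays)
{
	int			delays = 0;

	while (SKETCH_TAS_SPIN(lock))
	{
		if (++delays > max_delays)
			return -1;
		/* Placeholder back-off; a real implementation would pause or sleep. */
		for (volatile int i = 0; i < 100; i++)
		{
		}
	}
	return delays;
}

void
sketch_s_unlock(volatile sketch_slock_t *lock)
{
	__sync_lock_release(lock);
}

/* Convenience wrapper: try to acquire a lock that starts free or held. */
int
sketch_demo(int initially_held)
{
	volatile sketch_slock_t lock = initially_held ? 1 : 0;
	int			result = sketch_s_lock(&lock, 1000);

	if (result >= 0)
		sketch_s_unlock(&lock);
	return result;
}
```

A free lock is taken on the first try with zero delays. A lock that stays held makes the loop spin on the plain load alone until the hypothetical bound trips.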

tps = 549462.785554 (without initial connection time)

[0] https://postgr.es/c/5c7603c
[1] https://postgr.es/c/b03d196

--
nathan

Attachment: v1-0001-Use-a-non-locking-initial-test-in-TAS_SPIN-on-AAr.patch (text/plain, 870 bytes)
