Get rid of WALBufMappingLock

From: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
To: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Andres Freund <andres(at)anarazel(dot)de>, "Zhou, Zhiguo" <zhiguo(dot)zhou(at)intel(dot)com>
Subject: Get rid of WALBufMappingLock
Date: 2025-01-19 00:11:32
Message-ID: 39b39e7a-41b4-4f34-b3f5-db735e74a723@postgrespro.ru

Good day, hackers.

During the discussion of Increasing NUM_XLOGINSERT_LOCKS [1], Andres Freund
used a benchmark which creates WAL records very intensively. While I think
it is not completely fair (1MB WAL records are really rare), it pushed
me to analyze the write-side waiting of the XLog machinery.

First I tried to optimize WaitXLogInsertionsToFinish, but without great
success (yet).

While profiling, I found that a lot of time is spent clearing memory
under the global WALBufMappingLock:

MemSet((char *) NewPage, 0, XLOG_BLCKSZ);

It is an obvious scalability bottleneck.
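For context, the initialization loop in AdvanceXLInsertBuffer() on master
looks roughly like this (heavily condensed; buffer recycling, the waits on
WALWriteLock, and the page-header setup are elided):

/* Condensed from AdvanceXLInsertBuffer() in xlog.c on master. */
LWLockAcquire(WALBufMappingLock, LW_EXCLUSIVE);
while (XLogCtl->InitializedUpTo < upto)
{
    nextidx = XLogRecPtrToBufIdx(XLogCtl->InitializedUpTo);
    NewPage = (XLogPageHeader) (XLogCtl->pages + nextidx * (Size) XLOG_BLCKSZ);

    /* Every other backend needing a fresh page queues behind the
     * exclusive lock while this memset runs. */
    MemSet((char *) NewPage, 0, XLOG_BLCKSZ);

    /* ... initialize the page header ... */

    pg_atomic_write_u64(&XLogCtl->xlblocks[nextidx], NewPageEndPtr);
    XLogCtl->InitializedUpTo = NewPageEndPtr;
}
LWLockRelease(WALBufMappingLock);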

So "challenge was accepted".

Certainly, backends should initialize pages without an exclusive lock. But
then how do we ensure the pages were initialized? In other words, how do
we ensure XLogCtl->InitializedUpTo is correct?

I've tried to play around with WALBufMappingLock by holding it only for a
short time and spinning on XLogCtl->xlblocks[nextidx]. But in the end I
found WALBufMappingLock is not needed at all.

Instead of holding a lock, it is better to let backends cooperate (a rough
sketch follows the list):
- I bound a ConditionVariable to each xlblocks entry,
- every backend now checks that every required block pointed to by
InitializedUpTo was successfully initialized, or sleeps on its condvar,
- when a backend is sure a block is initialized, it tries to advance
InitializedUpTo and wakes waiters on the condition variable.
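To illustrate, here is a rough sketch of the scheme (not the patch text:
it assumes InitializedUpTo becomes a pg_atomic_uint64, and the
InitializedCV array and the wait event name are invented for the example):

/*
 * Rough sketch only.  Assumes a ConditionVariable array InitializedCV[]
 * (invented name) parallels xlblocks[].  Page-header setup, buffer
 * recycling and wraparound handling are omitted.
 */

/* Publisher: we just zeroed our page and filled its header, lock-free. */
pg_atomic_write_u64(&XLogCtl->xlblocks[nextidx], NewPageEndPtr);
ConditionVariableBroadcast(&XLogCtl->InitializedCV[nextidx]);

/* Cooperatively advance InitializedUpTo over all consecutive pages that
 * are already published.  A failed CAS just means another backend made
 * the same progress for us, so we re-read and retry. */
for (;;)
{
    XLogRecPtr  ptr = pg_atomic_read_u64(&XLogCtl->InitializedUpTo);
    int         idx = XLogRecPtrToBufIdx(ptr);

    if (pg_atomic_read_u64(&XLogCtl->xlblocks[idx]) != ptr + XLOG_BLCKSZ)
        break;                  /* next page not (yet) initialized */

    pg_atomic_compare_exchange_u64(&XLogCtl->InitializedUpTo,
                                   &ptr, ptr + XLOG_BLCKSZ);
}

/* Waiter: sleep until the page ending at endptr appears in slot idx. */
ConditionVariablePrepareToSleep(&XLogCtl->InitializedCV[idx]);
while (pg_atomic_read_u64(&XLogCtl->xlblocks[idx]) != endptr)
    ConditionVariableSleep(&XLogCtl->InitializedCV[idx],
                           WAIT_EVENT_WAL_BUFFER_INIT); /* invented name */
ConditionVariableCancelSleep();

The invariant is the same one the lock used to provide: InitializedUpTo
never points past a page whose xlblocks entry has not been published, so
its readers keep the guarantee they had before.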

Andres's benchmark looks like:

c=100 && \
install/bin/psql -c checkpoint -c 'select pg_switch_wal()' postgres && \
install/bin/pgbench -n -M prepared -c$c -j$c \
  -f <(echo "SELECT pg_logical_emit_message(true, 'test', repeat('0', 1024*1024));") \
  -P1 -T45 postgres

So, it generates 1MB records as fast as possible for 45 seconds.

The test machine is a Ryzen 5825U (8c/16t) limited to 2GHz.
Config:

max_connections = 1000
shared_buffers = 1024MB
fsync = off
wal_sync_method = fdatasync
full_page_writes = off
wal_buffers = 1024MB
checkpoint_timeout = 1d

Results are shown as "average over 45 sec" / "1-second max outlier".

Results for master @ d3d098316913:
25 clients: 2908 /3230
50 clients: 2759 /3130
100 clients: 2641 /2933
200 clients: 2419 /2707
400 clients: 1928 /2377
800 clients: 1689 /2266

With v0-0001-Get-rid-of-WALBufMappingLock.patch:
25 clients: 3103 /3583
50 clients: 3183 /3706
100 clients: 3106 /3559
200 clients: 2902 /3427
400 clients: 2303 /2717
800 clients: 1925 /2329

Combined with v0-0002-several-attempts-to-lock-WALInsertLocks.patch:

No WALBufMappingLock + several attempts on WALInsertLocks (a sketch of the
idea follows the numbers):
25 clients: 3518 /3750
50 clients: 3355 /3548
100 clients: 3226 /3460
200 clients: 3092 /3299
400 clients: 2575 /2801
800 clients: 1946 /2341
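
The idea of v0-0002 is roughly the following (sketch, not the patch text;
the probe count and exact fallback are illustrative): probe a few
WALInsertLocks with LWLockConditionalAcquire before falling back to a
blocking acquire.

/* Sketch based on WALInsertLockAcquire() in xlog.c; not the patch
 * verbatim. */
static void
WALInsertLockAcquire(void)
{
    static int  lockToTry = -1;

    if (lockToTry == -1)
        lockToTry = MyProcNumber % NUM_XLOGINSERT_LOCKS;

    /* Probe a few locks without sleeping before committing to one. */
    for (int attempt = 0; attempt < 3; attempt++)
    {
        if (LWLockConditionalAcquire(&WALInsertLocks[lockToTry].l.lock,
                                     LW_EXCLUSIVE))
        {
            MyLockNo = lockToTry;
            return;
        }
        lockToTry = (lockToTry + 1) % NUM_XLOGINSERT_LOCKS;
    }

    /* All probes failed; block on the current candidate. */
    MyLockNo = lockToTry;
    LWLockAcquire(&WALInsertLocks[MyLockNo].l.lock, LW_EXCLUSIVE);
}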

These results are with NUM_XLOGINSERT_LOCKS untouched at 8.

[1]
http://postgr.es/m/flat/3b11fdc2-9793-403d-b3d4-67ff9a00d447%40postgrespro.ru

PS.
Increasing NUM_XLOGINSERT_LOCKS to 64 gives:
25 clients: 3457 /3624
50 clients: 3215 /3500
100 clients: 2750 /3000
200 clients: 2535 /2729
400 clients: 2163 /2400
800 clients: 1700 /2060

While doing the same on master gives:
25 clients: 2645 /2953
50 clients: 2562 /2968
100 clients: 2364 /2756
200 clients: 2266 /2564
400 clients: 1868 /2228
800 clients: 1527 /2133

So, the patched version with increased NUM_XLOGINSERT_LOCKS looks no worse
than the unpatched one without increasing the number of locks.

-------
regards
Yura Sokolov aka funny-falcon

Attachment Content-Type Size
v0-0001-Get-rid-of-WALBufMappingLock.patch text/x-patch 17.0 KB
v0-0002-several-attempts-to-lock-WALInsertLocks.patch text/x-patch 3.0 KB
