From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Victor Yegorov <vyegorov(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Get rid of WALBufMappingLock
Date: 2025-03-14 14:30:01
Message-ID: 2ba2ab78-7909-4ad9-9fac-6f95475aad49@vondra.me
Lists: pgsql-hackers
Hi,
I've briefly looked at this patch this week, and done a bit of testing.
I don't have any comments about correctness - it does seem correct to me,
and I haven't noticed any crashes or other issues - but I'm not familiar
enough with the WALBufMappingLock code to have an insightful opinion.
I have, however, decided to do a bit of benchmarking to better understand
the possible benefits of the change. I happen to have access to an Azure
machine with 2x AMD EPYC 9V33X (176 cores in total) and an NVMe SSD that
can do ~1.5GB/s.
The benchmark script (attached) uses the workload mentioned by Andres
some time ago [1]:
SELECT pg_logical_emit_message(true, 'test', repeat('0', $SIZE));
run with 1-196 clients and message sizes of 8K, 64K and 1024K (a sketch
of how such a run can be driven with pgbench appears after the results
below). The aggregated results (throughput in tps) look like this:
        |        8K         |        64K        |      1024K
clients | master patched | master patched | master patched
---------------------------------------------------------------------
1 | 11864 12035 | 7419 7345 | 968 940
4 | 26311 26919 | 12414 12308 | 1304 1293
8 | 38742 39651 | 14316 14539 | 1348 1348
16 | 57299 59917 | 15405 15871 | 1304 1279
32 | 74857 82598 | 17589 17126 | 1233 1233
48 | 87596 95495 | 18616 18160 | 1199 1227
64 | 89982 97715 | 19033 18910 | 1196 1221
96 | 92853 103448 | 19694 19706 | 1190 1210
128 | 95392 103324 | 20085 19873 | 1188 1213
160 | 94933 102236 | 20227 20323 | 1180 1214
196 | 95933 103341 | 20448 20513 | 1188 1199
To put this into perspective, here is the throughput relative to master:
clients |    8K     64K    1024K
----------------------------------
1 | 101% 99% 97%
4 | 102% 99% 99%
8 | 102% 102% 100%
16 | 105% 103% 98%
32 | 110% 97% 100%
48 | 109% 98% 102%
64 | 109% 99% 102%
96 | 111% 100% 102%
128 | 108% 99% 102%
160 | 108% 100% 103%
196 | 108% 100% 101%
That does not seem like a huge improvement :-( Yes, there's a 1-10%
speedup for the small (8K) size, but for the larger chunks it's a wash.
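For reference, a minimal sketch of how such a run can be driven - the
attached wal-lock-test.sh is the actual script, and the file names,
database name and durations below are just illustrative:

  -- emit.sql: custom pgbench script, :size is the payload length in bytes
  SELECT pg_logical_emit_message(true, 'test', repeat('0', :size));

  # sweep the client counts and message sizes used in the tables above
  for size in 8192 65536 1048576; do
    for clients in 1 4 8 16 32 48 64 96 128 160 196; do
      pgbench -n -f emit.sql -D size=$size \
              -c $clients -j $clients -T 30 -P 1 postgres
    done
  done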
Looking at the pgbench progress, I noticed stuff like this:
...
progress: 13.0 s, 103575.2 tps, lat 0.309 ms stddev 0.071, 0 failed
progress: 14.0 s, 102685.2 tps, lat 0.312 ms stddev 0.072, 0 failed
progress: 15.0 s, 102853.9 tps, lat 0.311 ms stddev 0.072, 0 failed
progress: 16.0 s, 103146.0 tps, lat 0.310 ms stddev 0.075, 0 failed
progress: 17.0 s, 57168.1 tps, lat 0.560 ms stddev 0.153, 0 failed
progress: 18.0 s, 50495.9 tps, lat 0.634 ms stddev 0.060, 0 failed
progress: 19.0 s, 50927.0 tps, lat 0.628 ms stddev 0.066, 0 failed
progress: 20.0 s, 50986.7 tps, lat 0.628 ms stddev 0.062, 0 failed
progress: 21.0 s, 50652.3 tps, lat 0.632 ms stddev 0.061, 0 failed
progress: 22.0 s, 63792.9 tps, lat 0.502 ms stddev 0.168, 0 failed
progress: 23.0 s, 103109.9 tps, lat 0.310 ms stddev 0.072, 0 failed
progress: 24.0 s, 103503.8 tps, lat 0.309 ms stddev 0.071, 0 failed
progress: 25.0 s, 101984.2 tps, lat 0.314 ms stddev 0.073, 0 failed
progress: 26.0 s, 102923.1 tps, lat 0.311 ms stddev 0.072, 0 failed
progress: 27.0 s, 103973.1 tps, lat 0.308 ms stddev 0.072, 0 failed
...
i.e. it fluctuates a lot. I suspect this is due to the SSD doing funny
things (it's a virtual SSD, and I'm not sure what model is behind the
curtain). So I decided to try running the benchmark on tmpfs, to get the
storage out of the way and get the "best case" results (a rough sketch
of that setup follows).
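The tmpfs setup could look roughly like this (mount point, tmpfs size and
paths are illustrative, not necessarily what was used here):

  # put the whole data directory (including pg_wal) on tmpfs, so neither
  # data nor WAL writes ever touch the virtual SSD
  sudo mkdir -p /mnt/pgtmpfs
  sudo mount -t tmpfs -o size=64G tmpfs /mnt/pgtmpfs
  initdb -D /mnt/pgtmpfs/data
  pg_ctl -D /mnt/pgtmpfs/data -l /mnt/pgtmpfs/pg.log start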
This makes the pgbench progress perfectly "smooth" (no jumps like in the
output above), and the comparison looks like this:
         |         8K          |        64K         |      1024K
clients | master patched | master patched | master patched
---------|---------------------|--------------------|----------------
1 | 32449 32032 | 19289 20344 | 3108 3081
4 | 68779 69256 | 24585 29912 | 2915 3449
8 | 79787 100655 | 28217 39217 | 3182 4086
16 | 113024 148968 | 42969 62083 | 5134 5712
32 | 125884 170678 | 44256 71183 | 4910 5447
48 | 125571 166695 | 44693 76411 | 4717 5215
64 | 122096 160470 | 42749 83754 | 4631 5103
96 | 120170 154145 | 42696 86529 | 4556 5020
128 | 119204 152977 | 40880 88163 | 4529 5047
160 | 116081 152708 | 42263 88066 | 4512 5000
196 | 115364 152455 | 40765 88602 | 4505 4952
and the comparison to master:
clients       8K      64K     1024K
-----------------------------------------
1 99% 105% 99%
4 101% 122% 118%
8 126% 139% 128%
16 132% 144% 111%
32 136% 161% 111%
48 133% 171% 111%
64 131% 196% 110%
96 128% 203% 110%
128 128% 216% 111%
160 132% 208% 111%
196 132% 217% 110%
Yes, with tmpfs the impact looks much more significant. For 8K the
speedup is ~1.3x, for 64K it's up to ~2x, and for 1024K it's ~1.1x.
That being said, I wonder how big the impact is for practical workloads.
ISTM this workload is pretty narrow / extreme; it would be much easier if
we had an example of a more realistic workload benefiting from this. Of
course, it may be that there are multiple related bottlenecks and we'd
need to fix all of them, in which case it would be silly to block this
improvement on the grounds that it alone does not help.
Another thought is that this is testing the "good case". Can anyone
think of a workload that would be made worse by the patch?
regards
--
Tomas Vondra
Attachments:
  wal-lock-test.sh    (application/x-shellscript, 976 bytes)
  patched-tmpfs.csv   (text/csv, 2.1 KB)
  master-tmpfs.csv    (text/csv, 2.1 KB)
  master-ssd.csv      (text/csv, 3.4 KB)
  patched-ssd.csv     (text/csv, 3.4 KB)