Maybe we should reduce SKIP_PAGES_THRESHOLD a bit?

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Maybe we should reduce SKIP_PAGES_THRESHOLD a bit?
Date: 2024-12-15 21:46:58
Message-ID: 89902847-17c0-422e-891d-deed170c856c@vondra.me
Lists: pgsql-hackers

Hi,

While doing some benchmarking for some vacuum patches (parallel vacuum
[1] and using the streaming read API [2]), I realized the
SKIP_PAGES_THRESHOLD value may be too aggressive. The current value of
32 was introduced in ~8.4 (bf136cf6e376 and b503da135ab0), so ~15 years
ago. Maybe it was appropriate back then, but storage has changed a lot
since, and it might be quite inefficient on modern flash (and
especially on NVMe).

The tests I did in the other threads are fairly simple:

1) create a table with ~100 rows per page (with fillfactor 50%)

2) update a fraction of rows

3) vacuum the table, measure the duration etc.
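A minimal version of these steps might look like the following (table
and column names are mine, not from the attached script, and the row
count is just illustrative):

```sql
-- 1) create a table with roughly 100 rows per page (fillfactor 50%)
CREATE TABLE skip_test (id bigint, val text) WITH (fillfactor = 50);
INSERT INTO skip_test
SELECT i, md5(i::text) FROM generate_series(1, 100000000) s(i);
VACUUM (ANALYZE) skip_test;   -- make all pages all-visible

-- 2) update a fraction of rows (here frac = 0.0001)
UPDATE skip_test SET val = md5(val) WHERE random() < 0.0001;

-- 3) vacuum the table, measure the duration
\timing on
VACUUM skip_test;
```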

And in [1] I noticed that parallel vacuum resulted in a much higher
speedup than expected. But that turned out to be simply because the
parallel vacuum patch ignores SKIP_PAGES_THRESHOLD, and never switches
to a sequential scan.

What's happening is that if the fraction of updated rows is sufficiently
high, the modified pages are within the SKIP_PAGES_THRESHOLD distance of
each other, and vacuum reads the all-visible pages in between instead of
skipping them (effectively a sequential scan). But with flash storage
that turns out to be harmful - I built master with different
SKIP_PAGES_THRESHOLD values (1, 2, 4, 8 and 16), and collected results
for comparison.
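For reference, here's a small simulation of the run-length rule as I
understand it (the function is mine, not PostgreSQL code): a run of
all-visible pages is skipped only when it's at least the threshold long,
otherwise vacuum reads those pages even though it doesn't need to.

```python
def pages_read(modified_pages, nblocks, threshold):
    """Count pages vacuum reads: runs of skippable (all-visible)
    pages shorter than `threshold` are read anyway."""
    modified = set(modified_pages)
    read = blk = 0
    while blk < nblocks:
        if blk in modified:
            read += 1
            blk += 1
            continue
        # measure the run of consecutive skippable pages
        run = 0
        while blk + run < nblocks and (blk + run) not in modified:
            run += 1
        if run >= threshold:
            blk += run        # skip the whole run
        else:
            read += run       # read them despite being all-visible
            blk += run
    return read

print(pages_read([0, 10], 20, 32))  # -> 20 (threshold never met, full scan)
print(pages_read([0, 10], 20, 1))   # -> 2  (only the modified pages)
```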

Results for the large data set (34GB table) on the i5 machine (with RAID
on 6x SATA SSD) look like this:

frac       master  master-1  master-2  master-4  master-8  master-16
--------------------------------------------------------------------
0.000001     0.08      0.08      0.08      0.08      0.08       0.07
0.00001      0.20      0.17      0.17      0.17      0.18       0.19
0.000025     0.48      0.31      0.32      0.31      0.34       0.39
0.00005      1.18      0.54      0.55      0.54      0.67       0.78
0.000075     2.03      0.74      0.76      0.79      1.01       1.40
0.0001       2.80      0.96      0.98      1.02      1.44       1.97
0.0002       5.13      1.86      2.01      2.25      3.33       4.44
0.0005       5.18      4.57      5.57      6.85      8.14       7.44
0.001        4.23      9.27     11.59     11.37      8.58       5.44
0.01         5.78      6.58      5.98      5.74      5.84       5.74

These are timings in seconds. Timings relative to the master branch
(with SKIP_PAGES_THRESHOLD=32) are

frac       master-1  master-2  master-4  master-8  master-16
------------------------------------------------------------
0.000001       100%       99%       96%       99%        89%
0.00001         85%       84%       87%       89%        94%
0.000025        63%       66%       65%       71%        80%
0.00005         45%       46%       46%       57%        65%
0.000075        37%       38%       39%       50%        69%
0.0001          34%       35%       36%       51%        70%
0.0002          36%       39%       44%       65%        86%
0.0005          88%      108%      132%      157%       144%
0.001          219%      274%      269%      203%       129%
0.01           114%      103%       99%      101%        99%

So most of this shows significant speedups if we effectively disable
the threshold (by setting it to 1, so vacuum always skips all-visible
pages). There are some regressions when scanning a substantial fraction
of the table (0.001 means ~10% of pages are modified).

The results from the bigger machine with NVMe seem very similar, except
that there are no regressions at all. It's always faster to not switch
to sequential scans.

Attached is a PDF with a summary of results with a better visualization,
CSV files with raw results, and the script used to collect them. But it
looks very similar to this.

Obviously, it may not be a good idea to draw conclusions from a single
(and fairly simplistic) test. But I think it's reasonable to conclude
the current SKIP_PAGES_THRESHOLD value (i.e. 32) may be a bit too high
for modern storage. Based on my results, a value in the 4-8 range might
be more appropriate.

Or perhaps it should be user-tunable? I'm very reluctant to add more and
more knobs like this, but if we have effective_io_concurrency, isn't
this a very similar thing? Or maybe it'd be possible to adjust this
automatically based on existing parameters (esp. random_page_cost and
effective_io_concurrency)?
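To illustrate the kind of automatic adjustment I mean, a purely
hypothetical heuristic (the function and the squaring are mine, not a
concrete proposal) could derive the threshold from the random/sequential
cost ratio, collapsing to 1 when random I/O is as cheap as sequential:

```python
def auto_skip_threshold(random_page_cost, seq_page_cost=1.0, max_threshold=32):
    """Hypothetical heuristic: scale the skip threshold by how much
    more expensive random reads are than sequential ones. With
    random_page_cost close to seq_page_cost (flash/NVMe) this collapses
    to 1; with the HDD-era default of 4.0 it stays fairly high."""
    ratio = random_page_cost / seq_page_cost
    return max(1, min(max_threshold, round(ratio ** 2)))

print(auto_skip_threshold(4.0))  # -> 16 (HDD-like costs)
print(auto_skip_threshold(1.1))  # -> 1  (flash-like costs)
```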

I do believe most of this applies even to the vacuum patches using the
streaming read API [2], which leave SKIP_PAGES_THRESHOLD in place. In
fact, isn't similar logic built into the read stream itself? And isn't
vacuumlazy.c the wrong level to do this anyway? It's about the I/O
pattern - why should vacuum worry about that?

regards

[1]
https://www.postgresql.org/message-id/flat/CAD21AoAEfCNv-GgaDheDJ+s-p_Lv1H24AiJeNoPGCmZNSwL1YA(at)mail(dot)gmail(dot)com

[2]
https://www.postgresql.org/message-id/flat/CAAKRu_Yf3gvXGcCnqqfoq0Q8LX8UM-e-qbm_B1LeZh60f8WhWA(at)mail(dot)gmail(dot)com

--
Tomas Vondra

Attachment Content-Type Size
skipping-xeon.csv text/csv 51.4 KB
skipping-test.sh application/x-shellscript 2.0 KB
skipping-i5.csv text/csv 51.1 KB
skipping-test.pdf application/pdf 55.9 KB
