Re: Maybe we should reduce SKIP_PAGES_THRESHOLD a bit?

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Maybe we should reduce SKIP_PAGES_THRESHOLD a bit?
Date: 2024-12-16 15:37:35
Message-ID: CAAKRu_Z8FfGpX8JDCEn3d3_6vQ7kc-xUWh--GEx_sAXXgCcNrw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Dec 15, 2024 at 4:47 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>
> What's happening is that if the fraction of updated rows is sufficiently
> high, the modified pages are within the SKIP_PAGES_THRESHOLD distance,
> and vacuum switches to seqscan. But with flash storage that happens to
> be harmful - I built master with different SKIP_PAGES_THRESHOLD (1, 2,
> 4, 8 and 16), and collected results for comparison.
>
<--snip-->
> These are timings in second. Timings relative to master branch (with
> SKIP_PAGES_THRESHOLD=32) are
>
> frac master-1 master-2 master-4 master-8 master-16
> -----------------------------------------------------------------
> 0.000001 100% 99% 96% 99% 89%
> 0.00001 85% 84% 87% 89% 94%
> 0.000025 63% 66% 65% 71% 80%
> 0.00005 45% 46% 46% 57% 65%
> 0.000075 37% 38% 39% 50% 69%
> 0.0001 34% 35% 36% 51% 70%
> 0.0002 36% 39% 44% 65% 86%
> 0.0005 88% 108% 132% 157% 144%
> 0.001 219% 274% 269% 203% 129%
> 0.01 114% 103% 99% 101% 99%
>
> So most of this shows significant speedups if we simply disable the
> skipping logic (by setting the threshold to 1). There are some
> regressions when scanning substantial fraction of the table (0.001 means
> ~10% pages are modified).

Thanks for running these benchmarks!

I know you used explicit vacuum, but I was just thinking that with the
default autovacuum_vacuum_scale_factor you'd have to modify 20% of the
tuples to trigger an autovacuum, so depending on the fill factor and
tuple size, it may be normal for 10% of the pages to need scanning by
vacuum. However, we do tell users that 20% is too high anyway.

On a related note, the other day I noticed another negative effect
caused in part by SKIP_PAGES_THRESHOLD. SKIP_PAGES_THRESHOLD interacts
with the opportunistic freeze heuristic [1] causing lots of all-frozen
pages to be scanned when checksums are enabled. You can easily end up
with a table that has very fragmented ranges of frozen, all-visible,
and modified pages. In this case, the opportunistic freeze heuristic
bears most of the blame. However, we are not close to coming up with a
replacement heuristic, so removing SKIP_PAGES_THRESHOLD would help.
This wouldn't have affected your results, but it is worth considering
more generally.

> The results from the bigger machine with NVMe seem very similar, except
> that there are no regressions at all. It's always faster to not switch
> to sequential scans.

And we are suggesting to users with bigger, faster machines and SSDs
to tune autovacuum to run more often.

> Obviously, it may not be a good idea to make conclusions based on a
> single (and fairly simplistic) test. But I think it's reasonable to
> conclude the current SKIP_PAGES_THRESHOLD value (i.e. 32) may be a bit
> too high for modern storage. Based on "my" results it seems a value in
> the 4-8 range might be more appropriate.

A range of 4-8 would address some of the negative effects I saw with
the interaction of SKIP_PAGES_THRESHOLD and the opportunistic freeze
heuristic causing all-frozen pages to be read.

> Or perhaps it should be user-tunable? I'm very reluctant to add more and
> more knobs like this, but if we have effective_io_concurrency, isn't
> this a very similar thing? Or maybe it'd be possible to adjust this
> automatically based on existing parameters (esp. rnadom_page_cost and
> effective_io_concurrency)?

I don't think we want to expose it to the user -- given that the plan
in the long run is to get rid of it.

> I do believe most of this applies even to vacuum with stream API patches
> [2], which leave SKIP_PAGES_THRESHOLD in place. In fact, isn't a similar
> logic built into the read stream itself? Isn't vacuumlazy.c the wrong
> level to do this anyway, as it's about I/O pattern - why should vacuum
> worry about that?

I agree about vacuumlazy.c being an unfortunate place to have this
logic. It makes the code there MUCH more complicated.

The read stream API isn't a drop-in replacement for
SKIP_PAGES_THRESHOLD because it only does read combining and
prefetching -- not ensuring sequential ranges are requested. It still
relies on kernel readahead logic. However, with AIO, Andres has
pointed out SKIP_PAGES_THRESHOLD no longer makes sense. With AIO,
multiple reads will be initiated at the same time, so the kernel won't
be getting sequential range requests anyway (even when using buffered
IO). And with direct IO, there is no kernel readahead logic to
confuse.

However, I think your benchmarks show that SKIP_PAGES_THRESHOLD may
not be operating in the way it was intended anymore anyway.

- Melanie

[1] https://www.postgresql.org/message-id/CAAKRu_ZvkxDRtMAhYRTK9N60jfMMVkiQgnEP5m_asYnuvgmQOg%40mail.gmail.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bertrand Drouvot 2024-12-16 15:42:04 Re: per backend I/O statistics
Previous Message Tom Lane 2024-12-16 15:33:59 Re: Improving default column names/aliases of subscript text expressions