Re: Parallel heap vacuum

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel heap vacuum
Date: 2024-12-14 21:24:44
Message-ID: 3d1001d9-4970-477b-98a3-81f6342f75ce@vondra.me
Lists: pgsql-hackers

On 12/13/24 00:04, Tomas Vondra wrote:
> ...
>
> The main difference is here:
>
>
> master / no parallel workers:
>
> pages: 0 removed, 221239 remain, 221239 scanned (100.00% of total)
>
> 1 parallel worker:
>
> pages: 0 removed, 221239 remain, 10001 scanned (4.52% of total)
>
>
> Clearly, with parallel vacuum we scan only a tiny fraction of the pages,
> essentially just those with deleted tuples, which is ~1/20 of pages.
> That's close to the 15x speedup.
>
> This effect is clearest without indexes, but it does affect even runs
> with indexes - having to scan the indexes makes it much less pronounced,
> though. However, these indexes are pretty massive (each about the same
> size as the table), so combined they're multiple times larger than the
> table. Chances are it'd be clearer on realistic data sets.
>
> So the question is - is this correct? And if yes, why doesn't the
> regular (serial) vacuum do that?
>
> There are some more strange things, though. For example, how come the avg
> read rate is 0.000 MB/s?
>
> avg read rate: 0.000 MB/s, avg write rate: 525.533 MB/s
>
> It scanned 10k pages, i.e. ~80MB of data in 0.15 seconds. Surely that's
> not 0.000 MB/s? I guess it's calculated from buffer misses, and all the
> pages are in shared buffers (thanks to the DELETE earlier in that session).
>

OK, after looking into this a bit more I think the reason is rather
simple - SKIP_PAGES_THRESHOLD.

With serial runs, we end up scanning all pages, because even with an
update every 5000 tuples, that's still only ~25 pages apart, well within
the 32-page window. So we end up skipping no pages, and scan and vacuum
everything.
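
To make the arithmetic concrete, here's a trivial standalone sketch - not
the actual vacuumlazy.c code, just the same decision rule, with the
numbers from this test case plugged in:

/*
 * Standalone sketch of the skip decision (not the actual vacuumlazy.c
 * code): a run of all-visible pages is only skipped if it's at least
 * SKIP_PAGES_THRESHOLD blocks long.
 */
#include <stdio.h>
#include <stdbool.h>

#define SKIP_PAGES_THRESHOLD	32	/* same value as in vacuumlazy.c */

int
main(void)
{
	/* ~5000 tuples between updates / ~200 tuples per page => ~25 pages */
	int		gap = 25;
	int		allvisible_run = gap - 1;
	bool	skip = (allvisible_run >= SKIP_PAGES_THRESHOLD);

	printf("all-visible run of %d pages: %s\n", allvisible_run,
		   skip ? "skipped" : "scanned anyway (below the threshold)");

	return 0;
}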

But parallel runs have this skipping logic disabled - or rather, the
logic that switches to a sequential scan when the gap between pages
needing vacuum is less than 32 pages.

IMHO this raises two questions:

1) Shouldn't parallel runs use SKIP_PAGES_THRESHOLD too, i.e. switch to
sequential scans if the pages are close enough? Maybe there is a reason
for this difference? Workers can reduce the difference between random
and sequential I/O, similarly to prefetching. But that just means the
workers should use a lower threshold, e.g.

SKIP_PAGES_THRESHOLD / nworkers

or something like that? I don't see this discussed in this thread.
(There's a rough sketch of what I mean after point 2.)

2) It seems the current SKIP_PAGES_THRESHOLD is awfully high for good
storage. If I can get an order of magnitude improvement (or more than
that) by disabling the threshold and just doing random I/O, maybe it's
time to adjust it a bit.
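
To illustrate what I mean in (1), here's a rough standalone sketch - the
function name is made up, this is not meant as actual patch code:

#include <stdio.h>

#define SKIP_PAGES_THRESHOLD	32

/* hypothetical per-worker threshold: scale down by the number of workers */
static int
parallel_skip_threshold(int nworkers)
{
	int		threshold;

	if (nworkers < 1)
		nworkers = 1;

	threshold = SKIP_PAGES_THRESHOLD / nworkers;

	return (threshold > 0) ? threshold : 1;
}

int
main(void)
{
	for (int nworkers = 0; nworkers <= 4; nworkers++)
		printf("nworkers = %d => threshold = %d\n",
			   nworkers, parallel_skip_threshold(nworkers));

	return 0;
}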

regards

--
Tomas Vondra
