Re: Parallel heap vacuum

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel heap vacuum
Date: 2024-12-25 16:52:51
Message-ID: c1337830-01d9-48a0-81f7-4b0d79d9333e@vondra.me
Lists: pgsql-hackers

On 12/19/24 23:05, Masahiko Sawada wrote:
> On Sat, Dec 14, 2024 at 1:24 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>>
>> On 12/13/24 00:04, Tomas Vondra wrote:
>>> ...
>>>
>>> The main difference is here:
>>>
>>>
>>> master / no parallel workers:
>>>
>>> pages: 0 removed, 221239 remain, 221239 scanned (100.00% of total)
>>>
>>> 1 parallel worker:
>>>
>>> pages: 0 removed, 221239 remain, 10001 scanned (4.52% of total)
>>>
>>>
>>> Clearly, with parallel vacuum we scan only a tiny fraction of the pages,
>>> essentially just those with deleted tuples, which is ~1/20 of pages.
>>> That's close to the 15x speedup.
>>>
>>> This effect is clearest without indexes, but it does affect even runs
>>> with indexes - having to scan the indexes makes it much less pronounced,
>>> though. However, these indexes are pretty massive (about the same size
>>> as the table) - multiple times larger than the table. Chances are it'd
>>> be clearer on realistic data sets.
>>>
>>> So the question is - is this correct? And if yes, why doesn't the
>>> regular (serial) vacuum do that?
>>>
>>> There's some more strange things, though. For example, how come the avg
>>> read rate is 0.000 MB/s?
>>>
>>> avg read rate: 0.000 MB/s, avg write rate: 525.533 MB/s
>>>
>>> It scanned 10k pages, i.e. ~80MB of data in 0.15 seconds. Surely that's
>>> not 0.000 MB/s? I guess it's calculated from buffer misses, and all the
>>> pages are in shared buffers (thanks to the DELETE earlier in that session).
>>>
>>
>> OK, after looking into this a bit more I think the reason is rather
>> simple - SKIP_PAGES_THRESHOLD.
>>
>> With serial runs, we end up scanning all pages, because even with an
>> update every 5000 tuples, that's still only ~25 pages apart, well within
>> the 32-page window. So we end up skipping no pages, and scan and
>> vacuum everything.
>>
>> But parallel runs have this skipping logic disabled, or rather the logic
>> that switches to sequential scans if the gap is less than 32 pages.
>>
>>
>> IMHO this raises two questions:
>>
>> 1) Shouldn't parallel runs use SKIP_PAGES_THRESHOLD too, i.e. switch to
>> sequential scans if the pages are close enough? Maybe there is a reason
>> for this difference? Workers can reduce the difference between random
>> and sequential I/O, similarly to prefetching. But that just means the
>> workers should use a lower threshold, e.g. as
>>
>> SKIP_PAGES_THRESHOLD / nworkers
>>
>> or something like that? I don't see this discussed in this thread.
>
> Each parallel heap scan worker allocates a chunk of blocks which is
> 8192 blocks at maximum, so we would need to use the
> SKIP_PAGES_THRESHOLD optimization within the chunk. I agree that we
> need to evaluate the differences anyway. Will do the benchmark test
> and share the results.
>

Right. I don't think this really matters for small tables, and for large
tables the chunks should be fairly large (possibly up to 8192 blocks),
in which case we could apply SKIP_PAGES_THRESHOLD just like in the serial
case. There might be differences at boundaries between chunks, but that
seems like a minor / expected detail. I haven't checked whether the code
would need to change, or how much.

>>
>> 2) It seems the current SKIP_PAGES_THRESHOLD is awfully high for good
>> storage. If I can get an order of magnitude improvement (or more than
>> that) by disabling the threshold, and just doing random I/O, maybe
>> it's time to adjust it a bit.
>
> Yeah, you've started a thread for this so let's discuss it there.
>

OK. FWIW as suggested in the other thread, it doesn't seem to be merely
a question of VACUUM performance, as not skipping pages gives vacuum the
opportunity to do cleanup that would otherwise need to happen later.

If only for this reason, I think it would be good to keep the serial and
parallel vacuum consistent.

regards

--
Tomas Vondra
