Re: Parallel heap vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel heap vacuum
Date: 2025-01-12 09:34:56
Message-ID: CAD21AoDO5YTOVamUoC210mFGJM3d_N5pO-+vvwxT2kVqzFAYcw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 3, 2025 at 3:38 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Wed, Dec 25, 2024 at 8:52 AM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> >
> >
> >
> > On 12/19/24 23:05, Masahiko Sawada wrote:
> > > On Sat, Dec 14, 2024 at 1:24 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> > >>
> > >> On 12/13/24 00:04, Tomas Vondra wrote:
> > >>> ...
> > >>>
> > >>> The main difference is here:
> > >>>
> > >>>
> > >>> master / no parallel workers:
> > >>>
> > >>> pages: 0 removed, 221239 remain, 221239 scanned (100.00% of total)
> > >>>
> > >>> 1 parallel worker:
> > >>>
> > >>> pages: 0 removed, 221239 remain, 10001 scanned (4.52% of total)
> > >>>
> > >>>
> > >>> Clearly, with parallel vacuum we scan only a tiny fraction of the pages,
> > >>> essentially just those with deleted tuples, which is ~1/20 of pages.
> > >>> That's close to the 15x speedup.
> > >>>
> > >>> This effect is clearest without indexes, but it does affect even runs
> > >>> with indexes - having to scan the indexes makes it much less pronounced,
> > >>> though. However, these indexes are pretty massive - each about the
> > >>> same size as the table, so several times the table size
> > >>> combined. Chances are it'd
> > >>> be clearer on realistic data sets.
> > >>>
> > >>> So the question is - is this correct? And if yes, why doesn't the
> > >>> regular (serial) vacuum do that?
> > >>>
> > >>> There are some more strange things, though. For example, how come the avg
> > >>> read rate is 0.000 MB/s?
> > >>>
> > >>> avg read rate: 0.000 MB/s, avg write rate: 525.533 MB/s
> > >>>
> > >>> It scanned 10k pages, i.e. ~80MB of data in 0.15 seconds. Surely that's
> > >>> not 0.000 MB/s? I guess it's calculated from buffer misses, and all the
> > >>> pages are in shared buffers (thanks to the DELETE earlier in that session).
> > >>>
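Regarding the 0.000 MB/s question: if I'm reading the reporting code
correctly, the rate counts only pages that were actually read from disk
(buffer misses), so a fully cached table legitimately reports 0.000 MB/s.
Roughly, paraphrasing rather than copying vacuumlazy.c, and with made-up
names for the sketch:

#define BLCKSZ 8192         /* default PostgreSQL block size */

/*
 * Illustrative only: roughly how the "avg read rate" figure is derived.
 * Only pages read into shared buffers (misses) count, so when the whole
 * table is already cached the numerator is zero and the line prints
 * 0.000 MB/s even though many pages were scanned.
 */
static double
vacuum_read_rate_mb_per_s(long pages_read, double elapsed_secs)
{
    if (elapsed_secs <= 0.0)
        return 0.0;
    return (double) BLCKSZ * pages_read / (1024.0 * 1024.0) / elapsed_secs;
}
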
> > >>
> > >> OK, after looking into this a bit more I think the reason is rather
> > >> simple - SKIP_PAGES_THRESHOLD.
> > >>
> > >> With serial runs, we end up scanning all pages, because even with an
> > >> update every 5000 tuples, that's still only ~25 pages apart, well within
> > >> the 32-page window. So we end up skipping no pages, and scan and
> > >> vacuum everything.
> > >>
> > >> But parallel runs have this skipping logic disabled, or rather the logic
> > >> that switches to sequential scans if the gap is less than 32 pages.
> > >>
> > >>
> > >> IMHO this raises two questions:
> > >>
> > >> 1) Shouldn't parallel runs use SKIP_PAGES_THRESHOLD too, i.e. switch to
> > >> sequential scans if the pages are close enough. Maybe there is a reason
> > >> for this difference? Workers can reduce the difference between random
> > >> and sequential I/O, similarly to prefetching. But that just means the
> > >> workers should use a lower threshold, e.g. as
> > >>
> > >> SKIP_PAGES_THRESHOLD / nworkers
> > >>
> > >> or something like that? I don't see this discussed in this thread.
> > >
> > > Each parallel heap scan worker allocates a chunk of blocks (8192
> > > blocks at maximum), so we would need to apply the
> > > SKIP_PAGES_THRESHOLD optimization within the chunk. I agree that we
> > > need to evaluate the differences anyway. Will do the benchmark test
> > > and share the results.
> > >
> >
> > Right. I don't think this really matters for small tables, and for large
> > tables the chunks should be fairly large (possibly up to 8192 blocks),
> > in which case we could apply SKIP_PAGES_THRESHOLD just like in the serial
> > case. There might be differences at boundaries between chunks, but that
> > seems like a minor / expected detail. I haven't checked if the code
> > would need to change / how much.
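
To make the numbers above concrete, the rule being discussed is roughly
the following. This is an illustrative sketch, not the actual
vacuumlazy.c code, and the parallel variant is purely hypothetical:

#include <stdbool.h>

/* Runs of all-visible pages shorter than this are scanned anyway. */
#define SKIP_PAGES_THRESHOLD 32

/*
 * Serial rule (paraphrased): with one modified tuple every 5000 rows the
 * all-visible runs between dirty pages are only ~25 pages long, below the
 * threshold, so nothing is skipped and all pages get scanned.
 */
static bool
skip_allvisible_run(unsigned run_len)
{
    return run_len >= SKIP_PAGES_THRESHOLD;
}

/*
 * Hypothetical variant for parallel workers, as floated in point 1 above:
 * scale the threshold down because several workers issuing "random" reads
 * behave more like prefetching.  Not something the patch does; it only
 * shows the shape of the idea.
 */
static bool
skip_allvisible_run_parallel(unsigned run_len, int nworkers)
{
    unsigned    threshold = SKIP_PAGES_THRESHOLD / (nworkers > 0 ? nworkers : 1);

    return run_len >= threshold;
}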
> >
> > >>
> > >> 2) It seems the current SKIP_PAGES_THRESHOLD is awfully high for good
> > >> storage. If I can get an order of magnitude improvement (or more than
> > >> that) by disabling the threshold, and just doing random I/O, maybe
> > >> it's time to adjust it a bit.
> > >
> > > Yeah, you've started a thread for this so let's discuss it there.
> > >
> >
> > OK. FWIW as suggested in the other thread, it doesn't seem to be merely
> > a question of VACUUM performance, as not skipping pages gives vacuum the
> > opportunity to do cleanup that would otherwise need to happen later.
> >
> > If only for this reason, I think it would be good to keep the serial and
> > parallel vacuum consistent.
> >
>
> I've not evaluated the SKIP_PAGES_THRESHOLD optimization yet, but I'd like
> to share the latest patch set as cfbot reports some failures. Comments
> from Kuroda-san are also incorporated in this version. Also, I'd like
> to share the performance test results I did with the latest patch.
>

I've implemented the SKIP_PAGES_THRESHOLD optimization in the parallel heap
scan and attached the updated patch set. I've also attached the performance
test results comparing the v6 and v7 patch sets. I don't see big differences
across the test cases, but the v7 patch set performs slightly better.
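
The shape of the change is roughly the following minimal sketch (made-up
names, not the code in the v7 patches): within the chunk of blocks
assigned to a worker, consecutive all-visible blocks are skipped only
when the run reaches SKIP_PAGES_THRESHOLD, and a run never extends past
the chunk boundary, so only pages around chunk edges can behave
differently from the serial case.

#include <stdbool.h>

#define SKIP_PAGES_THRESHOLD 32

typedef unsigned int BlockNumber;       /* stand-in for the real typedef */

/* Stand-in for the visibility-map lookup used to decide skippability. */
extern bool block_is_all_visible(BlockNumber blkno);

/*
 * Minimal sketch, not the actual patch code: starting at 'blkno', count
 * how many consecutive all-visible blocks lie before 'chunk_end', and
 * return that run length if it is long enough to be worth skipping,
 * otherwise 0 (meaning: scan these pages as usual).
 */
static BlockNumber
skippable_run_in_chunk(BlockNumber blkno, BlockNumber chunk_end)
{
    BlockNumber run = 0;

    while (blkno + run < chunk_end && block_is_all_visible(blkno + run))
        run++;

    return (run >= SKIP_PAGES_THRESHOLD) ? run : 0;
}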

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
parallel_heap_vacuum_v6_v7.pdf application/pdf 54.8 KB
v7-0004-raidxtree.h-support-shared-iteration.patch application/octet-stream 16.0 KB
v7-0006-radixtree.h-Add-RT_NUM_KEY-API-to-get-the-number-.patch application/octet-stream 1.9 KB
v7-0005-tidstore.c-Support-shared-itereation.patch application/octet-stream 7.1 KB
v7-0007-tidstore.c-Add-TidStoreNumBlocks-API-to-get-the-n.patch application/octet-stream 1.6 KB
v7-0008-Support-parallel-heap-vacuum-during-lazy-vacuum.patch application/octet-stream 22.3 KB
v7-0003-Support-parallel-heap-scan-during-lazy-vacuum.patch application/octet-stream 78.1 KB
v7-0002-Remember-the-number-of-times-parallel-index-vacuu.patch application/octet-stream 6.8 KB
v7-0001-Move-lazy-heap-scanning-related-variables-to-stru.patch application/octet-stream 27.2 KB
