From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Tomas Vondra <tomas(at)vondra(dot)me> |
Cc: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Parallel heap vacuum |
Date: | 2024-12-20 23:25:37 |
Message-ID: | CAD21AoA1VHim2B6Dzcfri+zf+zA82KSxzLLj2pwKtQ19yi-uTQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Dec 11, 2024 at 12:07 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Mon, Dec 9, 2024 at 2:11 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> >
> > Hi,
> >
> > Thanks for working on this. I took a quick look at this today, to do
> > some basic review. I plan to do a bunch of testing, but that's mostly to
> > get a better idea of what kind of improvements to expect - the initial
> > results look quite nice and sensible.
>
> Thank you for reviewing the patch!
I've attached the updated patches. Here some comments for some review comments:
>
> >
> > 2) I find it a bit weird vacuumlazy.c needs to include optimizer/paths.h
> > because it really has nothing to do with planning / paths. I realize it
> > needs the min_parallel_table_scan_size, but it doesn't seem right. I
> > guess it's a sign this bit of code (calculating parallel workers based
> > on log of relation size) should in some "shared" location.
>
> True. The same is actually true also for vacuumparallel.c. It includes
> optimizer/paths.h to use min_parallel_index_scan_size.
I left this change for now. Since vacuumparallel.c already has this
issue, I think we can create a separate patch to address this issue.
> >
> > 4) I think it would be good to have some sort of README explaining how
> > the parallel heap vacuum works, i.e. how it's driven by FSM. Took me a
> > while to realize how the workers coordinate which blocks to scan.
>
> +1. I will add README in the next version patch.
I've added the comment at the top of vacuumlazy.c to explain the
overall of how parallel vacuum works (done in 0008 patch).
>
> >
> > 5) Wouldn't it be better to introduce the scan_stats (grouping some of
> > the fields in a separate patch)? Seems entirely independent from the
> > parallel part, so doing it separately would make it easier to review.
> > Also, maybe reference the fields through scan_stats->x, instead of
> > through vacrel->scan_stats->x, when there's the pointer.
>
> Agreed.
Done in 0001 patch.
>
> > 6) Is it a good idea to move NewRelfrozenXID/... to the scan_stats?
> > AFAIK it's not a statistic, it's actually a parameter affecting the
> > decisions, right?
>
> Right. It would be better to move them to a separate struct or somewhere.
I've renamed it to LVRelScanState.
>
> >
> > 8) It's not clear to me why do_lazy_scan_heap() needs to "advertise" the
> > current block. Can you explain?
>
> The workers' current block numbers are used to calculate the minimum
> block number where we've scanned so far. In serial scan case, we
> vacuum FSM of the particular block range for every
> VACUUM_FSM_EVERY_PAGES pages . On the other hand, in parallel scan
> case, it doesn't make sense to vacuum FSM in that way because we might
> not have processed some blocks in the block range. So the idea is to
> calculate the minimum block number where we've scanned so far and
> vacuum FSM of the range of consecutive already-scanned blocks.
I've simplified the logic to calculate the minimum scanned block. We
didn't actually need to advertise the current block.
>
> > 11) Why does GlobalVisState need to move to snapmgr.h? If I undo this
> > the patch still builds fine for me.
>
> Oh, I might have missed something. I'll check if it's really necessary.
I've tried to undo that change, but now that we copy the contents of
GlobalVisState in vacuumlazy.c it seems we need to expose the
declaration of GlobalVisState.
The attached patches address all comments I got so far including
comments from Peter[1][2]. From the previous version, I've made many
changes not only for fixing bugs but also for improving parallel
vacuum logic itself and comments. So some review comments about typos
and clarifying the comments are not addressed where I've removed these
comments themself.
I'm doing some benchmark tests so I will share the results.
Feedback is very welcome!
Regards,
[1] https://www.postgresql.org/message-id/CAHut%2BPtnyLVkgg7BsfXy0ciVeyCBaXNRSSi0h8AVdx9cTL9_ug%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAHut%2BPsA%3D9UOFKd52A41DSTgeUreMuuweWHmxsokqLzTMao%3DRw%40mail.gmail.com
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachment | Content-Type | Size |
---|---|---|
v5-0006-radixtree.h-Add-RT_NUM_KEY-API-to-get-the-number-.patch | application/octet-stream | 1.9 KB |
v5-0005-Support-shared-itereation-on-TidStore.patch | application/octet-stream | 7.1 KB |
v5-0008-Support-parallel-heap-vacuum-during-lazy-vacuum.patch | application/octet-stream | 22.3 KB |
v5-0004-raidxtree.h-support-shared-iteration.patch | application/octet-stream | 16.8 KB |
v5-0007-Add-TidStoreNumBlocks-API-to-get-the-number-of-bl.patch | application/octet-stream | 1.6 KB |
v5-0002-Remember-the-number-of-times-parallel-index-vacuu.patch | application/octet-stream | 6.8 KB |
v5-0003-Support-parallel-heap-scan-during-lazy-vacuum.patch | application/octet-stream | 74.5 KB |
v5-0001-Move-lazy-heap-scanning-related-variables-to-stru.patch | application/octet-stream | 27.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Jim Jones | 2024-12-20 23:51:42 | Add XMLNamespaces to XMLElement |
Previous Message | Michael Paquier | 2024-12-20 23:22:22 | Re: Possible integer overflow in bringetbitmap() |