From: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, John Naylor <johncnaylorls(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Parallel heap vacuum |
Date: | 2025-03-22 14:16:06 |
Message-ID: | CAAKRu_ZWerz6+=L_dpdasJ7GFni1G1oMJU3FgRPuhaVuosRyyw@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, Mar 20, 2025 at 4:36 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> When testing the multi passes of table vacuuming, I found an issue.
> With the current patch, both the leader and parallel worker processes
> stop phase 1 as soon as the shared TidStore size reaches the limit,
> and then the leader resumes the parallel heap scan after heap
> vacuuming and index vacuuming. Therefore, as I described in the patch,
> one tricky part of this patch is that it's possible that we launch
> fewer workers than the previous time when resuming phase 1 after phase
> 2 and 3. In this case, since the previous parallel workers might have
> already allocated some blocks in their chunk, newly launched workers
> need to take over their parallel scan state. That's why in the patch
> we store workers' ParallelBlockTableScanWorkerData in shared memory.
> However, I found my assumption is wrong; in order to take over the
> previous scan state properly we need to take over not only
> ParallelBlockTableScanWorkerData but also ReadStream data as parallel
> workers might have already queued some blocks for look-ahead in their
> ReadStream. Looking at the ReadStream code, I found that it's not
> realistic to store it in shared memory.
It seems like one way to solve this would be to add functionality to
the read stream to unpin the buffers in its queue without continuing
to call the callback until the stream is exhausted.
We have read_stream_reset(), but that is for restarting streams that
have already been exhausted, that is, streams whose callback has
returned InvalidBlockNumber. In the read_stream_reset() cases, the
read stream user knows there are more blocks it would like to scan or
that it would like to restart the scan from the beginning.
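For context, the existing reset-and-reuse pattern looks roughly like
this (just a sketch; block_range_read_stream_cb, READ_STREAM_MAINTENANCE,
and read_stream_reset() are the existing APIs, while the surrounding
loop and variables are assumptions for illustration):

```c
/* Sketch: reusing an exhausted stream via read_stream_reset(). */
BlockRangeReadStreamPrivate p = {.current_blocknum = 0,
                                 .last_exclusive = nblocks};
ReadStream *stream =
    read_stream_begin_relation(READ_STREAM_MAINTENANCE,
                               vac_strategy, rel, MAIN_FORKNUM,
                               block_range_read_stream_cb,
                               &p, 0);
Buffer      buf;

while ((buf = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
{
    /* ... process the page ... */
    ReleaseBuffer(buf);
}

/* Callback has returned InvalidBlockNumber; rewind and go again. */
p.current_blocknum = 0;
read_stream_reset(stream);
```

So today the stream must be fully drained before it can be reused,
which is exactly what a pausing vacuum worker can't afford to do.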
In your case, you want to stop trying to exhaust the read stream and
just unpin the remaining buffers. As long as the worker which paused
phase I knows exactly which block it processed last and can
communicate this to whichever worker resumes phase I later, that
worker can initialize vacrel->current_block to the last block
processed.
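Concretely, the pause/resume handoff might look something like this (a
sketch only: read_stream_unpin_queued() is the hypothetical new API,
and last_processed_block is an assumed field in the shared vacuum
state, not existing code):

```c
/* Pausing worker: stop phase I without exhausting the stream. */
read_stream_unpin_queued(stream);   /* hypothetical: drop look-ahead pins */
pg_atomic_write_u32(&shared->last_processed_block,
                    vacrel->current_block);

/* Resuming worker (possibly a different backend), later: */
vacrel->current_block =
    pg_atomic_read_u32(&shared->last_processed_block);

/*
 * The stream's block-number callback then hands out blocks starting
 * after vacrel->current_block, the same as a fresh scan would.
 */
```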
> One plausible solution would be that we don't use ReadStream in
> parallel heap vacuum cases but directly use
> table_block_parallelscan_xxx() instead. It works but we end up having
> two different scan methods for parallel and non-parallel lazy heap
> scan. I've implemented this idea in the attached v12 patches.
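If I understand the v12 approach, the block-selection logic ends up
with two code paths, roughly like this (a sketch; the function and
field names, and the ParallelHeapVacuumIsActive() macro analogous to
the existing ParallelVacuumIsActive(), are assumptions about the
patch, while table_block_parallelscan_nextpage() is the existing
table AM API):

```c
static BlockNumber
lazy_scan_next_block(LVRelState *vacrel)   /* assumed name */
{
    if (ParallelHeapVacuumIsActive(vacrel))   /* assumed macro */
        /* Parallel path: bypass the ReadStream entirely. */
        return table_block_parallelscan_nextpage(vacrel->rel,
                                                 &vacrel->pbscanworkdata,
                                                 vacrel->pbscandesc);

    /* Serial path: keep the existing ReadStream-driven scan. */
    return lazy_scan_next_block_serial(vacrel);   /* assumed helper */
}
```

That works, but as you say it leaves lazy heap scan with two scan
methods to maintain.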
One question is in which scenarios parallel vacuum phase I without AIO
will be faster than AIO-ified vacuum phase I. Without AIO writes, I
suppose it would be trivial for parallel phase I to be faster than the
read-AIO path. But it's worth thinking about the tradeoff.
- Melanie