From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, John Naylor <johncnaylorls(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel heap vacuum
Date: 2025-03-24 23:58:18
Message-ID: CAD21AoA1ELrL6upKn5Bq=uYAnB6M2L1V3Bi4RnXUNEnhRDdfaQ@mail.gmail.com
Lists: pgsql-hackers
On Sun, Mar 23, 2025 at 10:01 AM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
>
> On Sun, Mar 23, 2025 at 4:46 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > If we use ParallelBlockTableScanDesc with streaming read like the
> > patch did, we would also need to somehow rewind the number of blocks
> > allocated to workers. The problem I had with such usage was that a
> > parallel vacuum worker allocated a new chunk of blocks when doing
> > look-ahead reading and therefore advanced
> > ParallelBlockTableScanDescData.phs_nallocated. In this case, even if
> > we unpin the remaining buffers in the queue by a new functionality and
> > a parallel worker resumes the phase 1 from the last processed block,
> > we would lose some blocks in already allocated chunks unless we rewind
> > ParallelBlockTableScanDescData and ParallelBlockTableScanWorkerData
> > data. However, since a worker might have already allocated multiple
> > chunks it would not be easy to rewind these scan state data.
>
> Ah I didn't realize rewinding the state would be difficult. It seems
> like the easiest way to make sure those blocks are done is to add them
> back to the counter somehow. And I don't suppose there is some way to
> save these not yet done block assignments somewhere and give them to
> the workers who restart phase I to process on the second pass?
It might be possible to store the not-yet-done blocks in DSA and pass
them to the workers that resume phase 1. However, it would make the
code more complex.
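Just to illustrate what that would involve, here is a minimal
hypothetical sketch (not something any of the attached patches do):
the leader would stash the block numbers that were allocated but never
scanned in a DSA-backed array, and workers re-entering phase 1 would
drain it before going back to the regular block allocation.

typedef struct LVLeftoverBlocks
{
    pg_atomic_uint32 next;      /* next array slot to hand out */
    uint32      nblocks;        /* number of stashed block numbers */
    BlockNumber blocks[FLEXIBLE_ARRAY_MEMBER];
} LVLeftoverBlocks;

static BlockNumber
leftover_next_block(LVLeftoverBlocks *leftover)
{
    uint32      idx = pg_atomic_fetch_add_u32(&leftover->next, 1);

    if (idx < leftover->nblocks)
        return leftover->blocks[idx];

    /* nothing stashed; caller falls back to the parallel block scan */
    return InvalidBlockNumber;
}

Keeping that array consistent with partially consumed chunks is where
the extra complexity would come from.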
> > Another idea is that parallel workers don't exit phase 1 until it
> > consumes all pinned buffers in the queue, even if the memory usage of
> > TidStore exceeds the limit. It would need to add new functionality to
> > the read stream to disable the look-ahead reading. Since we could use
> > much memory while processing these buffers, exceeding the memory
> > limit, we can trigger this mode when the memory usage of TidStore
> > reaches 70% of the limit or so. On the other hand, it means that we
> > would not use the streaming read for the blocks in this mode, which is
> > not efficient.
>
> That might work. And/or maybe you could start decreasing the size of
> block assignment chunks when the memory usage of TidStore reaches a
> certain level. I don't know how much that would help or how fiddly it
> would be.
I've tried this idea in the attached version of the patch set. I've
started with a simple approach: once the TidStore reaches the limit,
heap_vac_scan_next_block(), the read stream callback, begins to return
InvalidBlockNumber, and we continue phase 1 until the read stream is
exhausted.
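For reference, here is the rough shape of that callback (a simplified
sketch, not the patch itself; the dead_items/dead_items_info fields
follow the existing vacuumlazy.c conventions, but the block-selection
logic is elided):

static BlockNumber
heap_vac_scan_next_block(ReadStream *stream,
                         void *callback_private_data,
                         void *per_buffer_data)
{
    LVRelState *vacrel = callback_private_data;

    /* Stop handing out blocks once the dead-item store is over its limit */
    if (TidStoreMemoryUsage(vacrel->dead_items) >
        vacrel->dead_items_info->max_bytes)
        return InvalidBlockNumber;

    /* Otherwise return the next block to scan (skipping logic elided) */
    if (vacrel->current_block >= vacrel->rel_pages)
        return InvalidBlockNumber;
    return vacrel->current_block++;
}

Once the callback starts returning InvalidBlockNumber, the look-ahead
queue drains and read_stream_next_buffer() eventually reports
end-of-stream, at which point phase 1 stops and we vacuum the indexes.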
>
> > So we would need to
> > invent a way to stop and resume the read stream in the middle during
> > parallel scan.
>
> As for needing to add new read stream functionality, we actually
> probably don't have to. If you use read_stream_end() ->
> read_stream_reset(), it resets the distance to 0, so then
> read_stream_next_buffer() should just end up unpinning the buffers and
> freeing the per buffer data. I think the easiest way to implement this
> is to think about it as ending a read stream and starting a new one
> next time you start phase I and not as pausing and resuming the read
> stream. And anyway, maybe it's better not to keep a bunch of pinned
> buffers and allocated memory hanging around while doing what could be
> very long index scans.
You're right. I've studied the read stream code and figured out how to
use it. In the attached patch, we end the read stream at the end of
phase 1 and start a new read stream, as you suggested.
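Concretely, the flow around the phase transition now looks roughly
like this (a simplified sketch of the pattern; the flags and
per-buffer-data details are omitted):

/* dead-item store is full: let the current stream run out, then end it */
read_stream_end(stream);    /* releases any still-queued buffers */

/* phases 2 and 3: vacuum indexes and heap using the collected TIDs */
lazy_vacuum(vacrel);

/* resume phase 1 with a brand-new stream; the callback picks up from
 * the block after the last one it handed out */
stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE,
                                    vacrel->bstrategy,
                                    vacrel->rel,
                                    MAIN_FORKNUM,
                                    heap_vac_scan_next_block,
                                    vacrel,
                                    0);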
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachments:
  v13-0002-vacuumparallel.c-Support-parallel-vacuuming-for-.patch (application/octet-stream, 23.0 KB)
  v13-0005-Support-parallelism-for-collecting-dead-items-du.patch (application/octet-stream, 54.0 KB)
  v13-0004-Move-GlobalVisState-definition-to-snapmgr_intern.patch (application/octet-stream, 9.1 KB)
  v13-0003-Move-lazy-heap-scan-related-variables-to-new-str.patch (application/octet-stream, 30.9 KB)
  v13-0001-Introduces-table-AM-APIs-for-parallel-table-vacu.patch (application/octet-stream, 9.4 KB)