Re: Parallel heap vacuum

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, John Naylor <johncnaylorls(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel heap vacuum
Date: 2025-03-23 17:01:44
Message-ID: CAAKRu_aa-bTWs5Pi6ypZzVOy+-qCJXR7Ja5zDg2oiUvjeA8yYQ@mail.gmail.com
Lists: pgsql-hackers

On Sun, Mar 23, 2025 at 4:46 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> If we use ParallelBlockTableScanDesc with streaming read like the
> patch did, we would also need to somehow rewind the number of blocks
> allocated to workers. The problem I had with such usage was that a
> parallel vacuum worker allocated a new chunk of blocks when doing
> look-ahead reading and therefore advanced
> ParallelBlockTableScanDescData.phs_nallocated. In this case, even if
> we unpin the remaining buffers in the queue by a new functionality and
> a parallel worker resumes the phase 1 from the last processed block,
> we would lose some blocks in already allocated chunks unless we rewind
> ParallelBlockTableScanDescData and ParallelBlockTableScanWorkerData
> data. However, since a worker might have already allocated multiple
> chunks it would not be easy to rewind these scan state data.

Ah, I didn't realize rewinding the state would be difficult. It seems
like the easiest way to make sure those blocks get processed is to add
them back to the counter somehow. And I don't suppose there is some way
to save these not-yet-processed block assignments somewhere and hand
them to the workers that restart phase I on the second pass?
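
To make that idea concrete, here's a rough sketch of what I mean (all
of these structures and names are made up for illustration; only the
spinlock and BlockNumber primitives are real):

#include "postgres.h"
#include "storage/block.h"
#include "storage/spin.h"

/*
 * Illustrative only: a shared queue of block ranges that were allocated
 * to workers but never scanned before phase I stopped.  Workers would
 * append their unscanned remainders here when bailing out, and the
 * block-number callback would drain this queue before drawing fresh
 * chunks from phs_nallocated on the next pass.
 */
typedef struct VacLeftoverChunk
{
    BlockNumber start;          /* first unscanned block in the chunk */
    BlockNumber nblocks;        /* number of unscanned blocks remaining */
} VacLeftoverChunk;

typedef struct VacLeftoverQueue
{
    slock_t     mutex;
    int         nchunks;
    VacLeftoverChunk chunks[FLEXIBLE_ARRAY_MEMBER];
} VacLeftoverQueue;

/*
 * Return the next leftover block, or InvalidBlockNumber once the queue
 * is empty, signalling the caller to fall back to normal chunk
 * allocation.
 */
static BlockNumber
vac_leftover_next_block(VacLeftoverQueue *queue)
{
    BlockNumber blkno = InvalidBlockNumber;

    SpinLockAcquire(&queue->mutex);
    if (queue->nchunks > 0)
    {
        VacLeftoverChunk *chunk = &queue->chunks[queue->nchunks - 1];

        blkno = chunk->start++;
        if (--chunk->nblocks == 0)
            queue->nchunks--;   /* chunk fully drained */
    }
    SpinLockRelease(&queue->mutex);

    return blkno;
}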

> Another idea is that parallel workers don't exit phase 1 until it
> consumes all pinned buffers in the queue, even if the memory usage of
> TidStore exceeds the limit. It would need to add new functionality to
> the read stream to disable the look-ahead reading. Since we could use
> much memory while processing these buffers, exceeding the memory
> limit, we can trigger this mode when the memory usage of TidStore
> reaches 70% of the limit or so. On the other hand, it means that we
> would not use the streaming read for the blocks in this mode, which is
> not efficient.

That might work. And/or maybe you could start shrinking the
block-assignment chunk size once TidStore's memory usage passes a
certain level. I don't know how much that would help or how fiddly it
would be.
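
For instance, something like this (again just a sketch: the thresholds
and names are made up, though TidStoreMemoryUsage() is the real
function):

#include "postgres.h"
#include "access/tidstore.h"
#include "storage/block.h"

#define VAC_CHUNK_MAX   8192    /* illustrative maximum chunk size */

/*
 * Illustrative only: taper the block-assignment chunk size as TidStore
 * memory usage approaches the limit, so fewer blocks end up stranded in
 * partially consumed chunks when phase I has to stop.
 */
static BlockNumber
vac_choose_chunk_size(TidStore *dead_items, size_t max_bytes)
{
    double      used = (double) TidStoreMemoryUsage(dead_items) / max_bytes;

    if (used < 0.7)
        return VAC_CHUNK_MAX;       /* plenty of headroom: big chunks */
    if (used < 0.9)
        return VAC_CHUNK_MAX / 8;   /* getting close: smaller chunks */
    return 1;                       /* nearly full: one block at a time */
}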

> So we would need to
> invent a way to stop and resume the read stream in the middle during
> parallel scan.

As for adding new read stream functionality, we probably don't actually
have to. read_stream_end() calls read_stream_reset(), which sets the
look-ahead distance to 0, so the remaining read_stream_next_buffer()
calls just end up unpinning the queued buffers and freeing the
per-buffer data. I think the easiest way to implement this is to think
of it as ending a read stream and starting a new one the next time you
start phase I, not as pausing and resuming the same stream. And anyway,
maybe it's better not to keep a bunch of pinned buffers and allocated
memory hanging around while doing what could be very long index scans.
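
Roughly like this (the vacrel fields and the callback name are
hypothetical; read_stream_end() and read_stream_begin_relation() are
the actual API):

/*
 * TidStore hit the limit: abandon the stream rather than pausing it.
 * read_stream_end() resets the stream first, which sets the look-ahead
 * distance to 0 and unpins any buffers still queued, then frees it.
 */
read_stream_end(vacrel->scan_stream);
vacrel->scan_stream = NULL;

/* ... vacuum indexes and heap here, emptying the TidStore ... */

/*
 * Re-enter phase I with a brand new stream.  The block-number callback
 * would pick up from wherever the previous pass left off.
 */
vacrel->scan_stream =
    read_stream_begin_relation(READ_STREAM_MAINTENANCE,
                               vacrel->bstrategy,
                               vacrel->rel,
                               MAIN_FORKNUM,
                               vac_scan_next_block_cb,  /* hypothetical */
                               vacrel,
                               0);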

- Melanie
