From: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, John Naylor <johncnaylorls(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Parallel heap vacuum |
Date: | 2025-03-23 17:01:44 |
Message-ID: | CAAKRu_aa-bTWs5Pi6ypZzVOy+-qCJXR7Ja5zDg2oiUvjeA8yYQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sun, Mar 23, 2025 at 4:46 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> If we use ParallelBlockTableScanDesc with streaming read like the
> patch did, we would also need to somehow rewind the number of blocks
> allocated to workers. The problem I had with such usage was that a
> parallel vacuum worker allocated a new chunk of blocks when doing
> look-ahead reading and therefore advanced
> ParallelBlockTableScanDescData.phs_nallocated. In this case, even if
> we unpin the remaining buffers in the queue by a new functionality and
> a parallel worker resumes the phase 1 from the last processed block,
> we would lose some blocks in already allocated chunks unless we rewind
> ParallelBlockTableScanDescData and ParallelBlockTableScanWorkerData
> data. However, since a worker might have already allocated multiple
> chunks it would not be easy to rewind these scan state data.
Ah I didn't realize rewinding the state would be difficult. It seems
like the easiest way to make sure those blocks are done is to add them
back to the counter somehow. And I don't suppose there is some way to
save these not yet done block assignments somewhere and give them to
the workers who restart phase I to process on the second pass?
> Another idea is that parallel workers don't exit phase 1 until it
> consumes all pinned buffers in the queue, even if the memory usage of
> TidStore exceeds the limit. It would need to add new functionality to
> the read stream to disable the look-ahead reading. Since we could use
> much memory while processing these buffers, exceeding the memory
> limit, we can trigger this mode when the memory usage of TidStore
> reaches 70% of the limit or so. On the other hand, it means that we
> would not use the streaming read for the blocks in this mode, which is
> not efficient.
That might work. And/or maybe you could start decreasing the size of
block assignment chunks when the memory usage of TidStore reaches a
certain level. I don't know how much that would help or how fiddly it
would be.
> So we would need to
> invent a way to stop and resume the read stream in the middle during
> parallel scan.
As for needing to add new read stream functionality, we actually
probably don't have to. If you use read_stream_end() ->
read_stream_reset(), it resets the distance to 0, so then
read_stream_next_buffer() should just end up unpinning the buffers and
freeing the per buffer data. I think the easiest way to implement this
is to think about it as ending a read stream and starting a new one
next time you start phase I and not as pausing and resuming the read
stream. And anyway, maybe it's better not to keep a bunch of pinned
buffers and allocated memory hanging around while doing what could be
very long index scans.
- Melanie
From | Date | Subject | |
---|---|---|---|
Next Message | Andrey Borodin | 2025-03-23 17:02:14 | Re: Using read_stream in index vacuum |
Previous Message | Andres Freund | 2025-03-23 16:59:43 | Re: AIO v2.5 |