Re: Using read_stream in index vacuum

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
Cc: Junwang Zhao <zhjwpku(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Using read_stream in index vacuum
Date: 2024-10-21 19:34:37
Message-ID: CAAKRu_aMiTqCB3PvUeCT46xSv6tkeKfqcnfpnvzLx4Yj_b+KDw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Oct 20, 2024 at 10:19 AM Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>
>
>
> > On 20 Oct 2024, at 15:16, Junwang Zhao <zhjwpku(at)gmail(dot)com> wrote:
> >
> > I'm not sure if I did not express myself correctly, I didn't mean to
> > restart the stream,
> > I mean we can create a new stream for each outer loop, I attached a
> > refactor 0002
> > based on your 0001, correct me if I'm wrong.
>
> I really like how the code looks with this refactoring. But I think we need some input from Thomas.
> Is it OK if we start handful of streams for 1 page at the end of vacuum scan? How costly is to start a new scan?

You shouldn't need either loop in btvacuumscan().

For the inner loop:

for (; scanblkno < num_pages; scanblkno++)
{
Buffer buf = read_stream_next_buffer(stream, NULL);
btvacuumpage(&vstate, buf);
if (info->report_progress)
pgstat_progress_update_param(PROGRESS_SCAN_BLOCKS_DONE,
scanblkno);
}

you should just be able to be do something like

Buffer buf = read_stream_next_buffer(stream, NULL);
if (BufferIsValid(buf))
btvacuumpage(&vstate, buf);

Obviously I am eliding some details and clean-up and such. But your
read stream callback should be responsible for advancing the block
number and thus you shouldn't need to loop like this in
btvacuumscan().

The whole point of the read stream callback provided by the caller is
that the logic to get the next block should be contained there
(read_stream_get_block() is an exception to this).

For the outer loop, I feel like we have options. For example, maybe
the read stream callback can call RelationGetNumberOfBlocks(). I mean
maybe we don't want to have to take a relation extension lock in a
callback.

So, alternatively, we could add some kind of restartable flag to the
read stream API. So, after the callback returns InvalidBlockNumber and
the read_stream_next_buffer() returns NULL, we could call something
like read_stream_update() or read_stream_restart() or something. We
would have updated the BlockRangeReadStreamPrivate->last_exclusive
value. In your case it might not be substantially different
operationally than making new read streams (since you are not
allocating memory for a per-buffer data structure). But, I think the
code would read much better than making new read stream objects in a
loop.

- Melanie

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Melanie Plageman 2024-10-21 19:56:02 Re: Using read_stream in index vacuum
Previous Message Alexander Lakhin 2024-10-21 19:00:00 Re: race condition in pg_class