Re: Using read_stream in index vacuum

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
Cc: Junwang Zhao <zhjwpku(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Using read_stream in index vacuum
Date: 2024-10-23 22:04:36
Message-ID: CAAKRu_bCZJT6yLkExmTmgOa9VK+41jjpi5bVEgsEO2BiNQXZ+w@mail.gmail.com
Lists: pgsql-hackers

On Wed, Oct 23, 2024 at 4:29 PM Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>
> > On 23 Oct 2024, at 20:57, Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> >
> > I'll think how to restructure flow there...
>
> OK, I've understood how it should be structured. PFA v5. Sorry for the noise.

I think this would be a bit nicer:

    while (BufferIsValid(buf = read_stream_next_buffer(stream, NULL)))
    {
        block = btvacuumpage(&vstate, buf);
        if (info->report_progress)
            pgstat_progress_update_param(PROGRESS_SCAN_BLOCKS_DONE, block);
    }

Maybe change btvacuumpage() to return the block number to avoid the
extra BufferGetBlockNumber() calls (those add up).

While looking at this, I started to wonder whether it is incorrect that
we don't call pgstat_progress_update_param() for the blocks that
btvacuumpage() backtracks to and reads (on master as well).
btvacuumpage() may actually have scanned more than one block, so...

Unrelated to the code review: btree index vacuum has the same issues
that kept us from committing streaming heap vacuum last release,
namely interactions between the buffer access strategy ring buffer
size and the larger reads. One of these is an increase in the number
of WAL syncs and writes required. Thomas mentions it here [1], and [2]
is the thread where he proposes adding vectored writeback to address
some of these issues.

- Melanie

[1] https://www.postgresql.org/message-id/CA%2BhUKGKN3oy0bN_3yv8hd78a4%2BM1tJC9z7mD8%2Bf%2ByA%2BGeoFUwQ%40mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/CA%2BhUKGK1in4FiWtisXZ%2BJo-cNSbWjmBcPww3w3DBM%2BwhJTABXA%40mail.gmail.com
