Re: Using read_stream in index vacuum

From: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Junwang Zhao <zhjwpku(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Using read_stream in index vacuum
Date: 2024-10-24 07:15:26
Message-ID: EFEBED92-18D1-4C0F-A4EB-CD47072EF071@yandex-team.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I've also added GiST vacuum to the patchset.

> On 24 Oct 2024, at 01:04, Melanie Plageman <melanieplageman(at)gmail(dot)com> wrote:
>
> On Wed, Oct 23, 2024 at 4:29 PM Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>>
>>> On 23 Oct 2024, at 20:57, Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>>>
>>> I'll think how to restructure flow there...
>>
>> OK, I've understood how it should be structured. PFA v5. Sorry for the noise.
>
> I think this would be a bit nicer:
>
> while (BufferIsValid(buf = read_stream_next_buffer(stream, NULL)))
> {
> block = btvacuumpage(&vstate, buf);
> if (info->report_progress)
> pgstat_progress_update_param(PROGRESS_SCAN_BLOCKS_DONE, block);
> }

OK.

> Maybe change btvacuumpage() to return the block number to avoid the
> extra BufferGetBlockNumber() calls (those add up).
Makes sense... now we have non-obvious function prototype, but I think it worth it.

> While looking at this, I started to wonder if it isn't incorrect that
> we are not calling pgstat_progress_update_param() for the blocks that
> we backtrack and read in btvacuumpage() too (on master as well).
> btvacuumpage() may actually have scanned more than one block, so...
It's correct, user wants to know progress and backtracks do not count towards progress.
Though, it must be relatevely infrequent case.

> Unrelated to code review, but btree index vacuum has the same issues
> that kept us from committing streaming heap vacuum last release --
> interactions between the buffer access strategy ring buffer size and
> the larger reads -- one of which is an increase in the number of WAL
> syncs and writes required. Thomas mentions it here [1] and here [2] is
> the thread where he proposes adding vectored writeback to address some
> of these issues.

Do I need to do anything about it?

Thank you!

Best regards, Andrey Borodin.

Attachment Content-Type Size
v6-0001-Prototype-B-tree-vacuum-streamlineing.patch application/octet-stream 9.2 KB
v6-0002-Use-read_stream-in-GiST-vacuum.patch application/octet-stream 4.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Haoran Zhang 2024-10-24 07:15:51 Typo spotted in GetFileBackupMethod comment(?)
Previous Message Tender Wang 2024-10-24 07:01:07 Re: Wrong result when enable_partitionwise_join is on if collation of PartitionKey and Column is different.