Re: Use streaming read API in pgstattuple.

From: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
To: Matheus Alcantara <matheusssilv97(at)gmail(dot)com>
Cc: Kirill Reshke <reshkekirill(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Use streaming read API in pgstattuple.
Date: 2024-12-02 08:01:26
Message-ID: CAN55FZ1qiQ7L6uRvyELimbB40o08iH1HnB+zmD9+K6EFH5exSQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Fri, 29 Nov 2024 at 20:05, Matheus Alcantara
<matheusssilv97(at)gmail(dot)com> wrote:
>
> Hi,
>
>
> On 29/11/24 04:28, Nazir Bilal Yavuz wrote:
> > - for (; blkno < nblocks; blkno++)
> > + p.last_exclusive = nblocks;
> > +
> > + while (BufferIsValid(buf = read_stream_next_buffer(stream, NULL)))
> > {
> > CHECK_FOR_INTERRUPTS();
> >
> > - pagefn(&stat, rel, blkno, bstrategy);
> > + pagefn(&stat, rel, buf);
> > }
> > +
> > + Assert(read_stream_next_buffer(stream, NULL) == InvalidBuffer);
> >
> > With this change we assume that if stream returns an invalid buffer
> > that means stream must be finished. We don't check if the stream
> > didn't finish but somehow read an invalid buffer. In the upstream
> > version, if we read an invalid buffer then postgres server will fail.
> > But in the patched version, the server will continue to run because it
> > will think that stream has reached to end. This could be happening for
> > other streaming read users; for example at least the
> > read_stream_next_buffer() calls in the collect_corrupt_items()
> > function face the same issue.
>
> Just for reference; On pg_prewarm() for example this loop is
> implemented as:
>
> for (block = first_block; block <= last_block; ++block)
> {
> Buffer buf;
> ...
> buf = read_stream_next_buffer(stream, NULL);
> ...
> }
> Assert(read_stream_next_buffer(stream, NULL) == InvalidBuffer);
>
>
> Would this approach make sense on these cases? (this patch and on
> collect_corrupt_items)

Yes, that should work and I guess it would be easy to apply this
approach to this patch but it would be hard to apply this to
collect_corrupt_items().

In cases like pg_prewarm or the current scenario, we know the exact
number of blocks to read, which allows us to determine when the stream
should finish (e.g., after the loop runs blockno times) and return an
invalid buffer. But in collect_corrupt_items(), the number of loop
iterations required is not directly tied to blockno since some blocks
may be skipped. This makes it difficult to know when the stream should
terminate and return an invalid buffer.

--
Regards,
Nazir Bilal Yavuz
Microsoft

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2024-12-02 08:27:18 Re: Vacuum statistics
Previous Message Masahiko Sawada 2024-12-02 08:00:49 Re: UUID v7