Re: Use streaming read API in ANALYZE

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, jakub(dot)wartak(at)enterprisedb(dot)com
Subject: Re: Use streaming read API in ANALYZE
Date: 2024-05-20 20:46:35
Message-ID: CAAKRu_ZdDqPkmzi9X0XPn2SJNRMuZEKTX7u9ajJBheiwX38XNQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, May 15, 2024 at 2:18 PM Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com> wrote:
>
> On Mon, 29 Apr 2024 at 18:41, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com> wrote:
> >
> > On Mon, 8 Apr 2024 at 04:21, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> > I wanted to discuss what will happen to this patch now that
> > 27bc1772fc8 is reverted. I am continuing this thread but I can create
> > another thread if you prefer so.
>
> 041b96802ef is discussed in the 'Table AM Interface Enhancements'
> thread [1]. The main problems discussed about this commit is that the
> read stream API is not pushed to the heap-specific code and, because
> of that, the other AM implementations need to use read streams. To
> push read stream API to the heap-specific code, it is pretty much
> required to pass BufferAccessStrategy and BlockSamplerData to the
> initscan().
>
> I am sharing the alternative version of this patch. The first patch
> just reverts 041b96802ef and the second patch is the alternative
> version.
>
> In this alternative version, the read stream API is not pushed to the
> heap-specific code, but it is controlled by the heap-specific code.
> The SO_USE_READ_STREAMS_IN_ANALYZE flag is introduced and set in the
> heap-specific code if the scan type is 'ANALYZE'. This flag is used to
> decide whether streaming API in ANALYZE will be used or not. If this
> flag is set, this means heap AMs and read stream API will be used. If
> it is not set, this means heap AMs will not be used and code falls
> back to the version before read streams.

Personally, I think the alternative version here is the best option
other than leaving what is in master. However, I would vote for
keeping what is in master because 1) where we are in the release
timeline and 2) the acquire_sample_rows() code, before streaming read,
was totally block-based anyway.

If we kept what was in master, do we need to document for table AMs
how to use read_stream_next_buffer() or can we assume they will look
at the heap AM implementation?

- Melanie

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Akshat Jaimini 2024-05-20 20:59:00 Re: commitfest.postgresql.org is no longer fit for purpose
Previous Message Dagfinn Ilmari Mannsåker 2024-05-20 20:39:34 Re: Cleaning up perl code