Re: Use streaming read API in ANALYZE

From: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, jakub(dot)wartak(at)enterprisedb(dot)com
Subject: Re: Use streaming read API in ANALYZE
Date: 2024-05-15 18:18:12
Message-ID: CAN55FZ31AoypGkAGH_kx4q7J1fVNi-1XNmvdME3Ye6KbCyJDNw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Mon, 29 Apr 2024 at 18:41, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com> wrote:
>
> Hi,
>
> On Mon, 8 Apr 2024 at 04:21, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> >
> > Pushed. Thanks Bilal and reviewers!
>
> I wanted to discuss what will happen to this patch now that
> 27bc1772fc8 is reverted. I am continuing this thread but I can create
> another thread if you prefer so.

041b96802ef is discussed in the 'Table AM Interface Enhancements'
thread [1]. The main problems discussed about this commit is that the
read stream API is not pushed to the heap-specific code and, because
of that, the other AM implementations need to use read streams. To
push read stream API to the heap-specific code, it is pretty much
required to pass BufferAccessStrategy and BlockSamplerData to the
initscan().

I am sharing the alternative version of this patch. The first patch
just reverts 041b96802ef and the second patch is the alternative
version.

In this alternative version, the read stream API is not pushed to the
heap-specific code, but it is controlled by the heap-specific code.
The SO_USE_READ_STREAMS_IN_ANALYZE flag is introduced and set in the
heap-specific code if the scan type is 'ANALYZE'. This flag is used to
decide whether streaming API in ANALYZE will be used or not. If this
flag is set, this means heap AMs and read stream API will be used. If
it is not set, this means heap AMs will not be used and code falls
back to the version before read streams.

Pros of the alternative version:

* The existing AM implementations other than heap AM can continue to
use their AMs without any change.
* AM implementations other than heap do not need to use read streams.
* Upstream code uses the read stream API and benefits from that.

Cons of the alternative version:

* 6 if cases are added to the acquire_sample_rows() function and 3 of
them are in the while loop.
* Because of these changes, the code looks messy.

Any kind of feedback would be appreciated.

[1] https://www.postgresql.org/message-id/flat/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com

--
Regards,
Nazir Bilal Yavuz
Microsoft

Attachment Content-Type Size
v13-0001-Revert-Use-streaming-I-O-in-ANALYZE.patch text/x-patch 10.1 KB
v13-0002-Use-streaming-I-O-in-ANALYZE-alternative.patch text/x-patch 10.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jacob Champion 2024-05-15 18:22:58 Re: pgsql: Fix overread in JSON parsing errors for incomplete byte sequence
Previous Message Josef Šimánek 2024-05-15 18:13:11 Re: [PATCH] Add --syntax to postgres for SQL syntax checking