Re: Use streaming read API in ANALYZE

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, jakub(dot)wartak(at)enterprisedb(dot)com, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Use streaming read API in ANALYZE
Date: 2024-04-07 22:26:31
Message-ID: 20240407222631.zew43rr74ajbadq6@liskov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Apr 07, 2024 at 03:00:00PM -0700, Andres Freund wrote:
> Hi,
>
> On 2024-04-07 16:59:26 -0400, Melanie Plageman wrote:
> > From 1dc2343661f3edb3b1bc4307afb0e956397eb76c Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> > Date: Sun, 7 Apr 2024 14:55:22 -0400
> > Subject: [PATCH v10 1/3] Make heapam_scan_analyze_next_[tuple|block] static.
> >
> > 27bc1772fc81 removed the table AM callbacks scan_analyze_next_block and
> > scan_analzye_next_tuple -- leaving their heap AM implementations only
> > called by acquire_sample_rows().
>
> Ugh, I don't think 27bc1772fc81 makes much sense. But that's unrelated to this
> thread. I did raise that separately
> https://www.postgresql.org/message-id/20240407214001.jgpg5q3yv33ve6y3%40awork3.anarazel.de
>
> Unless I seriously missed something, I see no alternative to reverting that
> commit.

Noted. I'll give up on this refactor then. Lots of churn for no gain.
Attached v11 is just Bilal's v8 patch rebased to apply cleanly and with
a few tweaks (I changed one of the loop conditions. All other changes
are to comments and commit message).

> > @@ -1206,11 +1357,13 @@ acquire_sample_rows(Relation onerel, int elevel,
> > break;
> >
> > prefetch_block = BlockSampler_Next(&prefetch_bs);
> > - PrefetchBuffer(scan->rs_rd, MAIN_FORKNUM, prefetch_block);
> > + PrefetchBuffer(scan->rs_base.rs_rd, MAIN_FORKNUM, prefetch_block);
> > }
> > }
> > #endif
> >
> > + scan->rs_cbuf = InvalidBuffer;
> > +
> > /* Outer loop over blocks to sample */
> > while (BlockSampler_HasMore(&bs))
> > {
>
> I don't think it's good to move a lot of code *and* change how it is
> structured in the same commit. Makes it much harder to actually see changes /
> makes git blame harder to use / etc.

Yep.

> > From 90d115c2401567be65bcf64393a6d3b39286779e Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> > Date: Sun, 7 Apr 2024 15:28:32 -0400
> > Subject: [PATCH v10 2/3] Use streaming read API in ANALYZE
> >
> > The ANALYZE command prefetches and reads sample blocks chosen by a
> > BlockSampler algorithm. Instead of calling Prefetch|ReadBuffer() for
> > each block, ANALYZE now uses the streaming API introduced in b5a9b18cd0.
> >
> > Author: Nazir Bilal Yavuz
> > Reviewed-by: Melanie Plageman
> > Discussion: https://postgr.es/m/flat/CAN55FZ0UhXqk9v3y-zW_fp4-WCp43V8y0A72xPmLkOM%2B6M%2BmJg%40mail.gmail.com
> > ---
> > src/backend/commands/analyze.c | 89 ++++++++++------------------------
> > 1 file changed, 26 insertions(+), 63 deletions(-)
>
> That's a very nice demonstration of how this makes good prefetching easier...

Agreed. Yay streaming read API and Bilal!

> > From 862b7ac81cdafcda7b525e02721da14e46265509 Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> > Date: Sun, 7 Apr 2024 15:38:41 -0400
> > Subject: [PATCH v10 3/3] Obsolete BlockSampler_HasMore()
> >
> > A previous commit stopped using BlockSampler_HasMore() for flow control
> > in acquire_sample_rows(). There seems little use now for
> > BlockSampler_HasMore(). It should be sufficient to return
> > InvalidBlockNumber from BlockSampler_Next() when BlockSample_HasMore()
> > would have returned false. Remove BlockSampler_HasMore().
> >
> > Author: Melanie Plageman, Nazir Bilal Yavuz
> > Discussion: https://postgr.es/m/flat/CAN55FZ0UhXqk9v3y-zW_fp4-WCp43V8y0A72xPmLkOM%2B6M%2BmJg%40mail.gmail.com
>
> The justification here seems somewhat odd. Sure, the previous commit stopped
> using BlockSampler_HasMore in acquire_sample_rows - but only because it was
> moved to block_sampling_streaming_read_next()?

It didn't stop using it. It stopped being useful. The reason it existed,
as far as I can tell, was to use it as the while() loop condition in
acquire_sample_rows(). I think it makes much more sense for
BlockSampler_Next() to return InvalidBlockNumber when there are no more
blocks -- not to assert you don't call it when there aren't any more
blocks.

I didn't want to change BlockSampler_Next() in the same commit as the
streaming read user and we can't remove BlockSampler_HasMore() without
changing BlockSampler_Next().

- Melanie

Attachment Content-Type Size
v11-0001-Use-streaming-read-API-in-ANALYZE.patch text/x-diff 7.1 KB
v11-0002-Obsolete-BlockSampler_HasMore.patch text/x-diff 2.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2024-04-07 22:28:42 Re: Cluster::restart dumping logs when stop fails
Previous Message Alexander Korotkov 2024-04-07 22:15:06 Re: Add SPLIT PARTITION/MERGE PARTITIONS commands