Re: Use streaming read API in ANALYZE

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, jakub(dot)wartak(at)enterprisedb(dot)com
Subject: Re: Use streaming read API in ANALYZE
Date: 2024-04-08 03:46:42
Message-ID: CA+hUKG+88-GgGmo1o3iwwu-u8YfNGMA1chZJ+NsmTshHFAgxEA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Apr 8, 2024 at 10:26 AM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
> On Sun, Apr 07, 2024 at 03:00:00PM -0700, Andres Freund wrote:
> > On 2024-04-07 16:59:26 -0400, Melanie Plageman wrote:
> > > From 862b7ac81cdafcda7b525e02721da14e46265509 Mon Sep 17 00:00:00 2001
> > > From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> > > Date: Sun, 7 Apr 2024 15:38:41 -0400
> > > Subject: [PATCH v10 3/3] Obsolete BlockSampler_HasMore()
> > >
> > > A previous commit stopped using BlockSampler_HasMore() for flow control
> > > in acquire_sample_rows(). There seems little use now for
> > > BlockSampler_HasMore(). It should be sufficient to return
> > > InvalidBlockNumber from BlockSampler_Next() when BlockSample_HasMore()
> > > would have returned false. Remove BlockSampler_HasMore().
> > >
> > > Author: Melanie Plageman, Nazir Bilal Yavuz
> > > Discussion: https://postgr.es/m/flat/CAN55FZ0UhXqk9v3y-zW_fp4-WCp43V8y0A72xPmLkOM%2B6M%2BmJg%40mail.gmail.com
> >
> > The justification here seems somewhat odd. Sure, the previous commit stopped
> > using BlockSampler_HasMore in acquire_sample_rows - but only because it was
> > moved to block_sampling_streaming_read_next()?
>
> It didn't stop using it. It stopped being useful. The reason it existed,
> as far as I can tell, was to use it as the while() loop condition in
> acquire_sample_rows(). I think it makes much more sense for
> BlockSampler_Next() to return InvalidBlockNumber when there are no more
> blocks -- not to assert you don't call it when there aren't any more
> blocks.
>
> I didn't want to change BlockSampler_Next() in the same commit as the
> streaming read user and we can't remove BlockSampler_HasMore() without
> changing BlockSampler_Next().

I agree that the code looks useless if one condition implies the
other, but isn't it good to keep that cross-check, perhaps
reformulated as an assertion? I didn't look too hard at the maths, I
just saw the words "It is not obvious that this code matches Knuth's
Algorithm S ..." and realised I'm not sure I have time to develop a
good opinion about this today. So I'll leave the 0002 change out for
now, as it's a tidy-up that can easily be applied in the next cycle.

Attachment Content-Type Size
v12-0001-Remove-obsolete-BlockSampler_HasMore.patch text/x-patch 2.7 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Lakhin 2024-04-08 04:00:00 Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
Previous Message David Rowley 2024-04-08 03:43:12 Re: Improve heapgetpage() performance, overhead from serializable