Re: BitmapHeapScan streaming read user and prelim refactoring

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(at)vondra(dot)me>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BitmapHeapScan streaming read user and prelim refactoring
Date: 2025-02-21 22:00:49
Message-ID: CAAKRu_Z8nTdGEqNUFGgCaAvRjEv=2RzEt5MKYft9u_BL==cFBA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for the review!

While editing this, I found a few things I could improve. For one, I
actually wasn't using the parallel state member that I had added as a
parameter to one of the table AM callbacks. So, I was able to remove
that.

On Thu, Feb 20, 2025 at 7:32 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>
> On Fri, Feb 21, 2025 at 11:17 AM Melanie Plageman
> <melanieplageman(at)gmail(dot)com> wrote:
> > [v31-0001-Delay-extraction-of-TIDBitmap-per-page-offsets.patch]
>
> +/*
> + * The tuple OffsetNumbers extracted from a single page in the bitmap.
> + */
> +typedef OffsetNumber TBMOffsets[TBM_MAX_TUPLES_PER_PAGE];
>
> It's used for a stack object, but it's also used as a degraded pointer here:
>
> +extern void tbm_extract_page_tuple(TBMIterateResult *iteritem,
> +
> TBMOffsets offsets);
>
> Maybe a personal style question but degraded pointers aren't my
> favourite C feature. A couple of alternatives, I am not sure how good
> they are: Would this output variable be better wrapped up in a new
> struct for better API clarity? If so, shouldn't such a struct also
> hold ntuples itself, since that really goes with the offsets it's
> holding? If so, could iteritem->ntuples then be replaced with a
> boolean iteritem->lossy, since all you really need to be able to check
> is whether it has offsets to give you, instead of using -1 and -2 for
> that?

Yea, I think this is a good point. I made a new struct which has
ntuples and the offsets array as you suggested. I think it's much
nicer than using -2 :)

I do worry a bit about the ntuples value in the new struct being
uninitialized. Before, tbm_iterate() filled in
TBMIterateResult.ntuples, so you knew that once you had a
TBMIterateResult at all, ntuples was valid. Now you make a
TBMPageOffsets and until you call tbm_extract_page_tuple() on it,
TBMPageOffsets->ntuples is garbage. It's probably okay though. I have
the caller initialize TBMPageOffsets to -1 when I think this might be
an issue.

> Or as a variation without a new struct, would it be better to
> take that output array as a plain pointer to OffsetNumber and
> max_offsets, and return ntuples (ie how many were filled in, cf commit
> f6bef362), for the caller to use in a loop or stash somewhere, with an
> additional comment that TBM_MAX_TUPLES_PER_PAGE is guaranteed to be
> enough, for convenience eg for arrays on the stack?

I thought about doing this, but in the end I decided against this
approach. If we wanted to make it easy to palloc arrays of different
sizes and have tbm_extract_page_tuple extract that many tuples into
the array, we'd have to change more of the code anyway because
tbm_extract_page_tuple is what tells us how many tuples there are to
begin with. We could save this somewhere while filling in the
PagetableEntries initially, but that is a bigger change.

And, passing a length would also mean more callers would have to know
about TBM_MAX_TUPLES_PER_PAGE, which I think is already kind of a hack
since it is defined as MaxHeapTuplesPerPage.

- Melanie

Attachment Content-Type Size
v32-0001-Add-lossy-indicator-to-TBMIterateResult.patch text/x-patch 6.0 KB
v32-0005-Remove-table-AM-callback-scan_bitmap_next_block.patch text/x-patch 23.0 KB
v32-0004-BitmapHeapScan-uses-the-read-stream-API.patch text/x-patch 29.0 KB
v32-0003-Separate-TBM-Shared-Private-Iterator-and-TBMIter.patch text/x-patch 19.5 KB
v32-0002-Delay-extraction-of-TIDBitmap-per-page-offsets.patch text/x-patch 16.6 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Corey Huinker 2025-02-21 22:09:42 Re: Statistics Import and Export
Previous Message Robert Haas 2025-02-21 21:44:28 Re: explain analyze rows=%.0f