Re: BitmapHeapScan streaming read user and prelim refactoring

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Subject: Re: BitmapHeapScan streaming read user and prelim refactoring
Date: 2024-09-27 20:13:35
Message-ID: CAAKRu_YRcZSctDjBkd5MJOR9akNZPnxBa=M2Ds1ZOnYzXAUUPg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jun 19, 2024 at 2:13 PM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
>
> On Wed, Jun 19, 2024 at 12:38 PM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> > > + * XXX I don't understand why we should have this special node if we
> > > + * don't have special nodes for other scan types.
> > >
> > > In this case, up until the final commit (using the read stream
> > > interface), there are six fields required by bitmap heap scan that are
> > > not needed by any other user of HeapScanDescData. There are also
> > > several members of HeapScanDescData that are not needed by bitmap heap
> > > scans and all of the setup in initscan() for those fields is not
> > > required for bitmap heap scans.
> > >
> > > Also, because the BitmapHeapScanDesc needs information like the
> > > ParallelBitmapHeapState and prefetch_maximum (for use in prefetching),
> > > the scan_begin() callback would have to take those as parameters. I
> > > thought adding so much bitmap table scan-specific information to the
> > > generic table scan callbacks was a bad idea.
> > >
> > > Once we add the read stream API code, the number of fields required
> > > for bitmap heap scan that are in the scan descriptor goes down to
> > > three. So, perhaps we could justify having that many bitmap heap
> > > scan-specific fields in the HeapScanDescData.
> > >
> > > Though, I actually think we could start moving toward having
> > > specialized scan descriptors if the requirements for that scan type
> > > are appreciably different. I can't think of new code that would be
> > > added to the HeapScanDescData that would have to be duplicated over to
> > > specialized scan descriptors.
> > >
> > > With the final read stream state, I can see the argument for bloating
> > > the HeapScanDescData with three extra members and avoiding making new
> > > scan descriptors. But, for the intermediate patches which have all of
> > > the bitmap prefetch members pushed down into the HeapScanDescData, I
> > > think it is really not okay. Six members only for bitmap heap scans
> > > and two bitmap-specific members to begin_scan() seems bad.
> > >
> > > What I thought we plan to do is commit the refactoring patches
> > > sometime after the branch for 18 is cut and leave the final read
> > > stream patch uncommitted so we can do performance testing on it. If
> > > you think it is okay to have the six member bloated HeapScanDescData
> > > in master until we push the read stream code, I am okay with removing
> > > the BitmapTableScanDesc and BitmapHeapScanDesc.
> > >
> >
> > I admit I don't have a very good idea what the ideal / desired state
> > look like. My comment is motivated solely by the feeling that it seems
> > strange to have one struct serving most scan types, and then a special
> > struct for one particular scan type ...
>
> I see what you are saying. We could make BitmapTableScanDesc inherit
> from TableScanDescData which would be similar to what we do with other
> things like the executor scan nodes themselves. We would waste space
> and LOC with initializing the unneeded members, but it might seem less
> weird.
>
> Whether we want the specialized scan descriptors at all is probably
> the bigger question, though.
>
> The top level BitmapTableScanDesc is motivated by wanting fewer bitmap
> table scan specific members passed to scan_begin(). And the
> BitmapHeapScanDesc is motivated by this plus wanting to avoid bloating
> the HeapScanDescData.
>
> If you look at at HEAD~1 (with my patches applied) and think you would
> be okay with
> 1) the contents of the BitmapHeapScanDesc being in the HeapScanDescData and
> 2) the extra bitmap table scan-specific parameters in scan_begin_bm()
> being passed to scan_begin()
>
> then I will remove the specialized scan descriptors.
>
> The final state (with the read stream) will still have three bitmap
> heap scan-specific members in the HeapScanDescData.
>
> Would it help if I do a version like this so you can see what it is like?

I revisited this issue (how to keep from bloating the Heap and Table
scan descriptors and adding many parameters to the scan_begin() table
AM callback) and am trying to find a less noisy way to address it than
my previous proposal.

I've attached a prototype of what I think might work applied on top of
master instead of on top of my patchset.

For the top-level TableScanDescData, I suggest we use a union with the
members of each scan type in it in anonymous structs (see 0001). This
will avoid too much bloat because there are other scan types (like TID
Range scans) whose members we can move into the union. It isn't great,
but it avoids a new top-level scan descriptor and changes to the
generic scanstate node.

We will still have to pass the parameters needed to set up the
parallel bitmap iterators to scan_begin() in the intermediate patches,
but if we think that we can actually get the streaming read version of
bitmapheapscan in in the same release, then I think it should be okay
because the final version of these table AM callbacks do not need any
bitmap-specific members.

To address the bloat in the HeapScanDescData, I've kept the
BitmapHeapScanDesc but made it inherit from the HeapScanDescData with
a "suffix" of bitmap scan-specific members which were moved out of the
HeapScanDescData and into the BitmapHeapScanDesc (in 0002).

It's probably better to temporarily increase the parameters to
scan_begin() than to introduce new table AM callbacks and then rip
them out in a later commit.

- Melanie

Attachment Content-Type Size
0002-Move-BitmapHeapScan-specific-members-into-suffix.patch application/x-patch 6.2 KB
0001-Example-use-of-union-in-TableScanDescData.patch application/x-patch 1.3 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2024-09-27 20:27:38 Re: MAINTAIN privilege -- what do we need to un-revert it?
Previous Message Masahiko Sawada 2024-09-27 20:13:15 Re: Vacuum statistics