Quick Links

Re: BitmapHeapScan streaming read user and prelim refactoring

From:	Melanie Plageman <melanieplageman(at)gmail(dot)com>
To:	Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Subject:	Re: BitmapHeapScan streaming read user and prelim refactoring
Date:	2024-06-19 15:55:20
Message-ID:	CAAKRu_ZRCPyngxn3TZy4082VASQUtDKk-6gHLbWFkLSzSt3X2w@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Tue, Jun 18, 2024 at 6:02 PM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> I went through v22 to remind myself of what the patches do and do some
> basic review - I have some simple questions / comments for now, nothing
> major. I've kept the comments in separate 'review' patches, it does not
> seem worth copying here.

Thanks so much for the review!

I've implemented your feedback in attached v23 except for what I
mention below. I have not gone through each patch in the new set very
carefully after making the changes because I think we should resolve
the question of adding the new table scan descriptor before I do that.
A change there will require a big rebase. Then I can go through each
patch very carefully.

From your v22b-0005-review.patch:

src/backend/executor/nodeBitmapHeapscan.c | 14 ++++++++++++++
src/include/access/tableam.h | 2 ++
2 files changed, 16 insertions(+)

diff --git a/src/backend/executor/nodeBitmapHeapscan.c
b/src/backend/executor/nodeBitmapHeapscan.c
index e8b4a754434..6d7ef9ced19 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -270,6 +270,20 @@ new_page:

BitmapAdjustPrefetchIterator(node);

+ /*
+ * XXX I'm a bit unsure if this needs to be handled using
goto. Wouldn't
+ * it be simpler / easier to understand to have two nested loops?
+ *
+ * while (true)
+ * if (!table_scan_bitmap_next_block(...)) { break; }
+ * while (table_scan_bitmap_next_tuple(...)) {
+ * ... process tuples ...
+ * }
+ *
+ * But I haven't tried implementing this.
+ */
if (!table_scan_bitmap_next_block(scan, &node->blockno, &node->recheck,
&node->lossy_pages,
&node->exact_pages))
break;

We need to call table_scan_bimtap_next_block() the first time we call
BitmapHeapNext() on each scan but all subsequent invocations of
BitmapHeapNext() must call table_scan_bitmap_next_tuple() first each
-- because we return from BitmapHeapNext() to yield a tuple even when
there are more tuples on the page. I tried refactoring this a few
different ways and personally found the goto most clear.

From your v22b-0010-review.patch:

@@ -557,6 +559,8 @@ ExecReScanBitmapHeapScan(BitmapHeapScanState *node)
table_rescan(node->ss.ss_currentScanDesc, NULL);

/* release bitmaps and buffers if any */
+ /* XXX seems it should not be right after the comment, also shouldn't
+ * we still reset the prefetch_iterator field to NULL? */
tbm_end_iterate(&node->prefetch_iterator);
if (node->tbm)
tbm_free(node->tbm);

prefetch_iterator is a TBMIterator which is stored in the struct (as
opposed to having a pointer to it stored in the struct).
tbm_end_iterate() sets the actual private and shared iterator pointers
to NULL.

From your v22b-0017-review.patch

diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index 036ef29e7d5..9c711ce0eb0 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -52,6 +52,13 @@ typedef struct TableScanDescData
} TableScanDescData;
typedef struct TableScanDescData *TableScanDesc;

+/*
+ * XXX I don't understand why we should have this special node if we
+ * don't have special nodes for other scan types.

In this case, up until the final commit (using the read stream
interface), there are six fields required by bitmap heap scan that are
not needed by any other user of HeapScanDescData. There are also
several members of HeapScanDescData that are not needed by bitmap heap
scans and all of the setup in initscan() for those fields is not
required for bitmap heap scans.

Also, because the BitmapHeapScanDesc needs information like the
ParallelBitmapHeapState and prefetch_maximum (for use in prefetching),
the scan_begin() callback would have to take those as parameters. I
thought adding so much bitmap table scan-specific information to the
generic table scan callbacks was a bad idea.

Once we add the read stream API code, the number of fields required
for bitmap heap scan that are in the scan descriptor goes down to
three. So, perhaps we could justify having that many bitmap heap
scan-specific fields in the HeapScanDescData.

Though, I actually think we could start moving toward having
specialized scan descriptors if the requirements for that scan type
are appreciably different. I can't think of new code that would be
added to the HeapScanDescData that would have to be duplicated over to
specialized scan descriptors.

With the final read stream state, I can see the argument for bloating
the HeapScanDescData with three extra members and avoiding making new
scan descriptors. But, for the intermediate patches which have all of
the bitmap prefetch members pushed down into the HeapScanDescData, I
think it is really not okay. Six members only for bitmap heap scans
and two bitmap-specific members to begin_scan() seems bad.

What I thought we plan to do is commit the refactoring patches
sometime after the branch for 18 is cut and leave the final read
stream patch uncommitted so we can do performance testing on it. If
you think it is okay to have the six member bloated HeapScanDescData
in master until we push the read stream code, I am okay with removing
the BitmapTableScanDesc and BitmapHeapScanDesc.

+ * XXX Also, maybe this should do the naming convention with Data at
+ * the end (mostly for consistency).
+ */
typedef struct BitmapTableScanDesc
{
Relation rs_rd; /* heap relation descriptor */

I really want to move away from these Data typedefs. I find them so
confusing as a developer, but it's hard to justify ripping out the
existing ones because of code churn. If we add new scan descriptors, I
had really hoped to start using a different pattern.

From your v22b-0025-review.patch

diff --git a/src/backend/access/table/tableamapi.c
b/src/backend/access/table/tableamapi.c
index a47527d490a..379b7df619e 100644
--- a/src/backend/access/table/tableamapi.c
+++ b/src/backend/access/table/tableamapi.c
@@ -91,6 +91,9 @@ GetTableAmRoutine(Oid amhandler)

Assert(routine->relation_estimate_size != NULL);

+ /* XXX shouldn't this check that _block is not set without _tuple?
+ * Also, the commit message says _block is "local helper" but then
+ * why would it be part of TableAmRoutine? */
Assert(routine->scan_sample_next_block != NULL);
Assert(routine->scan_sample_next_tuple != NULL);

scan_bitmap_next_block() is removed as a table AM callback here, so we
don't check if it is set. We do still check if scan_bitmap_next_tuple
is set if amgetbitmap is not NULL. heapam_scan_bitmap_next_block() is
now a local helper for heapam_scan_bitmap_next_tuple(). Perhaps I
should change the name to something like heap_scan_bitmap_next_block()
to make it clear it is not an implementation of a table AM callback?

- Melanie

Attachment	Content-Type	Size
v23-0005-BitmapHeapScan-initialize-some-prefetch-state-el.patch	text/x-patch	2.4 KB
v23-0002-Remove-table_scan_bitmap_next_tuple-parameter-tb.patch	text/x-patch	4.0 KB
v23-0003-Make-table_scan_bitmap_next_block-async-friendly.patch	text/x-patch	23.9 KB
v23-0001-table_scan_bitmap_next_block-counts-lossy-and-ex.patch	text/x-patch	5.1 KB
v23-0004-Add-common-interface-for-TBMIterators.patch	text/x-patch	17.3 KB
v23-0006-BitmapHeapScan-uses-unified-iterator.patch	text/x-patch	14.3 KB
v23-0009-Remove-ss_current-prefix-from-ss_currentScanDesc.patch	text/x-patch	12.7 KB
v23-0007-BitmapHeapScan-Make-prefetch-sync-error-more-det.patch	text/x-patch	1.1 KB
v23-0010-Add-scan_in_progress-to-BitmapHeapScanState.patch	text/x-patch	2.5 KB
v23-0008-Push-current-scan-descriptor-into-specialized-sc.patch	text/x-patch	15.1 KB
v23-0011-Make-new-bitmap-table-scan-and-bitmap-heap-scan-.patch	text/x-patch	20.1 KB
v23-0012-Update-variable-names-in-bitmap-scan-descriptors.patch	text/x-patch	16.4 KB
v23-0015-BitmapHeapScan-move-per-scan-setup-into-a-helper.patch	text/x-patch	9.6 KB
v23-0013-BitmapHeapScan-lift-and-shift-prefetch-functions.patch	text/x-patch	15.9 KB
v23-0016-Remove-table-AM-callback-scan_bitmap_next_block.patch	text/x-patch	13.4 KB
v23-0017-Hard-code-TBMIterateResult-offsets-array-size.patch	text/x-patch	4.2 KB
v23-0019-Use-streaming-I-O-in-Bitmap-Heap-Scans.patch	text/x-patch	25.4 KB
v23-0018-Separate-TBMIterator-and-TBMIterateResult.patch	text/x-patch	19.5 KB
v23-0014-BitmapHeapScan-Push-prefetch-code-into-heap-AM.patch	text/x-patch	34.5 KB

In response to

Re: BitmapHeapScan streaming read user and prelim refactoring at 2024-06-18 22:02:56 from Tomas Vondra

Responses

Re: BitmapHeapScan streaming read user and prelim refactoring at 2024-06-19 16:38:35 from Tomas Vondra
Re: BitmapHeapScan streaming read user and prelim refactoring at 2024-08-26 13:37:08 from Heikki Linnakangas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Peter Eisentraut	2024-06-19 15:55:56	Re: DOCS: Generated table columns are skipped by logical replication
Previous Message	Tomas Vondra	2024-06-19 15:48:47	Re: suspicious valgrind reports about radixtree/tidstore on arm64