Quick Links

Re: BitmapHeapScan streaming read user and prelim refactoring

From:	Melanie Plageman <melanieplageman(at)gmail(dot)com>
To:	Tomas Vondra <tomas(at)vondra(dot)me>
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Subject:	Re: BitmapHeapScan streaming read user and prelim refactoring
Date:	2024-10-18 20:48:24
Message-ID:	CAAKRu_ZpXuVN9bVxcvMWf+xQr4gacGWmejAPd-fj7cBjDvZcKw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, Oct 16, 2024 at 11:09 AM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>

Thanks for the review!

>
> 1) 0001
>
> I find it a bit weird that we use ntuples to determine if it's exact or
> lossy. Not an issue caused by this patch, of course, but maybe we could
> improve that somehow? I mean this:
>
> if (tbmres->ntuples >= 0)
> (*exact_pages)++;
> else
> (*lossy_pages)++;

I agree. The most straightforward solution would be to add a boolean
`lossy` to TBMIterateResult.

In some ways, I think it would make it a bit more clear because the
situation with TBMIterateResult->recheck is already a bit confusing.
It recheck is true if either 1: the Bitmap is lossy and we cannot use
TBMIterateResult->offsets for this page or 2: the original query has a
qual requiring recheck

In the body of BitmapHeapNext() it says "if we are using lossy info,
we have to recheck the qual conditions at every tuple". And in
heapam_bitmap_next_block(), when the bitmap is lossy it says "bitmap
is lossy, so we must examine each line pointer on the page. But we can
ignore HOT chains, since we'll check each tuple anyway"

I must admit I don't quite understand the connection between a lossy
bitmap and needing to recheck the qual condition. Once we've gone to
the underlying data, why do we need to recheck each tuple unless the
qual requires it? It seems like PageTableEntry->recheck is usually
passed as true for hash indexes and false for btree indexes. So maybe
it has something to do with the index type?

Anyway, adding a `lossy` member can be another patch. I just wanted to
check if that was the solution you were thinking of.

On a somewhat related note, I included a patch to alter the test at
the top of heapam_bitmap_next_tuple()

if (hscan->rs_cindex < 0 || hscan->rs_cindex >= hscan->rs_ntuples)

I don't really see how rs_cindex could ever be < 0 for bitmap heap
scans. But, I'm not confident enough about this to commit the patch
really (obviously tests pass with the Assertion I added -- but still).

> Also, maybe consider manually wrapping long lines - I haven't tried, but
> I guess pgindent did not wrap this. I mean this line, for example:
>
> ... whitespace ... &node->stats.lossy_pages, &node->stats.exact_pages))

Ah, yes. pgindent indeed does not help with this. I have now gone
through every commit and used

git diff origin/master | grep -E '^(\+|diff)' | sed 's/^+//' | expand
-t4 | awk "length > 78 || /^diff/"

from the committing wiki :)

> 2) 0003
>
> The "tidr" name is not particularly clear, IMO. Maybe just "range" would
> be better?

I've updated this.

> + struct
> + {
> + ItemPointerData rs_mintid;
> + ItemPointerData rs_maxtid;
> + } tidr;
>
> Isn't it a bit strange that this patch remove tmbres from
> heapam_scan_bitmap_next_block, when it was already removed from
> table_scan_bitmap_next_tuple in the previous commit? Does it make sense
> to separate it like this? (Maybe it does, not sure.)

I've combined them.

> I find this not very clear:
>
> + * recheck do current page's tuples need recheck
> + * blockno used to validate pf and current block in sync
> + * pfblockno used to validate pf stays ahead of current block
>
> The "blockno" sounds weird - shouldn't it say "are in sync"? Also, not
> clear what "pf" stands for without context (sure, it's "prefetch"). But
> Why not to not write "prefetch_blockno" or something like that?

Fixed.

> 3) 0006
>
> Seems unnecessary to keep this as separate commit, it's just log
> message improvement. I'd merge it to the earlier patch.

Done.

> 4) 0007
>
> Seems fine to me in principle. This adds "subclass" similar to what we
> do for plan nodes, except that states are not derived from Node. But
> it's the same approach.
>
> Why not to do
>
> scan = (HeapScanDesc *) bscan;
>
> instead of
>
> scan = &bscan->rs_heap_base;
>
> I think the simple cast is what we do for the plan nodes, and we even do
> it this way in the opposite direction in a couple places:
>
> BitmapHeapScanDesc *bscan = (BitmapHeapScanDesc *) scan;

Yea, in the reverse case, there is no way to refer to a specific
member (because we are casting something of the parent type to the
child type). But when we are casting from the child type to the parent
type and can actually refer to a member, it seems preferable to
explicitly refer to the member in case anyone ever inserts a member
above rs_heap_base. However, for consistency with other places in the
code, I've changed it as you suggested.

Additionally, I've gone through an made a number of stylistic changes
to conform more closely to existing styles (adding back more rs_
prefixes and typedef'ing BitmapHeapScanDescData * as
BitmapHeapScanDesc [despite how sad it makes me]).

> BTW would it be a good idea for heapam_scan_bitmap_next_block to check
> the passed "scan" has SO_TYPE_BITMAPSCAN? We haven't been checking that
> so far I think, but we only had a single struct ...

Good point. I've done this.

I plan to commit 0002 and 0003 next week. I'm interested if you think
0001 is correct.
I may also commit 0004-0006 as I feel they are ready too.

0007-0011 need a bit more work, I think.

- Melanie

Attachment	Content-Type	Size
v25-0001-Remove-unused-rs_cindex-condition-in-heap-AM-bit.patch	application/octet-stream	1.7 KB
v25-0003-Make-table_scan_bitmap_next_block-async-friendly.patch	application/octet-stream	28.6 KB
v25-0004-Add-common-interface-for-TBMIterators.patch	application/octet-stream	17.0 KB
v25-0005-Bitmap-Table-Scans-use-unified-TBMIterator.patch	application/octet-stream	15.9 KB
v25-0002-table_scan_bitmap_next_block-implementations-cou.patch	application/octet-stream	5.0 KB
v25-0006-Add-and-use-BitmapHeapScanDesc-struct.patch	application/octet-stream	7.7 KB
v25-0010-Remove-table-AM-callback-scan_bitmap_next_block.patch	application/octet-stream	14.7 KB
v25-0008-Push-Bitmap-Table-Scan-prefetch-code-into-heap-A.patch	application/octet-stream	35.7 KB
v25-0009-Move-BitmapTableScan-per-scan-setup-into-a-helpe.patch	application/octet-stream	10.2 KB
v25-0007-Lift-and-Shift-Bitmap-Table-Scan-prefetch-functi.patch	application/octet-stream	15.9 KB
v25-0011-Move-BitmapHeapScanNextBlock-next-to-other-helpe.patch	application/octet-stream	11.9 KB

In response to

Re: BitmapHeapScan streaming read user and prelim refactoring at 2024-10-16 15:09:13 from Tomas Vondra

Responses

Re: BitmapHeapScan streaming read user and prelim refactoring at 2024-12-09 06:22:30 from Dilip Kumar
Re: BitmapHeapScan streaming read user and prelim refactoring at 2024-12-19 09:15:10 from Richard Guo

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Devulapalli, Raghuveer	2024-10-18 21:23:38	RE: Proposal for Updating CRC32C with AVX-512 Algorithm.
Previous Message	Tom Lane	2024-10-18 20:23:43	Re: [PATCH] Add some documentation on how to call internal functions