From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Subject: Re: BitmapHeapScan streaming read user and prelim refactoring
Date: 2024-03-01 00:29:45
Message-ID: CAAKRu_YX6RjjNizWm08H8HdFVCDF9FmNoB5h2TpuvqXftbsALw@mail.gmail.com
Lists: pgsql-hackers
On Thu, Feb 29, 2024 at 5:44 PM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
>
>
> On 2/29/24 22:19, Melanie Plageman wrote:
> > On Thu, Feb 29, 2024 at 7:54 AM Tomas Vondra
> > <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> >>
> >>
> >>
> >> On 2/29/24 00:40, Melanie Plageman wrote:
> >>> On Wed, Feb 28, 2024 at 6:17 PM Tomas Vondra
> >>> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 2/28/24 21:06, Melanie Plageman wrote:
> >>>>> On Wed, Feb 28, 2024 at 2:23 PM Tomas Vondra
> >>>>> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> >>>>>>
> >>>>>> On 2/28/24 15:56, Tomas Vondra wrote:
> >>>>>>>> ...
> >>>>>>>
> >>>>>>> Sure, I can do that. It'll take a couple hours to get the results, I'll
> >>>>>>> share them when I have them.
> >>>>>>>
> >>>>>>
> >>>>>> Here are the results with only patches 0001 - 0012 applied (i.e. without
> >>>>>> the patch introducing the streaming read API, and the patch switching
> >>>>>> the bitmap heap scan to use it).
> >>>>>>
> >>>>>> The changes in performance don't disappear entirely, but the scale is
> >>>>>> certainly much smaller - both in the complete results for all runs, and
> >>>>>> for the "optimal" runs that would actually pick bitmapscan.
> >>>>>
> >>>>> Hmm. I'm trying to think how my refactor could have had this impact.
> >>>>> It seems like all the most notable regressions are with 4 parallel
> >>>>> workers. What do the numeric column labels mean across the top
> >>>>> (2,4,8,16...) -- are they related to "matches"? And if so, what does
> >>>>> that mean?
> >>>>>
> >>>>
> >>>> That's the number of distinct values matched by the query, which should
> >>>> be an approximation of the number of matching rows. The number of
> >>>> distinct values in the data set differs by data set, but for 1M rows
> >>>> it's roughly like this:
> >>>>
> >>>> uniform: 10k
> >>>> linear: 10k
> >>>> cyclic: 100
> >>>>
> >>>> So for example matches=128 means ~1% of rows for uniform/linear, and
> >>>> 100% for cyclic data sets.
> >>>
> >>> Ah, thank you for the explanation. I also looked at your script after
> >>> having sent this email and saw that it is clear in your script what
> >>> "matches" is.
> >>>
> >>>> As for the possible cause, I think it's clear most of the difference
> >>>> comes from the last patch that actually switches bitmap heap scan to the
> >>>> streaming read API. That's mostly expected/understandable, although we
> >>>> probably need to look into the regressions or cases with e_i_c=0.
> >>>
> >>> Right, I'm mostly surprised about the regressions for patches 0001-0012.
> >>>
> >>> Re eic 0: Thomas Munro and I chatted off-list, and you bring up a
> >>> great point about eic 0. In the old bitmapheapscan code, eic 0 basically
> >>> disabled prefetching, but with the streaming read API it will still
> >>> issue fadvises when eic is 0. That is an easy one-line fix. Thomas
> >>> prefers to fix it by always avoiding an fadvise for the last buffer in
> >>> a range before issuing a read (since we are about to read it anyway,
> >>> best not fadvise it too). This will fix eic 0 and also cut one system
> >>> call from each invocation of the streaming read machinery.
> >>>
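To spell out that idea with a throwaway sketch (plain POSIX C with made-up
names -- not the actual streaming read code): a block we are about to read
synchronously simply gets no WILLNEED hint.

#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192

/*
 * Hypothetical helper: hint the kernel about upcoming blocks, but skip the
 * hint for blocks[0], which is read synchronously right away -- issuing
 * POSIX_FADV_WILLNEED for it first would just be a wasted system call.
 */
static void
advise_then_read(int fd, const off_t *blocks, int nblocks, char *buf)
{
	for (int i = 1; i < nblocks; i++)
		(void) posix_fadvise(fd, blocks[i] * BLCKSZ, BLCKSZ,
							 POSIX_FADV_WILLNEED);

	(void) pread(fd, buf, BLCKSZ, blocks[0] * BLCKSZ);
}
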
> >>>> To analyze the 0001-0012 patches, maybe it'd be helpful to run tests for
> >>>> individual patches. I can try doing that tomorrow. It'll have to be a
> >>>> limited set of tests, to reduce the time, but might tell us whether it's
> >>>> due to a single patch or multiple patches.
> >>>
> >>> Yes, tomorrow I planned to start trying to repro some of the "red"
> >>> cases myself. Any one of the commits could cause a slight regression
> >>> but a 3.5x regression is quite surprising, so I might focus on trying
> >>> to repro that locally and then narrow down which patch causes it.
> >>>
> >>> For the non-cached regressions, perhaps the commit to use the correct
> >>> recheck flag (0004) when prefetching could be the culprit. And for the
> >>> cached regressions, my money is on the commit which changes the whole
> >>> control flow of BitmapHeapNext() and the next_block() and next_tuple()
> >>> functions (0010).
> >>>
> >>
> >> I do have some partial results, comparing the patches. I only ran one of
> >> the more affected workloads (cyclic) on the xeon; attached is a PDF
> >> comparing master and the 0001-0014 patches. The percentages are timing
> >> vs. the preceding patch (green - faster, red - slower).
> >
> > Just confirming: the results are for uncached?
> >
>
> Yes, cyclic data set, uncached case. I picked this because it seemed
> like one of the most affected cases. Do you want me to test some other
> cases too?
So, I actually may have found the source of at least part of the
regression with 0010. I was able to reproduce the regression with
patch 0010 applied for the uncached case with 4 workers, eic 8, and
100000000 rows for the cyclic dataset. I see it for all numbers of
matches. The regression went away (for this specific example) when I
moved the BitmapAdjustPrefetchIterator call back up to before the call
to table_scan_bitmap_next_block() like this:
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index f7ecc060317..268996bdeea 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -279,6 +279,8 @@ BitmapHeapNext(BitmapHeapScanState *node)
         }

 new_page:
+        BitmapAdjustPrefetchIterator(node, node->blockno);
+
         if (!table_scan_bitmap_next_block(scan, &node->recheck,
                                           &lossy, &node->blockno))
             break;
@@ -287,7 +289,6 @@ new_page:
         else
             node->exact_pages++;

-        BitmapAdjustPrefetchIterator(node, node->blockno);
         /* Adjust the prefetch target */
         BitmapAdjustPrefetchTarget(node);
     }
It makes sense this would fix it. I haven't tried all the combinations
you tried. Do you mind running your tests with the new code? I've
pushed it to this branch:
https://github.com/melanieplageman/postgres/commits/bhs_pgsr/
Note that this will fix none of the issues with 0014, because that patch has
removed all of the old prefetching code anyway.
Thank you sooo much for running these to begin with and then helping
me figure out what is going on!
- Melanie