Re: BitmapHeapScan streaming read user and prelim refactoring

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BitmapHeapScan streaming read user and prelim refactoring
Date: 2025-02-09 14:27:08
Message-ID: bc1d4ebc-3778-4836-ace5-e1daa7223d36@vondra.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1/30/25 21:36, Melanie Plageman wrote:
>
> ...
>
> Attached v28 is rebased and has a few updates/cleanup. All patches in
> the set need review and I need to do some benchmarking of v28-0003.
>
> - Melanie

Hi,

I've been re-running the benchmarks on v28 on Melanie's request. The
tests are still running, but here's what I have so far. The results are
pretty large, I've pushed them here:

https://github.com/tvondra/bitmapscan-tests

I've decommissioned the small "i5" machine and replaced it with a new
ryzen 9900x one. I kept the "old" storage (SSD SATA RAID), but other
than that it's much more capable. That might explain some of the
differences in benchmark results.

So now I have these two machines:

xeon: 2x E5-2699v4
- nvme: WD Ultrastar DC SN640 960GB

ryzen: Ryzen 9 9900X
- data: Samsung SSD 990 PRO 1TB (NVME)
- raid-nvme: 4x Samsung SSD 990 PRO 1TB (RAID0, NVME)
- raid-sata: 4x Intel DC S3700 (RAID0)

See the PDFs for the usual "colored pivot table" visualization, or the
raw CSV files. The pivot table shows results for various combinations of
parameters, the columns are for different patches and values of the
effective_io_concurrency GUC.

The columns on the right compare results to the previous patch, i.e. it
shows how the query duration changes after applying the patch.

I didn't have time to do a particularly detailed analysis yet, or try to
investigate the regressions. But some basic observations.

1) xeon
-------

This machine only has a single type of storage (NVMe SSD), and the
results looks really good. Most of the chart is "green" i.e. the patch
makes queries faster, and the regressions are random, for the very short
queries, and it seems random / no pattern. Which means this is likely
random noise, nothing to worry about.

There's one minor issue that I failed to test the 0004 patch due to a
bug in the script, so the PDF only compares master, 0001, 0002 and 0003.
But that should not matter too much, because 0003 is the main change.

There are "gaps" for tests on the 100M data set that are still running,
but I think I'll abort those, it'd take many days to complete and it
doesn't seem we'd learn very much.

2) ryzen
--------

This "new" machine has multiple types of storage. The cached results (be
it in shared buffers or in page cache) are not very interesting. 0003
helps a bit (~15%), but other than that it's just random noise.

The "uncached" results starting on page 23 are much more interesting. In
general 0001, 0002 and 0004 have little impact, it seems just random
noise. So in the rest I'll focus on 0003.

For the single nvme device (device: data), it seems mostly fine. It's
green, even though there are a couple "localized regressions" for eic=0.
I haven't looked into those yet.

For the nvme RAID (device: raid-nvme), it's looks almost exactly the
same, except that with parallel query (page 27) there's a clear area of
regression with eic=1 (look for "column" of red cells). That's a bit
unfortunate, because eic=1 is the default value.

For the SATA SSD RAID (device: raid-sata), it's similar - a couple
regressions for eic=0, just like for NVMe. But then there's also a
couple regressions for higher eic values, usually for queries with
selective conditions.

Overall, I think this looks good. It might be a good idea to look into
some of the regressions, but ultimately the read stream relies on a
heuristic to determine how far ahead to read etc. And each heuristics
has counter examples. So the presence of regressions is guaranteed, it
should not be a reason to reject a patch on its own.

There's one more thing to consider when looking at those results - the
script *forces* the use of bitmap index scan, even if the planner would
not pick it on it's own. The ryzen-filtered.pdf shows only results when
that would be the case (i.e. the optimizer would pick bitmapscan). Most
of the regressions are gone, simply because it'd do index scan, seqscan.

Of course, this is not perfect - planning mistakes happen, and it would
be nice to not regress more than before. The plan choice depends also on
the cost parameters, and maybe with different values we'd pick bitmap
scans more often.

Anyway, the results look sensible. It might be good to investigate some
of the regressions, and I'll try doing that if I find the time. But I
don't think that's necessarily a blocker - every patch of this type will
have a hardware where the heuristics doesn't quite do the right thing by
default. Which is why we have GUCs to tune it if appropriate.

regards

--
Tomas Vondra

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2025-02-09 14:52:49 Re: Thread-safe nl_langinfo() and localeconv()
Previous Message Zhang Mingli 2025-02-09 13:16:53 Re: Proposal to CREATE FOREIGN TABLE LIKE