Re: Adding skip scan (including MDAM style range skip scan) to nbtree

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>
Cc: Tomas Vondra <tomas(at)vondra(dot)me>, Masahiro(dot)Ikeda(at)nttdata(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, Masao(dot)Fujii(at)nttdata(dot)com
Subject: Re: Adding skip scan (including MDAM style range skip scan) to nbtree
Date: 2025-01-23 22:34:17
Message-ID: CAH2-Wz=-i0QuNaNGJRjuLx7HDSgRWXcPo22ChX+2tTCMidxM_Q@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 13, 2025 at 3:22 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> Attached is v21. This revision is just to fix bitrot against HEAD that
> was caused by recent commits of mine -- all of which were related to
> nbtree preprocessing.

Attached is v22.

This revision changes the optimizer's cost model, making its costing
exactly match the costing on the master branch in marginal cases --
cases where some skipping may be possible, but not enough to justify
*expecting* any saving during query planning.

Perhaps surprisingly, skipping only every second leaf page can be
5x-7x faster, even though the number of "buffers hit" could easily be
~3x higher due to all of the extra internal page reads. But it's
really hard to predict exactly how much we'll benefit from skipping
during planning, within btcostestimate. The costing is of course
driven by statistics, and estimating the cardinality of multiple
columns together with those statistics is bound to be quite inaccurate
much of the time. We should err in the direction of assuming a
relatively expensive full index scan (we always did, but now we do so
even more).
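The conservative direction of this costing can be caricatured in a few lines of Python. This is purely illustrative (the function and parameter names are made up, and btcostestimate's real model involves much more than raw page counts): the point is only that the skip scan estimate is clamped so that, in marginal cases, it degrades to the plain full-index-scan cost rather than promising savings the statistics can't reliably justify.

```python
# Illustrative sketch only -- NOT PostgreSQL's btcostestimate. Shows the
# principle of erring toward the full index scan cost when cardinality
# estimates for the skipped prefix column(s) are unreliable.

def estimated_index_pages(num_leaf_pages: int,
                          num_prefix_distinct: int,
                          tree_height: int) -> int:
    """Crude page-read estimate for a skip scan.

    Each estimated distinct value in the skipped prefix costs one
    descent of the tree (tree_height internal pages plus one leaf page).
    """
    skip_scan_pages = num_prefix_distinct * (tree_height + 1)
    full_scan_pages = num_leaf_pages  # a full index scan reads every leaf
    # Clamp: never estimate a skip scan as costlier than simply reading
    # every leaf page, and in marginal cases assume no savings at all.
    return min(skip_scan_pages, full_scan_pages)

# Marginal case: far too many distinct prefix values for skipping to
# help -- the estimate matches the full-scan cost, as on master.
print(estimated_index_pages(num_leaf_pages=1000,
                            num_prefix_distinct=900,
                            tree_height=2))  # -> 1000
```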

As a result of these improvements to the costing, v22 is the first
version without any changes to EXPLAIN output/query plans in expected
regression test output. That wasn't what I set out to do (I actually
set out to fix clearly nonsensical costing in certain edge cases), but
not having any plan changes in the regression tests does seem like a
good thing to me.

v22 also simplifies a number of things on the nbtree side:

* We no longer keep around a possibly-cross-type ORDER proc in skip
arrays (just the original scan key). We give up on a marginal
optimization that was used only during skip scans involving a skipped
column with a range containing either >= or <= inequalities for a
type/opclass that lacks skip support.

In v22, nbtree scans can no longer determine that a scan with a qual
"WHERE a <= 'foo' AND b = 342" doesn't have to continue once it
reaches the first tuple > "('foo', 342)", when "a" is of a type that
doesn't offer skip support, such as text (if "a" is of a type like
integer then we still get this behavior, without any of the
complexity). The downside of ripping this optimization out is that
there might now be an extra primitive index scan that finds the next
"a" value is > 'foo' before we can actually terminate the scan --
we'll now fail to notice that the existing skip array element is
'foo', so the next one in the index cannot possibly be greater than
'foo'. The upside is that it makes things simpler, and avoids extra
comparisons during scans that might not pay for themselves.

This optimization wasn't adding very much, and didn't seem to justify
the complexity that it imposed during preprocessing. Keeping around
extra ORDER procs had problems in cases that happened to involve
cross-type operators. I'm pretty sure that they were broken in the
presence of a redundant/duplicative inequality that couldn't be proven
to be safe to eliminate by preprocessing. I probably could have fixed
the problem instead, but it seemed better to just cut scope.
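To illustrate the distinction that remains in v22, here is a minimal Python sketch (the function name and unit-increment logic are illustrative, not nbtree's actual code). With a type like integer that offers skip support, the next skip array element can be computed directly, so a scan positioned on the element equal to its inequality bound can terminate immediately; for a type like text there is no computable successor, hence the extra primitive index scan described above.

```python
# Illustrative sketch: why skip support enables early termination.
# Integer-style skip support lets nbtree derive the next skip array
# element arithmetically; without it, the next distinct value is only
# discovered by another descent of the index.

def next_element_with_skip_support(current: int, upper_bound: int):
    """Compute the next skip array element under a "col <= upper_bound"
    qual, using integer skip support (successor = current + 1).

    Returns None when no further element can possibly qualify, meaning
    the scan can end without another primitive index scan.
    """
    nxt = current + 1
    if nxt > upper_bound:
        return None  # terminate: current element already at the bound
    return nxt

# qual "a <= 7", current skip array element is 7: scan ends at once
print(next_element_with_skip_support(7, 7))  # -> None
# qual "a <= 7", current element is 5: advance to 6 and keep going
print(next_element_with_skip_support(5, 7))  # -> 6
```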

* The optimization that has nbtree preprocessing convert "WHERE a > 5
AND b = 9000" into "WHERE a >= 6 AND b = 9000" where possible (i.e. in
cases involving a skip array that offers skip support) has been broken
out into its own commit/patch -- that's now in 0004-*.

It's possible that I'll ultimately conclude that this optimization
isn't worth the complexity, either -- and then rip it out as well. But
it isn't all that complicated, and only imposes overhead during
preprocessing (never during the scan proper), so I still lean towards
committing it. But it's certainly not essential.
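As a rough illustration of that transformation, here is a hedged Python sketch (names are hypothetical, and the real preprocessing operates on scan keys via opclass skip support, not bare integers): a strict inequality on a column whose type has skip support is rewritten as the equivalent non-strict inequality on the adjacent value.

```python
# Illustrative sketch only -- not nbtree preprocessing code. Rewrites a
# strict inequality into a non-strict one using integer skip support
# (unit increment/decrement), e.g. "a > 5" becomes "a >= 6".

def convert_strict_inequality(op: str, value: int):
    """Return an equivalent (operator, value) pair with strictness
    removed, assuming a discrete type with unit skip support."""
    if op == '>':
        return ('>=', value + 1)
    if op == '<':
        return ('<=', value - 1)
    return (op, value)  # non-strict quals are already in final form

print(convert_strict_inequality('>', 5))  # -> ('>=', 6)
print(convert_strict_inequality('<', 10))  # -> ('<=', 9)
```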

* Reorders and renames the new functions in nbtutils.c and in
nbtpreprocesskeys.c for clarity.

* Polishes and slightly refactors array preprocessing, to make it
easier to understand the rules that determine when and how
preprocessing generates skip arrays.

--
Peter Geoghegan

Attachment Content-Type Size
v22-0001-Show-index-search-count-in-EXPLAIN-ANALYZE.patch application/octet-stream 52.8 KB
v22-0002-Add-nbtree-skip-scan-optimizations.patch application/octet-stream 157.2 KB
v22-0003-Lower-the-overhead-of-nbtree-runtime-skip-checks.patch application/octet-stream 23.1 KB
v22-0004-Convert-nbtree-inequalities-using-skip-support.patch application/octet-stream 8.1 KB
v22-0005-DEBUG-Add-skip-scan-disable-GUCs.patch application/octet-stream 4.4 KB
