Re: Adding skip scan (including MDAM style range skip scan) to nbtree

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, Masahiro(dot)Ikeda(at)nttdata(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, Masao(dot)Fujii(at)nttdata(dot)com
Subject: Re: Adding skip scan (including MDAM style range skip scan) to nbtree
Date: 2025-03-22 17:47:41
Message-ID: CAH2-WzmnWZmwJjKBV0XYRGsYsXYqfOhqu4_+gKpQgV8dv=59Pg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Mar 21, 2025 at 11:36 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> A big part of the concern here is with the existing pstate.prechecked
> optimization (the one added to Postgres 17 by Alexander Korotkov's
> commit e0b1ee17). It now seems quite redundant -- the new
> _bt_skip_ikeyprefix mechanism added by my 0003-* patch does the same
> thing, but does it better (especially since I taught
> _bt_skip_ikeyprefix to deal with simple inequalities in v29). I now
> think that it makes most sense to totally replace pstate.prechecked
> with _bt_skip_ikeyprefix -- we should use _bt_skip_ikeyprefix during
> every scan (not just during skip scans, not just during scans with
> SAOP array keys), and be done with it.

I just committed "Improve nbtree array primitive scan scheduling".

Attached is v30, which fully replaces the pstate.prechecked
optimization with the new _bt_skip_ikeyprefix optimization (which now
appears in v30-0002-Lower-nbtree-skip-array-maintenance-overhead.patch,
and not in 0003-*, due to my committing the primscan scheduling patch
just now).

I'm now absolutely convinced that fully generalizing
_bt_skip_ikeyprefix (as described in yesterday's email) is the right
direction to take things in. It seems to have no possible downside.

> Under this new scheme, so->scanBehind is strictly a flag that
> indicates that a recheck is scheduled, to be performed once the scan
> calls _bt_readpage for the next page. It no longer serves role #1,
> only role #2. That seems significantly simpler.

I especially like this about the new _bt_skip_ikeyprefix scheme.
Having so->scanBehind strictly be a flag (that tracks if we need a
recheck at the start of reading the next page) substantially lowers
the cognitive burden for somebody trying to understand how the
primitive scan scheduling stuff works.
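
To make the "strictly a flag" idea concrete, here is a toy model in C. It is only a sketch and not the nbtree code: ToyScan, ToyAction, and toy_readpage are hypothetical names, a "page" is just a sorted array of ints, and the scan's keys are reduced to a single target value. The assumption being illustrated is that the flag does one job only: it schedules a recheck that runs when the next page is read, and it is cleared as soon as that recheck has run.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy scan state; not the real BTScanOpaque. */
typedef struct ToyScan
{
    int  target;       /* value the scan's keys currently point at */
    bool scan_behind;  /* is a recheck scheduled for the next page? */
} ToyScan;

typedef enum
{
    TOY_CONTINUE,      /* keep stepping through pages */
    TOY_NEW_PRIMSCAN   /* give up and reposition with a new descent */
} ToyAction;

/*
 * Read one sorted page of n values.  *nmatches receives the number of
 * values equal to the target on this page.
 */
static ToyAction
toy_readpage(ToyScan *scan, const int *page, size_t n, int *nmatches)
{
    *nmatches = 0;
    if (n == 0)
        return TOY_CONTINUE;

    if (scan->scan_behind)
    {
        /* The scheduled recheck runs once, at the start of this page. */
        scan->scan_behind = false;
        if (page[n - 1] < scan->target)
            return TOY_NEW_PRIMSCAN;    /* still behind the keys */
    }

    for (size_t i = 0; i < n; i++)
    {
        if (page[i] == scan->target)
            (*nmatches)++;
    }

    /* Page ended before reaching the target: schedule a recheck. */
    if (page[n - 1] < scan->target)
        scan->scan_behind = true;

    return TOY_CONTINUE;
}
```

In this toy, a scan with target 10 that reads a page ending at 3 leaves the flag set. The next call then either returns TOY_NEW_PRIMSCAN (the page still ends before 10) or clears the flag and processes the page normally. The flag never carries any other meaning.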

The newly expanded _bt_skip_ikeyprefix needs quite a bit more testing
and polishing to be committable. I didn't even update the relevant
commit message for v30. Plus I'm not completely sure what to do about
RowCompare keys just yet, which have some funny rules when dealing
with NULLs.

--
Peter Geoghegan

Attachment                                                        Content-Type              Size
v30-0004-DEBUG-Add-skip-scan-disable-GUCs.patch                   application/octet-stream  5.4 KB
v30-0003-Apply-low-order-skip-key-in-_bt_first-more-often.patch   application/octet-stream  11.6 KB
v30-0002-Lower-nbtree-skip-array-maintenance-overhead.patch       application/octet-stream  39.0 KB
v30-0001-Add-nbtree-skip-scan-optimizations.patch                 application/octet-stream  178.9 KB
