From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | pgsql-committers(at)lists(dot)postgresql(dot)org |
Subject: | pgsql: Improve nbtree array primitive scan scheduling. |
Date: | 2025-03-22 17:02:36 |
Message-ID: | E1tw2F1-000UGZ-2N@gemulon.postgresql.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-committers |
Improve nbtree array primitive scan scheduling.
Add a new scheduling heuristic: don't end the ongoing primitive index
scan immediately (at the point where _bt_advance_array_keys notices that
the next set of matching tuples must be on a later page) if the primscan
already managed to step right/left from its first leaf page. Schedule a
recheck against the next sibling leaf page's finaltup instead.
The new heuristic tends to avoid scenarios where the top-level scan
repeatedly starts and ends primitive index scans that each read only one
leaf page from a group of neighboring leaf pages. Affected top-level
scans will now tend to step forward (or backward) through the index
instead, without wasting cycles on descending the index anew.
The recheck mechanism isn't exactly new. But up until now it has only
been used to deal with edge cases involving high key finaltups with one
or more truncated -inf attributes that _bt_advance_array_keys deemed
"provisionally satisfied" (satisfied for the purposes of allowing the
scan to step onto the next page, subject to recheck once on that page).
The mechanism was added by commit 5bf748b8, which invented the general
concept of primitive scan scheduling. It was later enhanced by commit
79fa7b3b, which taught it about cases involving -inf attributes that
satisfy inequality scan keys required in the opposite-to-scan direction
only (arguably, they should have been covered by the earliest version).
Now the recheck mechanism can be applied based on scan-level heuristics,
which have nothing to do with truncated high keys. Now rechecks might
be performed by _bt_readpage when scanning in _either_ scan direction.
The theory behind the new heuristic is that any primitive scan that
makes it past its first leaf page is one that is already likely to have
arrays whose key values match index tuples that are closely clustered
together in the index. The rules that determine whether we ever get
past the first page are still conservative (that'll still only happen
when pstate.finaltup strongly suggests that it's the right thing to do).
Surviving past the first leaf page is a strong signal in itself.
Preparation for an upcoming patch that will add skip scan optimizations
to nbtree. That'll work by adding skip arrays, which behave similarly
to SAOP arrays, but generate their elements procedurally and on-demand.
Note that this commit isn't specifically concerned with skip arrays; the
scheduling logic doesn't (and won't) condition anything on whether the
scan uses skip arrays, SAOP arrays, or some combination of the two
(which seems like a good general principle for _bt_advance_array_keys).
While the problems that this commit ameliorates are more likely with
skip arrays (at least in practice), SAOP arrays (or those with very
dense, contiguous array elements) are also affected.
Author: Peter Geoghegan <pg(at)bowt(dot)ie>
Reviewed-By: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Discussion: https://postgr.es/m/CAH2-Wzkz0wPe6+02kr+hC+JJNKfGtjGTzpG3CFVTQmKwWNrXNw@mail.gmail.com
Branch
------
master
Details
-------
https://git.postgresql.org/pg/commitdiff/9a2e2a285a149490a69a7bd92dd618bb7ca975b3
Modified Files
--------------
src/backend/access/nbtree/nbtsearch.c | 68 ++++++-----
src/backend/access/nbtree/nbtutils.c | 221 ++++++++++++++++++----------------
src/include/access/nbtree.h | 11 +-
3 files changed, 162 insertions(+), 138 deletions(-)
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2025-03-22 18:17:07 | pgsql: plpgsql: make WHEN OTHERS distinct from WHEN SQLSTATE '00000'. |
Previous Message | Melanie Plageman | 2025-03-21 21:51:57 | pgsql: Use streaming read I/O in SP-GiST vacuuming |