From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | pgsql-committers(at)lists(dot)postgresql(dot)org |
Subject: | pgsql: Optimize nbtree backwards scans. |
Date: | 2024-10-18 15:25:49 |
Message-ID: | E1t1orO-001OKG-56@gemulon.postgresql.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-committers |
Optimize nbtree backwards scans.
Make nbtree backwards scans optimistically access the next page to be
read to the left by following a prevPage block number that's now stashed
in currPos when the leaf page is first read. This approach matches the
one taken during forward scans, which follow a symmetric nextPage block
number from currPos. We stash both a prevPage and a nextPage, since the
scan direction might change (when fetching from a scrollable cursor).
Backwards scans will no longer need to lock the same page twice, except
in rare cases where the scan detects a concurrent page split (or page
deletion). Testing has shown this optimization to be particularly
effective during parallel index-only backwards scans: ~12% reductions in
query execution time are quite possible.
We're much better off being optimistic; concurrent left sibling page
splits are rare in general. It's possible that we'll need to lock more
pages than the pessimistic approach would have, but only when there are
_multiple_ concurrent splits of the left sibling page we now start at.
If there's just a single concurrent left sibling page split, the new
approach to scanning backwards will at least break even relative to the
old one (we'll acquire the same number of leaf page locks as before).
The optimization from this commit has long been contemplated by comments
added by commit 2ed5b87f96, which changed the rules for locking/pinning
during nbtree index scans. The approach that that commit introduced to
leaf level link traversal when scanning forwards is now more or less
applied all the time, regardless of the direction we're scanning in.
Following uniform conventions around sibling link traversal is simpler.
The only real remaining difference between our forward and backwards
handling is that our backwards handling must still detect and recover
from any concurrent left sibling splits (and concurrent page deletions),
as documented in the nbtree README. That is structured as a single,
isolated extra step that takes place in _bt_readnextpage.
Also use this opportunity to further simplify the functions that deal
with reading pages and traversing sibling links on the leaf level, and
to document their preconditions and postconditions (with respect to
things like buffer locks, buffer pins, and seizing the parallel scan).
This enhancement completely supersedes the one recently added by commit
3f44959f.
Author: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Author: Peter Geoghegan <pg(at)bowt(dot)ie>
Discussion: https://postgr.es/m/CAEze2WgpBGRgTTxTWVPXc9+PB6fc1a7t+VyGXHzfnrFXcQVxnA@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-WzkBTuFv7W2+84jJT8mWZLXVL0GHq2hMUTn6c9Vw=eYrCw@mail.gmail.com
Branch
------
master
Details
-------
https://git.postgresql.org/pg/commitdiff/1bd4bc85cac2b23484a6b568a752de0351c2cc5b
Modified Files
--------------
src/backend/access/nbtree/nbtree.c | 75 ++--
src/backend/access/nbtree/nbtsearch.c | 715 +++++++++++++++-------------------
src/backend/access/nbtree/nbtutils.c | 2 +-
src/include/access/nbtree.h | 62 ++-
4 files changed, 388 insertions(+), 466 deletions(-)
From | Date | Subject | |
---|---|---|---|
Next Message | Jeff Davis | 2024-10-18 17:46:16 | pgsql: Allow pg_set_relation_stats() to set relpages to -1. |
Previous Message | Nathan Bossart | 2024-10-18 15:20:50 | pgsql: Adjust documentation for configuring Linux huge pages. |