From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Cc: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me> |
Subject: | Re: Avoiding superfluous buffer locking during nbtree backwards scans |
Date: | 2024-10-10 17:41:13 |
Message-ID: | CAH2-Wzmq9NHGbx4zuP6Yn4LO2cJK=gQuK318QuXq4QMqmQh4WA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Oct 8, 2024 at 12:52 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> The code in v3-0001-Avoid-unneeded-nbtree-backwards-scan-buffer-locks.patch
> looks reasonable to me. I don't think that there are any impediments
> to committing it sometime this week.
I see significant benefits for parallel index-only backwards scans
with this patch. It's not that hard to see about a 10% decrease in
execution time. I imagine that there are disproportionate benefits for
the parallel case because the work we're avoiding is work that takes
place while a backend has seized the scan. Not bad!
I've been looking at this patch in more detail. I think that you
(Matthias) have gone too far in the direction of making the backwards
scan handling within _bt_readnextpage match the forwards scan
handling. You fail to update blkno when we have to deal with a
concurrent split/deletion that necessitates reading from another blkno
(when the correct left sibling page to read from changes) -- v3 still
used the potentially-stale blkno passed by _bt_readnextpage's caller.
Attached v4 deals with that issue. It also goes further in the
direction of simplifying the code in and around _bt_readnextpage.
The code in this area has always seemed to me to be quite a lot more
complicated than it really needed to be. Now seems like a good time to
refactor and simplify. It's likely that we'll need to do more work
nearby to enable prefetching during plain index scans, without
regressing the kill_prior_tuple/_bt_killitems mechanism -- Tomas
Vondra is currently working on that. The mechanism in question is
closely tied to the stuff that I'm simplifying now, particularly the
_bt_drop_lock_and_maybe_pin/TID-recycle-safety business.
The main simplification new to v4 is that v4 isolates the need to call
_bt_drop_lock_and_maybe_pin to only 2 functions: _bt_readnextpage, and
its new _bt_readfirstpage "sibling" function. The functions have
similar preconditions, and identical postconditions -- all of which
are now clearly documented.
It's not just _bt_drop_lock_and_maybe_pin that's only called from
these 2 functions in v4. All calls to _bt_readpage (plus their
associated PredicateLockPage calls) are now also isolated to those
same 2 functions. This shift enables removing the
_bt_parallel_readpage function entirely -- the point at the top of
_bt_first where _bt_parallel_readpage used to be called can now call
_bt_readnextpage directly instead. ISTM that having a separate
_bt_parallel_readpage function never made much sense -- it was
papering over the fact that the interfaces it uses are so confusing.
v4 also demotes _bt_steppage to a helper function that sets up a call
to _bt_readnextpage. Note that the 2017 parallel index scan commit
split _bt_steppage in two: it became the current _bt_steppage code,
plus the current _bt_readnextpage code. Making that relationship
explicit seems much clearer. I'm essentially finishing off the process
of splitting _bt_steppage up.
Side note: all of these simplifications were enabled by making the
preconditions for calling _bt_readnextpage work independently of the
scan direction (which was present in earlier versions of the patch
from Matthias). We now always do a
"BTScanPosUnpinIfPinned(so->currPos)" within _bt_steppage, so it's
easy to talk clearly about the preconditions/postconditions for
_bt_readnextpage without directly considering its _bt_steppage caller
(still the main caller, including during parallel index scans).
Performance improvement bonus: We do that
"BTScanPosUnpinIfPinned(so->currPos)" at a slightly different point in
_bt_steppage in v4, as compared to master, even during forwards scans:
we consistently unpin *before* the required next call to
_bt_parallel_seize takes place (unless, of course, it's a serial scan,
which'll never need to call _bt_parallel_seize). Presumably this is
why I see a ~5% increase in throughput for parallel index-only forward
(standard) scans with this v4 -- all parallel scans (backwards and
forwards alike) now receive at least some benefit from this patch. It
makes a certain amount of sense; we should do as little as possible
during any window where our backend has seized the scan.
I do need to do more testing, but the direction of v4 seems promising.
--
Peter Geoghegan
Attachment | Content-Type | Size |
---|---|---|
v4-0001-Optimize-and-simplify-nbtree-backwards-scans.patch | application/octet-stream | 33.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2024-10-10 17:49:02 | Re: Build issue with postgresql 17 undefined reference to `pg_encoding_to_char' and `pg_char_to_encoding' |
Previous Message | Robert Haas | 2024-10-10 16:51:51 | Re: allowing extensions to control planner behavior |