Re: Avoiding superfluous buffer locking during nbtree backwards scans

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>
Subject: Re: Avoiding superfluous buffer locking during nbtree backwards scans
Date: 2024-10-11 23:29:12
Message-ID: CAH2-Wzk003jELmOQ5NCgCqHuE85q_39LFdi0FUvbFAv1KH9ktQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 10, 2024 at 1:41 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> The main simplification new to v4 is that v4 isolates the need to call
> _bt_drop_lock_and_maybe_pin to only 2 functions: _bt_readnextpage, and
> its new _bt_readfirstpage "sibling" function. The functions have
> similar preconditions, and identical postconditions -- all of which
> are now clearly documented.

Worked on this a little more today. Attached is v5, which goes even
further than v4 did. Highlights:

* Completely gets rid of _bt_initialize_more_data by moving its code
into the new _bt_readfirstpage function.

* Adds similar "initialize moreLeft/moreRight" code to the
corresponding point within _bt_readnextpage (that is, at the start of
_bt_readnextpage) -- meaning I moved a bit more code from _bt_steppage
into _bt_readnextpage.

Again, _bt_readnextpage and the new _bt_readfirstpage function are
intended to be "sibling" functions (while _bt_steppage is demoted to a
helper that sets up a call to _bt_readfirstpage). The matching
handling of currPos initialization within _bt_readnextpage and the
_bt_readfirstpage pushes this further than in v4.

* We now reset currPos state (including even its moreLeft/moreRight
fields) within _bt_parallel_seize, automatically and regardless of any
other details.

It doesn't really make sense that (up until now) we've trusted the
shared state that we seized from the parallel scan to kinda be in sync
with the local currPos state (at least its moreLeft/moreRight need to
be in sync, to avoid confusion within _bt_readnextpage). This allowed
me to remove several parallel-only BTScanPosInvalidate calls from
_bt_readnextpage.

This "currPos must be kept in sync with parallel scan state" confusion
is exemplified by the following code snippet (which FWIW was already
removed by earlier revisions of the patch), that appears in the
backwards scan block of _bt_readnextpage on master:

"""
/*
* Should only happen in parallel cases, when some other backend
* advanced the scan.
*/
if (so->currPos.currPage != blkno)
{
BTScanPosUnpinIfPinned(so->currPos);
so->currPos.currPage = blkno;
}
"""

Why should the parallel scan have any concern for what currPos is, in general?

* Now nbtree has only one PredicateLockPage call, inside _bt_readpage.
This saves us an extra BufferGetBlockNumber call. (v4 pushed
PredicateLockPage calls down one layer, v5 pushes them down another
layer still.)

* Improved precondition/postcondition comments, and generally cleaner
separation between _bt_steppage and _bt_readnextpage.

--
Peter Geoghegan

Attachment Content-Type Size
v5-0001-Optimize-and-simplify-nbtree-backwards-scans.patch application/x-patch 41.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2024-10-11 23:33:11 Re: pgsql: Implement pg_wal_replay_wait() stored procedure
Previous Message Benoit Lobréau 2024-10-11 23:16:53 Re: Logging parallel worker draught