Re: Avoiding superfluous buffer locking during nbtree backwards scans

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(at)vondra(dot)me>
Subject: Re: Avoiding superfluous buffer locking during nbtree backwards scans
Date: 2024-10-17 21:13:16
Message-ID: CAEze2Wh45vj6gwLQi5NjL9BdgGNDd6UgNkxuedZgHbRpq4W7UQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 16 Oct 2024 at 20:03, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Fri, Oct 11, 2024 at 7:29 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > Attached is v5
>
> Now I'm attaching a v6, which further polishes things. Current plan is to
> commit something very close to this in the next day or two.
>
> v6 is mostly just further comment polishing. But it also merges together
> the two existing _bt_readnextpage loops (for forward and backward scan
> directions) into one single loop that handles everything. This is
> definitely a win for brevity, and might also be a win for clarity --
> but I'm not 100% sure which way I prefer it just yet.

(patch v6)
> - BlockNumber btps_scanPage; /* latest or next page to be scanned */
> + BlockNumber btps_nextScanPage; /* next page to be scanned */
> + BlockNumber btps_lastCurrPage; /* page whose sibling link was copied into
> + * btps_scanPage */

This reference to btps_scanPage in the comment needs to be updated to
its new name.

(from your v5 mail)
> * Now nbtree has only one PredicateLockPage call, inside _bt_readpage.
> This saves us an extra BufferGetBlockNumber call. (v4 pushed
> PredicateLockPage calls down one layer, v5 pushes them down another
> layer still.)
(and seen in patch v6)
> + /* allow next page to be processed by parallel worker */
> + if (scan->parallel_scan)
> + {
> + if (ScanDirectionIsForward(dir))
> + _bt_parallel_release(scan, so->currPos.nextPage,
> + so->currPos.currPage);
> + else
> + _bt_parallel_release(scan, so->currPos.prevPage,
> + so->currPos.currPage);
> + }
> +
> + PredicateLockPage(rel, so->currPos.currPage, scan->xs_snapshot);

I'm a little bit concerned about this change: I'm not familiar with
predicate locks, PLP, or anything related to SSI, but previously we
called PLP in parallel scans while we still had hold of the scan,
while we now call PLP only after letting other backends do things,
allowing PLPs for this scan to happen concurrently with other backends
of this scan. In principle, that looks like an improvement in
parallelism by reducing the work done in the critical path.

However, before, we would always get predicate locks in ascending (or
descending for backward scans) value order, but now that strict
keyspace order access has been released an unfortunately timed
descheduling of the process between _bt_p_release and PLP's guts means
mean lock acquisition would be only approximately in index leaf page
order. If the lock acquisition order matters in the predicate lock
system, then this will likely be a new cause of lock
conflicts/deadlocks.

If that's a real issue (by which I mean predicate locking is indeed
sensitive to acquisition order) then this call to PredicateLockPage
must happen before _bt_p_release, so that users don't get conflicts
caused only by bad timing issues in single-directional index scans.

Apart from these two comments, LGTM.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jacob Champion 2024-10-17 21:15:14 Re: doc: virtual envs with Pl/Python
Previous Message Tom Lane 2024-10-17 21:03:51 Re: doc: virtual envs with Pl/Python