Quick Links

Re: Avoiding superfluous buffer locking during nbtree backwards scans

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>
Subject:	Re: Avoiding superfluous buffer locking during nbtree backwards scans
Date:	2024-11-07 22:42:47
Message-ID:	CAH2-Wzm=rOY+rMki+vLijZYKMXd5B7QwfqH7_d2WW1PFWKLzsA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Thu, Nov 7, 2024 at 2:19 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> It would be weird to depend on the
> currPos state from the last _bt_readpage call, even after seizing the
> scan for the next page in line. If the scan has already finished
> (according to moreLeft/moreRight), then why even try to seize the scan
> once again? Why not just end the scan instead (including its call to
> _bt_parallel_done)?

The problem with what I said here was that v1 didn't go far enough in
this direction. v1 would still needlessly seize the scan whenever
_bt_steppage's blkno was involved. This was still correct, I think --
but it wasn't clear.

Attached revision v2 pushes the fix further in this direction. It
explicitly documents that parallel _bt_readnextpage callers are
allowed to use their so->currPos state (including
blkno/nextPage/prevPage) to help _bt_readnextpage to decide whether it
can end the scan right away. However, parallel index scan callers
still won't be allowed to use this same so->currPos state to decide
what page to go to next -- _bt_readnextpage really will need to seize
the scan to get an authoritative blkno to make that part safe
(actually, one parallel scan _bt_readnextpage caller still seizes the
scan for itself, which is the one caller where _bt_readnextpage fully
trusts caller's blkno + lastcurrblkno).

I think that this approach is a win for readability. It makes the
parallel case closer to the serial case, which is generally a good
thing: it allows _bt_steppage to not care at all about whether or not
the scan is a parallel scan. More importantly, it makes the exact
extent to which parallel scans rely on their so->currPos state a great
deal clearer. That was the underlying problem here all along:
confusion about using the so->currPos state to end the scan vs. using
the so->currPos state to decide which leaf page to visit next. These
are two different things, and only the first one is generally safe.

One problem that my v2 approach created (at least when I first started
work on it) was that the new _bt_readnextpage loop code was
ill-prepared for a blkno that's P_NONE when we seize the scan -- the
part of the loop where that's checked for came first, before the scan
is seized. I could have fixed that new problem by adding an extra
defensive is-blkno-P_NONE code to _bt_readnextpage. But that's not
what I settled on for v2. It makes more sense to just stop allowing
the parallel scan's btps_nextScanPage to become P_NONE in the first
place. The fact that we ever that never made much sense to me -- why
wouldn't we just stop the scan as soon as we reach the point where the
"next block" is P_NONE, which is only possible on the rightmost and
leftmost page? So that's what we do. It's simpler all around.

--
Peter Geoghegan

Attachment	Content-Type	Size
v2-0001-Fix-confusion-with-nbtree-parallel-scan-currPos-s.patch	application/octet-stream	10.1 KB

In response to

Re: Avoiding superfluous buffer locking during nbtree backwards scans at 2024-11-07 19:19:32 from Peter Geoghegan

Responses

Re: Avoiding superfluous buffer locking during nbtree backwards scans at 2024-11-08 13:25:48 from Masahiro Ikeda

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Andres Freund	2024-11-07 22:56:01	Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
Previous Message	Jacob Champion	2024-11-07 22:29:18	Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible