Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
Date: 2023-12-06 19:14:24
Message-ID: CAEze2WiXZ19GvZXAGXycbbRr3PSrQFLdhdponwCs_69Rf3JvnQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 6 Dec 2023 at 19:55, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Wed, Dec 6, 2023 at 5:27 AM Matthias van de Meent
> <boekewurm+postgres(at)gmail(dot)com> wrote:
> > 1. When scanning an index in ascending order using scankey a > 1 (so,
> > one that defines a start point of the scan), we don't need to check
> > items for consistency with that scankey once we've found the first
> > value that is consistent with the scankey, as all future values will
> > also be consistent with the scankey (if we assume no concurrent page
> > deletions).
>
> BTW, I don't think that page deletion is a concern for these
> optimizations in the way that it is for the similar idea of "dynamic
> prefix compression", which works against insertion-type scan keys
> (used to descend the tree and to do an initial binary search of a leaf
> page).
>
> We already rely on the first call to _bt_readpage (the one that takes
> place from within _bt_first rather than from _bt_next) passing a page
> offset number that's exactly at the start of where matches begin --
> this is crucial in the case of scans with required equality strategy
> scan keys (most scans). If we just skipped the _bt_binsrch and passed
> P_FIRSTDATAKEY(opaque) to _bt_readpage within _bt_first instead, that
> would break lots of queries. So the _bt_binsrch step within _bt_first
> isn't just an optimization -- it's crucial. This is nothing new.

I was thinking more along the lines of page splits+deletions while
we're doing _bt_stepright(), but forgot to consider that we first lock
the right sibling, and only then release the left sibling for splits,
so we should be fine here.

Kind regards,

Matthias van de Meent

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joe Conway 2023-12-06 19:47:25 Re: Emitting JSON to file using COPY TO
Previous Message Daniel Verite 2023-12-06 18:59:11 Re: Emitting JSON to file using COPY TO