From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Naive handling of inequalities by nbtree initial positioning code
Date: 2023-08-14 04:33:30
Message-ID: CAH2-WznFKe-V9+LALLfTXS63sf4_1ja9tHhUL4J6wj6E=sdgxw@mail.gmail.com
Lists: pgsql-hackers
On Sun, Aug 13, 2023 at 5:50 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> All that it would take to fix the problem is per-attribute
> BTScanInsertData.nextkey values. There is no reason why "nextkey"
> semantics should only work for the last attribute in the insertion
> scan key. Under this scheme, _bt_first() would be taught to set up the
> insertion scan key with (say) nextkey=true for the "four > 2"
> attribute, and nextkey=false for the other 3 attributes (since we do
> that whenever >= or = are used). It would probably also make sense to
> generalize this approach to handle (say) a third query that had a
> "four < 2" inequality, but otherwise matched the first two queries. So
> we wouldn't literally use multiple "nextkey" fields to do this.
Actually, that can't work when there are a huge number of index tuples
with the same values for "four" (enough to span many internal pages).
So we'd need specialized knowledge of the data type (probably
from an opclass support function) to transform "four > 2" into "four
>= 3" up front. Alternatively, we could achieve the same effect via
an initial index probe. The latter approach would
be needed for continuous data types, where the transformation isn't
possible at all.
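To make the "dense" transformation concrete, here is a purely illustrative Python sketch (not PostgreSQL code -- successor_int stands in for a hypothetical opclass increment support routine, and normalize_inequality for the up-front rewrite step):

```python
def successor_int(v: int) -> int:
    """Hypothetical increment support routine for a discrete type."""
    return v + 1

def normalize_inequality(op: str, value: int):
    """Rewrite a strict inequality into its non-strict equivalent,
    so every scan-key attribute can share the same positioning
    semantics -- e.g. "four > 2" becomes "four >= 3"."""
    if op == ">":
        return (">=", successor_int(value))
    if op == "<":
        return ("<=", value - 1)
    return (op, value)
```

This only works when the type is discrete and the successor is computable, which is exactly why a probing fallback is needed for continuous types.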
The probing approach could work by finding an initial position in the
same way as we currently locate an initial leaf page -- the way that I
complained about earlier on, but with an important twist. Once we'd
established that the first "four" value in the index > 2 really was 3
(or whatever it turned out to be), we could fill that value into a new
insertion scan key. It would then be possible to do another descent of
the index, skipping over most of the leaf pages that we'll access
needlessly right now. (Actually, we'd only do all that when it seemed
likely to allow us to skip a significant number of intermediate leaf
pages -- which is what we saw in my test case.)
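The first probe can be sketched like so (illustrative Python, with a sorted list of (two, four) pairs standing in for leaf-level index entries; the real thing would be a descent of the tree, not a binary search of a list):

```python
from bisect import bisect_right

def first_four_above(entries, two_val, four_lower):
    """Probe for the first "four" value actually present in the index
    that is > four_lower within the two == two_val group. The value
    found would then be filled into a fresh insertion scan key for a
    second, leaf-page-skipping descent."""
    # bisect_right lands just past every entry <= (two_val, four_lower)
    pos = bisect_right(entries, (two_val, four_lower))
    if pos < len(entries) and entries[pos][0] == two_val:
        return entries[pos][1]
    return None  # no qualifying entries in this group
```

In the example from the thread, probing with four_lower=2 would discover that the first qualifying "four" value is 3, letting the second descent use an equality-like bound on 3 directly.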
This is directly related to skip scan. The former approach is more or
less what the MDAM paper calls "dense" access (which is naturally
limited to discrete data types like integer), while the latter probing
approach is what it calls "sparse" access. Skip scan typically
performs this process repeatedly, whereas here we'd only skip once.
In fact, if my example had used (say) "four > 1" instead, then it
would have made sense to skip multiple times -- not just once, after
an initial descent. Because then we'd have had to consider matches for
both "two=1 and four=2" and "two=1 and four=3" (there aren't any
"two=1 and four=4" matches so we'd then be done).
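The repeated-skipping case can be sketched by iterating the same probe (again illustrative Python over a sorted list standing in for the index, not actual nbtree code):

```python
from bisect import bisect_right

def skip_scan(entries, two_val, four_lower):
    """For each distinct "four" value > four_lower present within the
    two == two_val group, "descend again" with that concrete value,
    rather than walking every intermediate leaf entry."""
    found = []
    four = four_lower
    while True:
        # probe: first entry past (two_val, four)
        pos = bisect_right(entries, (two_val, four))
        if pos == len(entries) or entries[pos][0] != two_val:
            break  # exhausted the two == two_val group
        four = entries[pos][1]  # next distinct value actually present
        found.append(four)      # the real scan would fetch matches here
    return found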
Similarly, had there been no mention of the "four" column in the query
whatsoever (which is how we tend to think of skip scan), then a decent
implementation of skip scan would effectively behave as if the query
had been written "two=1 and four > -inf and ...", while following the
same general approach. (Or "two=1 and four < +inf and ...", if this
was a similar looking backwards scan.)
--
Peter Geoghegan