Re: Adding skip scan (including MDAM style range skip scan) to nbtree

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, Masahiro(dot)Ikeda(at)nttdata(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, Masao(dot)Fujii(at)nttdata(dot)com
Subject: Re: Adding skip scan (including MDAM style range skip scan) to nbtree
Date: 2025-04-01 20:16:15
Message-ID: CAEze2WgDec5gu+MQqmhhL0vxMWQT6OGXXCXxJmY3w0O6ifAGSQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 1 Apr 2025 at 21:02, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Tue, Apr 1, 2025 at 10:40 AM Matthias van de Meent
> <boekewurm+postgres(at)gmail(dot)com> wrote:
> > > When nbtree is passed input scan keys derived from a
> > > query predicate "WHERE b = 5", new nbtree preprocessing steps now output
> > > "WHERE a = ANY(<every possible 'a' value>) AND b = 5" scan keys.
> >
> > [...] new nbtree preprocessing steps now output +the equivalent of+ [...]
> >
> > > Preprocessing can freely add a skip array before or after any input
> > > ScalarArrayOp arrays.
> >
> > This implies skip arrays won't be generated for WHERE b = 5 (a
> > non-SAOP scankey) or WHERE b < 3 (not SAOP either), which is probably
> > not intentional, so a rewording would be appreciated.
>
> I don't agree. Yes, that sentence (taken in isolation) does not make
> it 100% unambiguous. But, would anybody ever actually be misled? Even
> once, ever? The misinterpretation of my words that you're concerned
> about is directly contradicted by the whole opening paragraph. Plus,
> it just doesn't make any sense. Obviously, you yourself never actually
> interpreted it that way. Right?

That's built on my knowledge about the internals of this patch ahead
of reading the message, not on a clean-sleet interpretation.

> The paragraph that this sentence appears in is all about the various
> ways in which SAOP arrays and skip arrays are the same thing, except
> at the very lowest level of the _bt_advance_array_keys code. I think
> that that context makes it particularly unlikely that anybody would
> ever think that I mean to imply something about the ways in which
> non-array keys can be composed alongside skip arrays.
>
> I'm pushing back here because I think that there's a real cost to
> using overly defensive language, aimed at some imaginary person.

While I agree that there is such a cost, I don't think that this is
too far fetched. They are not just added when we have SAOP scankeys,
and I think the non-SAOP case is one of the most important areas where
this patch improves performance. Yes, this patch improves performance
for queries with only SAOP on non-first keys, but I've seen more
non-SAOP queries where this patch would improve performance than
similar queries but with SAOP.

> > // nbtpreprocesskeys.c
> >
> > > +static bool _bt_s**parray_shrink
> >
> > I'd like to understand the "shrink" here, as it's not entirely clear to me.
> > The functions are exclusively called through dispatch in
> > _bt_compare_array_scankey_args, and I'd expected that naming to be
> > extended to these functions.
>
> I don't see the problem?

It's mostly as an observation (and not problem per se) that "compare"
(which sounds like a pure function that doens't modify anything, e.g.
_bt_compare) is used to dispatch to "shrink" (which does sound like
it'd modify something).

> > > + * This qual now becomes "WHERE x = ANY('{every possible x value}') and y = 4"
> >
> > I understand what you're going for, but a reference that indexed NULLs
> > are still handled correctly (rather than removed by the filter) would
> > be appreciated, as the indicated substitution would remove NULL
> > values. (This doesn't have to be much more than a footnote.)

> Again, it comes down to what the reader might actually be confused by,
> in the real world. Is it really plausible that I could ever commit a
> skip scan patch that wholly forgot to deal with NULLs sensibly? Would
> you personally ever think that I could make such an obvious blunder in
> a committed patch? And if you did, how long would it take you to
> figure out that there was no such oversight?

My comment is not about your potential future actions, but rather what
any future developer or committer working on this code might think and
worry about when reading this. = ANY {} constructs *always* have NOT
NULL behaviour, just like any other operator clause that isn't "IS
NULL", so clarifying that this is only similar -and does not behave
the same in important edge cases- is IMO important. Not specifically
for you, but for any other developer trying to get a correct
understanding of how this works and why it is correct.

> > > +#if 0
> > > + /* Could be a redundant input scan key, so can't do this: */
> > > + Assert(inkey->sk_strategy == BTEqualStrategyNumber ||
> > > + (inkey->sk_flags & SK_SEARCHNULL));
> > > +#endif
> >
> > I think this should be removed?
>
> Why? This is me expressing that I wish I could write this assertion,
> but it won't quite work. I cannot rule out rare corner-cases involving
> a contradictory pair of input keys, only one of which is a = key (the
> other might be some kind of inequality, which makes this would-be
> assertion not quite work). (You'll see similar "almost assertions"
> from time to time, in different parts of the codebase.)

It is my understanding that those are mostly historical artifacts of
the code base, and not systems in active development. Their rarety
makes it difficult to put numbers on, but IIRC at least one such case
was recently removed for bitrot and apparent lack of use in years.

> > > +#ifdef DEBUG_DISABLE_SKIP_SCAN
> >
> > I noticed this one and only reference to DEBUG_DISABLE_SKIP_SCAN. Are
> > you planning on keeping that around, or is this a leftover?
>
> I deliberately left this in place, just in case somebody wants to see
> what happens when preprocessing stops generating skip arrays entirely.
> Without this, it's not too obvious that it can just be turned off by
> forcing _bt_num_array_keys to return early.

Ah, I see.

> > > prev_numSkipArrayKeys, *numSkipArrayKeys
> >
> > I'm a bit confused why we primarily operate on *numSkipArrayKeys,
> > rather than a local variable that we store in *numSkipArrayKeys once
> > we know we can generate skip keys. I.e., I'd expected something more
> > in line with the following snippet (incomplete code blocks):
> > I think this is easier for the compiler to push the store operation
> > out of the loop (assuming it's optimizable at all; but even if it
> > isn't it removes the load of *numSkipArrayKeys from the hot path).
>
> What if I just had a local copy of numSkipArrayKeys, and copied back
> into caller's arg when the function returns? We'll still need a
> prev_numSkipArrayKeys under this scheme, but we won't have to set the
> caller's pointer until right at the end anymore (which, I agree, seems
> like it might be worth avoiding).

That's a nice alternative too, indeed.

> > I think the increment/decrement callbacks for skipsupport should
> > explicitly check (by e.g. Assert) for NULL (or alternatively: same
> > value) returns on overflow, and the API definition should make this
> > explicit.
>
> But the API definition *does* specifically address the opclass
> author's responsibilities around NULLs? It specifically says that it's
> not up to the opclass author to think about them at all.

Yes. What I'm suggesting is to make the contract enforceable to a
degree. If it was defined to "return (Datum) 0 on overflow", then that
could be Assert()ed; and code that does leak memory could get detected
by static analysis tools in the function scope because the allocation
didn't get returned, but with this definition returning an allocation
is never detected because that's not part of the contract.

> > The current system is quite easy to build a leaking
> > implementation for. Sure, there are other easy ways to break this, but
> > I think it's an easy API modification that makes things just that bit
> > safer.
>
> How can I do that? The callbacks themselves (the callbacks for
> functions that are called as the scan progresses) don't use the fmgr
> interface.

You could Assert(inc_sk_argument == (Datum) 0) in the oflow branch?
I'm certain that (Datum) 0 is an invalid representation of a pointer,
so we know that no allocated value is returned (be it new or
pre-existing).

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2025-04-01 20:17:05 Re: TEMP_CONFIG vs test_aio
Previous Message Noah Misch 2025-04-01 20:14:03 Re: Small memory fixes for pg_createsubcriber