Quick Links

Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, benoit <benoit(at)hopsandfork(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject:	Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
Date:	2024-03-07 20:34:52
Message-ID:	CAH2-WznV4Pzp8Xf7M9tu50axRVjLR7ERSYDNK0CcxXN5CxxdkQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, Mar 6, 2024 at 4:46 PM Matthias van de Meent
<boekewurm+postgres(at)gmail(dot)com> wrote:
> On Wed, 6 Mar 2024 at 01:50, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > I think that there is supposed to be a closing parenthesis here? So
> > "... (such as those described in <functions-comparisons>") might
> > perform...".
>
> Correct.
>
> > FWM, if that's what you meant.
>
> WFM, yes?

Then we're in agreement on this.

> > At one point Heikki suggested that I just get rid of
> > BTScanOpaqueData.arrayKeyData (which has been there for as long as
> > nbtree had native support for SAOPs), and use
> > BTScanOpaqueData.keyData exclusively instead. I've finally got around
> > to doing that now.
>
> I'm not sure if it was worth the reduced churn when the changes for
> the primary optimization are already 150+kB in size; every "small"
> addition increases the context needed to review the patch, and it's
> already quite complex.

I agree that the patch is quite complex, especially relative to its size.

> > Note that we no longer need to have an independent representation of
> > so->qual_okay for array keys (the convention of setting
> > so->numArrayKeys to -1 for unsatisfiable array keys is no longer
> > required). There is no longer any need for a separate pass to carry
> > over the contents of BTScanOpaqueData.arrayKeyData to
> > BTScanOpaqueData.keyData, which was confusing.
>
> I wasn't very confused by it, but sure.
>
> > Are you still interested in working directly on the preprocessing
> > stuff?
>
> If you mean my proposed change to merging two equality AOPs, then yes.
> I'll try to fit it in tomorrow with the rest of the review.

I didn't just mean that stuff. I was also suggesting that you could
join the project directly (not just as a reviewer). If you're
interested, you could do general work on the preprocessing of arrays.
Fancier array-specific preprocessing.

For example, something that can transform this: select * from foo
where a in (1,2,3) and a < 3;

Into this: select * from foo where a in (1,2);

Or, something that can just set qual_okay=false, given a query such
as: select * from foo where a in (1,2,3) and a < 5;

This is clearly doable by reusing the binary search code during
preprocessing. The first example transformation could work via a
binary search for the constant "3", followed by a memmove() to shrink
the array in-place (plus the inequality itself would need to be fully
eliminated). The second example would work a little like the array
merging thing that the patch has already, except that there'd only be
one array involved (there wouldn't be a pair of arrays).

> > select * from multi_test where a in (1,99, 182, 183, 184) and a < 183;
> >
> > Maybe we need to do better with that. What do you think?
>
> Let me come back to that when I'm done reviewing the final part of nbtutils.

Even if this case doesn't matter (which I now doubt), it's probably
easier to just get it right than it would be to figure out if we can
live without it.

> > void
> > _bt_start_array_keys(IndexScanDesc scan, ScanDirection dir)
> > @@ -519,159 +888,1163 @@ _bt_start_array_keys(IndexScanDesc scan, ScanDirection dir)
> > BTScanOpaque so = (BTScanOpaque) scan->opaque;
> > int i;
> >
> > + Assert(so->numArrayKeys);
> > + Assert(so->qual_ok);
>
> Has the requirement for a known scan direction been removed, or should
> this still have an Assert(dir != NoMovementScanDirection)?

I agree that such an assertion is worth having. Added that locally.

> I'm not sure that comment is correct, at least it isn't as clear as
> can be. Maybe something more in the line of the following?
> + pstate.prechecked = false; /* prechecked didn't cover HIKEY */

I agree that that's a little better than what I had.

> +++ b/src/backend/access/nbtree/nbtutils.c
> > @@ -272,7 +319,32 @@ _bt_preprocess_array_keys(IndexScanDesc scan)
> > + elemtype = cur->sk_subtype;
> > + if (elemtype == InvalidOid)
> > + elemtype = rel->rd_opcintype[cur->sk_attno - 1];
>
> Should we Assert() that this elemtype matches the array element type
> ARR_ELEMTYPE(arrayval) used to deconstruct the array?

Yeah, good idea.

> This is not "a" previous equality key, but "the" previous equality
> operator array scan key.
> Do we want to expend some additional cycles for detecting duplicate
> equality array key types in interleaved order like =int[] =bigint[]
> =int[]? I don't think it would be very expensive considering the
> limited number of cross-type equality operators registered in default
> PostgreSQL, so a simple loop that checks matching element types
> starting at the first array key scankey for this attribute should
> suffice. We could even sort the keys by element type if we wanted to
> fix any O(n) issues for this type of behaviour (though this is
> _extremely_ unlikely to become a performance issue).

Yes, I think that we should probably have this (though likely wouldn't
bother sorting the scan keys themselves).

You'd need to look-up cross-type operators for this. They could
possibly fail, even when nothing else fails, because in principle you
could have 3 types involved for only 2 scan keys: the opclass/on-disk
type, plus 2 separate types. We could fail to find a cross-type
operator for our "2 separate types", while still succeeding in finding
2 cross type operators for each of the 2 separate scan keys (each
paired up the indexed column).

When somebody writes a silly query with such obvious redundancies,
there needs to be a sense of proportion about it. Fixed preprocessing
costs don't seem that important. What's important is that we make a
reasonable effort to avoid horrible runtime performance when it can be
avoided, such as scanning the whole index. (If a user has a complaint
about added cycles during preprocessing, then the fix for that is to
just not write silly queries. I care much less about added cycles if
they have only a fixed, small-ish added cost, that isn't borne by
sensibly-written queries.)

> > + * qual is unsatisfiable
> > + */
> > + if (num_elems == 0)
> > + {
> > + so->qual_ok = false;
> > + return NULL;
> > + }
>
> I think there's a memory context switch back to oldContext missing
> here, as well as above at `if (num_nonnulls == 0)`. These were
> probably introduced by changing from break to return; and the paths
> aren't yet exercised in regression tests.

I agree that this is buggy. But it doesn't seem to actually fail in
practice (it "accidentally fails to fail"). In practice, the whole
index scan is shut down before anything breaks anyway.

I mention this only because it's worth understanding that I do have
coverage for this case (at least in my own expansive test suite). It
just wasn't enough to detect this particular oversight.

> > +++ b/src/backend/utils/adt/selfuncs.c
> has various changes in comments that result in spelling and style issues:
>
> > + * primitive index scans that will be performed for caller
> + * primitive index scans that will be performed for the caller.
> (missing "the", period)

I'll just keep the previous wording and punctuation, with one
difference: "index scans" will be changed to "primitive index scans".

> > - * share for each outer scan. (Don't pro-rate for ScalarArrayOpExpr,
> > - * since that's internal to the indexscan.)
> > + * share for each outer scan
>
> The trailing period was lost:
> + * share for each outer scan.

I'm not in the habit of including a period/full stop after a
standalone sentence (actually, I'm in the habit of *not* doing so,
specifically). But in the interest of consistency with surrounding
code (and because it doesn't really matter), I'll add one back here.

--
Peter Geoghegan

In response to

Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan at 2024-03-06 21:46:00 from Matthias van de Meent

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Peter Geoghegan	2024-03-07 21:07:28	Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
Previous Message	Alena Rybakina	2024-03-07 20:28:59	Re: POC, WIP: OR-clause support for indexes