From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, benoit <benoit(at)hopsandfork(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
Date: 2024-03-06 21:46:00
Message-ID: CAEze2WhHixewkTRLhGMoMPOGmXNJBLnVNt7FuxgFpmzdricR0A@mail.gmail.com
Lists: pgsql-hackers
On Wed, 6 Mar 2024 at 01:50, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Mon, Mar 4, 2024 at 3:51 PM Matthias van de Meent
> <boekewurm+postgres(at)gmail(dot)com> wrote:
> > > + that searches for multiple values together. Queries that use certain
> > > + <acronym>SQL</acronym> constructs to search for rows matching any value
> > > + out of a list (or an array) of multiple scalar values might perform
> > > + multiple <quote>primitive</quote> index scans (up to one primitive scan
> > > + per scalar value) at runtime. See <xref linkend="functions-comparisons"/>
> > > + for details.
> >
> > I don't think the "see <functions-comparisons> for details" is
> > correctly worded: The surrounding text implies that it would contain
> > details about in which precise situations multiple primitive index
> > scans would be consumed, while it only contains documentation about
> > IN/NOT IN/ANY/ALL/SOME.
> >
> > Something like the following would fit better IMO:
> >
> > + that searches for multiple values together. Queries that use certain
> > + <acronym>SQL</acronym> constructs to search for rows matching any value
> > + out of a list or array of multiple scalar values (such as those
> > described in
> > + <functions-comparisons> might perform multiple <quote>primitive</quote>
> > + index scans (up to one primitive scan per scalar value) at runtime.
>
> I think that there is supposed to be a closing parenthesis here? So
> "... (such as those described in <functions-comparisons>") might
> perform...".
Correct.
> FWM, if that's what you meant.
WFM, yes?
> > Then there is a second issue in the paragraph: Inverted indexes such
> > as GIN might well decide to start counting more than one "primitive
> > scan" per scalar value,
[...]
> I've described the issues in this area (in the docs) in a way that is
> most consistent with historical conventions. That seems to have the
> fewest problems, despite everything I've said about it.
Clear enough, thank you for explaining your thoughts on this.
> > > > All that really remains now is to research how we might integrate this
> > > > work with the recently added continuescanPrechecked/haveFirstMatch
> > > > stuff from Alexander Korotkov, if at all.
> > >
> > > The main change in v12 is that I've integrated both the
> > > continuescanPrechecked and the haveFirstMatch optimizations. Both of
> > > these fields are now page-level state, shared between the _bt_readpage
> > > caller and the _bt_checkkeys/_bt_advance_array_keys callees (so they
> > > appear next to the new home for _bt_checkkeys' continuescan field, in
> > > the new page state struct).
> >
> > Cool. I'm planning to review the rest of this patch this
> > week/tomorrow, could you take some time to review some of my btree
> > patches, too?
>
> Okay, I'll take a look again.
Thanks, greatly appreciated.
> At one point Heikki suggested that I just get rid of
> BTScanOpaqueData.arrayKeyData (which has been there for as long as
> nbtree had native support for SAOPs), and use
> BTScanOpaqueData.keyData exclusively instead. I've finally got around
> to doing that now.
I'm not sure this simplification was worth the extra churn: the changes
for the primary optimization are already 150+kB in size, and every
"small" addition increases the context needed to review an already
quite complex patch.
> These simplifications were enabled by my new approach within
> _bt_preprocess_keys, described when I posted v12. v13 goes even
> further than v12 did, by demoting _bt_preprocess_array_keys to a
> helper function for _bt_preprocess_keys. That means that we do all of
> our scan key preprocessing at the same time, at the start of _bt_first
> (though only during the first _bt_first, or to be more precise during
> the first per btrescan). If we need fancier
> preprocessing/normalization for arrays, then it ought to be a lot
> easier with this structure.
Agreed.
> Note that we no longer need to have an independent representation of
> so->qual_okay for array keys (the convention of setting
> so->numArrayKeys to -1 for unsatisfiable array keys is no longer
> required). There is no longer any need for a separate pass to carry
> over the contents of BTScanOpaqueData.arrayKeyData to
> BTScanOpaqueData.keyData, which was confusing.
I wasn't very confused by it, but sure.
> Are you still interested in working directly on the preprocessing
> stuff?
If you mean my proposed change to merging two equality AOPs, then yes.
I'll try to fit it in tomorrow with the rest of the review.
> I have a feeling that I was slightly too optimistic about how
> likely we were to be able to get away with not having certain kinds of
> array preprocessing, back when I posted v12. It's true that the
> propensity of the patch to not recognize "partial
> redundancies/contradictions" hardly matters with redundant equalities,
> but inequalities are another matter. I'm slightly worried about cases
> like this one:
>
> select * from multi_test where a in (1,99, 182, 183, 184) and a < 183;
>
> Maybe we need to do better with that. What do you think?
Let me come back to that when I'm done reviewing the final part of nbtutils.
> Attached is v13.
It looks like only a few issues remain outside the changes in
nbtutils. I've reviewed the changes to the other files, and about half
of nbtutils (up to _bt_advance_array_keys_increment) so far. I think I
can get the remainder done tomorrow.
> +++ b/src/backend/access/nbtree/nbtutils.c
> /*
> * _bt_start_array_keys() -- Initialize array keys at start of a scan
> *
> * Set up the cur_elem counters and fill in the first sk_argument value for
> - * each array scankey. We can't do this until we know the scan direction.
> + * each array scankey.
> */
> void
> _bt_start_array_keys(IndexScanDesc scan, ScanDirection dir)
> @@ -519,159 +888,1163 @@ _bt_start_array_keys(IndexScanDesc scan, ScanDirection dir)
> BTScanOpaque so = (BTScanOpaque) scan->opaque;
> int i;
>
> + Assert(so->numArrayKeys);
> + Assert(so->qual_ok);
Has the requirement for a known scan direction been removed, or should
this still have an Assert(dir != NoMovementScanDirection)?
> +++ b/src/backend/access/nbtree/nbtsearch.c
> @@ -1713,17 +1756,18 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum,
> [...]
> - _bt_checkkeys(scan, itup, truncatt, dir, &continuescan, false, false);
> + pstate.prechecked = false; /* prechecked earlier tuple */
I'm not sure that comment is correct; at the least it isn't as clear
as it could be. Maybe something more along the lines of the following?
+ pstate.prechecked = false; /* prechecked didn't cover HIKEY */
> +++ b/src/backend/access/nbtree/nbtutils.c
> @@ -272,7 +319,32 @@ _bt_preprocess_array_keys(IndexScanDesc scan)
> + elemtype = cur->sk_subtype;
> + if (elemtype == InvalidOid)
> + elemtype = rel->rd_opcintype[cur->sk_attno - 1];
Should we Assert() that this elemtype matches the array element type
ARR_ELEMTYPE(arrayval) used to deconstruct the array?
> + /*
> + * If this scan key is semantically equivalent to a previous equality
> + * operator array scan key, merge the two arrays together to eliminate
> [...]
> + if (prevArrayAtt == cur->sk_attno && prevElemtype == elemtype)
This is not "a" previous equality key, but "the" previous equality
operator array scan key.
Do we want to expend some additional cycles for detecting duplicate
equality array key types in interleaved order like =int[] =bigint[]
=int[]? I don't think it would be very expensive considering the
limited number of cross-type equality operators registered in default
PostgreSQL, so a simple loop that checks matching element types
starting at the first array key scankey for this attribute should
suffice. We could even sort the keys by element type if we wanted to
avoid the quadratic behaviour of that nested check (though this is
_extremely_ unlikely to become a performance issue).
> + * qual is unsatisfiable
> + */
> + if (num_elems == 0)
> + {
> + so->qual_ok = false;
> + return NULL;
> + }
I think a memory context switch back to oldContext is missing here,
and likewise above at `if (num_nonnulls == 0)`. These were probably
introduced by the change from break to return; the paths aren't yet
exercised by regression tests.
> +++ b/src/backend/utils/adt/selfuncs.c
has various changes in comments that result in spelling and style issues:
> + * primitive index scans that will be performed for caller
+ * primitive index scans that will be performed for the caller.
(missing "the", period)
> - * share for each outer scan. (Don't pro-rate for ScalarArrayOpExpr,
> - * since that's internal to the indexscan.)
> + * share for each outer scan
The trailing period was lost:
+ * share for each outer scan.
Kind regards,
Matthias van de Meent
Neon (https://neon.tech)