Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, benoit <benoit(at)hopsandfork(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
Date: 2024-03-04 20:51:37
Message-ID: CAEze2Whe5L-z65DgvUgPkkLt6d2PxG1gV8Z_fNWjS9+-1jkDWw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, 2 Mar 2024 at 02:30, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Thu, Feb 15, 2024 at 6:36 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > Attached is v11, which now says something like that in the commit
> > message.
>
> Attached is v12.

Some initial comments on the documentation:

> + that searches for multiple values together. Queries that use certain
> + <acronym>SQL</acronym> constructs to search for rows matching any value
> + out of a list (or an array) of multiple scalar values might perform
> + multiple <quote>primitive</quote> index scans (up to one primitive scan
> + per scalar value) at runtime. See <xref linkend="functions-comparisons"/>
> + for details.

I don't think the "see <functions-comparisons> for details" is
correctly worded: The surrounding text implies that it would contain
details about in which precise situations multiple primitive index
scans would be consumed, while it only contains documentation about
IN/NOT IN/ANY/ALL/SOME.

Something like the following would fit better IMO:

+ that searches for multiple values together. Queries that use certain
+ <acronym>SQL</acronym> constructs to search for rows matching any value
+ out of a list or array of multiple scalar values (such as those
described in
+ <functions-comparisons> might perform multiple <quote>primitive</quote>
+ index scans (up to one primitive scan per scalar value) at runtime.

Then there is a second issue in the paragraph: Inverted indexes such
as GIN might well decide to start counting more than one "primitive
scan" per scalar value, because they may need to go through their
internal structure more than once to produce results for a single
scalar value; e.g. with queries WHERE textfield LIKE '%some%word%', a
trigram index would likely use at least 4 descents here: one for each
of "som", "ome", "wor", "ord".

> > All that really remains now is to research how we might integrate this
> > work with the recently added continuescanPrechecked/haveFirstMatch
> > stuff from Alexander Korotkov, if at all.
>
> The main change in v12 is that I've integrated both the
> continuescanPrechecked and the haveFirstMatch optimizations. Both of
> these fields are now page-level state, shared between the _bt_readpage
> caller and the _bt_checkkeys/_bt_advance_array_keys callees (so they
> appear next to the new home for _bt_checkkeys' continuescan field, in
> the new page state struct).

Cool. I'm planning to review the rest of this patch this
week/tomorrow, could you take some time to review some of my btree
patches, too?

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2024-03-04 21:03:13 Re: Adding deprecation notices to pgcrypto documentation
Previous Message Robert Haas 2024-03-04 20:50:07 Re: [PATCH] Exponential backoff for auth_delay