Re: POC, WIP: OR-clause support for indexes

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: jian he <jian(dot)universality(at)gmail(dot)com>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Peter Geoghegan <pg(at)bowt(dot)ie>, "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, teodor(at)sigaev(dot)ru, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
Subject: Re: POC, WIP: OR-clause support for indexes
Date: 2024-04-07 22:34:37
Message-ID: CAPpHfdtGmUHv=E4QWGWHk2QUuJ-x0mTNBssW_Ehv2dcOQ2L66g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

On Mon, Apr 1, 2024 at 9:38 AM Andrei Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> On 28/3/2024 16:54, Alexander Korotkov wrote:
> > The current patch has a boolean guc enable_or_transformation.
> > However, when we have just a few ORs to be transformated, then we
> > should get less performance gain from the transformation and higher
> > chances to lose a good bitmap scan plan from that. When there is a
> > huge list of ORs to be transformed, then the performance gain is
> > greater and it is less likely we could lose a good bitmap scan plan.
> >
> > What do you think about introducing a GUC threshold value: the minimum
> > size of list to do OR-to-ANY transformation?
> > min_list_or_transformation or something.
> I labelled it or_transformation_limit (see in attachment). Feel free to
> rename it.
> It's important to note that the limiting GUC doesn't operate
> symmetrically for forward, OR -> SAOP, and backward SAOP -> OR
> operations. In the forward case, it functions as you've proposed.
> However, in the backward case, we only check whether the feature is
> enabled or not. This is due to our existing limitation,
> MAX_SAOP_ARRAY_SIZE, and the fact that we can't match the length of the
> original OR list with the sizes of the resulting SAOPs. For instance, a
> lengthy OR list with 100 elements can be transformed into 3 SAOPs, each
> with a size of around 30 elements.
> One aspect that requires attention is the potential inefficiency of our
> OR -> ANY transformation when we have a number of elements less than
> MAX_SAOP_ARRAY_SIZE. This is because we perform a reverse transformation
> ANY -> OR at the stage of generating bitmap scans. If the BitmapScan
> path dominates, we may have done unnecessary work. Is this an occurrence
> that we should address?
> But the concern above may just be a point of improvement later: We can
> add one more strategy to the optimizer: testing each array element as an
> OR clause; we can also provide a BitmapOr path, where SAOP is covered
> with a minimal number of partial indexes (likewise, previous version).

I've revised the patch. Did some beautification, improvements for
documentation, commit messages etc.

I've pushed the 0001 patch without 0002. I think 0001 is good by
itself given that there is the or_to_any_transform_limit GUC option.
The more similar OR clauses are here the more likely grouping them
into SOAP will be a win. But I've changed the default value to 5.
This will make it less invasive and affect only queries with obvious
repeating patterns. That also reduced the changes in the regression
tests expected outputs.

Regarding 0002, it seems questionable since it could cause a planning
slowdown for SAOP's with large arrays. Also, it might reduce the win
of transformation made by 0001. So, I think we should skip it for
now.

------
Regards,
Alexander Korotkov

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2024-04-07 22:41:16 Re: Add bump memory context type and use it for tuplesorts
Previous Message Daniel Gustafsson 2024-04-07 22:28:42 Re: Cluster::restart dumping logs when stop fails