Re: POC, WIP: OR-clause support for indexes

From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, teodor(at)sigaev(dot)ru, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
Subject: Re: POC, WIP: OR-clause support for indexes
Date: 2024-06-25 23:13:53
Message-ID: e58a1ad1-1da1-4228-ab86-aa588c03756d@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 24.06.2024 18:28, Robert Haas wrote:
> On Fri, Jun 21, 2024 at 6:52 PM Alena Rybakina
> <a(dot)rybakina(at)postgrespro(dot)ru> wrote:
>> It's hard to tell, but I think it might be one of the good places to apply transformation. Let me describe a brief conclusion on the two approaches.
> This explanation is somewhat difficult for me to follow. For example:
>
>> In the first approach, we definitely did not process the extra "OR" expressions in the first approach, since they were packaged as an Array. It could lead to the fact that less planning time would be spent on the optimizer.
> I don't know what the "first approach" refers to, or what processing
> the extra "OR" expressions means, or what it would mean to package OR
> expressions as an array. If you made them into an SAOP then you'd have
> an array*instead of* OR expressions, not OR expressions "packaged as
> an array" but even then, they'd still be processed somewhere, unless
> the patch was just wrong.
>
> I think you should try writing this summary again and see if you can
> make it a lot clearer and more precise.
>
> I'm suspicious based that we should actually be postponing the
> transformation even further. If, for example, the transformation is
> advantageous for index scans and disadvantageous for bitmap scans, or
> the other way around, then this approach can't help much: it either
> does the transformation and all scan types are affected, or it doesn't
> do it and no scan types are affected. But if you decided for each scan
> whether to transform the quals, then you could handle that. Against
> that, there might be increased planning cost. But, perhaps that could
> be avoided somehow.

Sorry, you are right and I'll try to explain more precisely. The
firstapproach isthefirstpartof the patch,wherewemade "Or" expressions
into an SAOPatan earlystageof plangeneration[0],the secondonewasthe one
proposedby A.Korotkov[1].

So, when we made "OR" expressions into an SAOPat the post-parsing stage
of the plan generation [0], we definitely did not process the
redundantexpressions"OR" expressions there (for example,duplicates),
since they were transformed to SAOP expression. Furthermore, the list of
OR expressions can be significantly reduced, since constants belonging
to the same predicate will already be converted into an SAOP expression.
I assume this may reduce planning time, as I know several places in the
optimizer where these lists of "OR" expressions are scanned several times.
Also the selectivity for SAOP expressions is estimated better, which
could lead to the generation of a more optimal plan, but, to be honest,
this is just an observation from changes in regression tests and, in
general, how the process of calculating the selectivity of a complex
expression works. And I think it needs further consideration. SELECT *
FROM check_estimated_rows('SELECT * FROM functional_dependencies WHERE
(a = 1 OR a = 51) AND b = ''1'''); estimated | actual
-----------+-------- - 99 | 100 + 100 | 100 (1 row) SELECT * FROM
check_estimated_rows('SELECT * FROM functional_dependencies WHERE (a = 1
OR a = 51) AND (b = ''1'' OR b = ''2'')'); estimated | actual
-----------+-------- - 99 | 100 + 100 | 100 (1 row) SELECT * FROM
check_estimated_rows('SELECT * FROM functional_dependencies WHERE (a = 1
OR a = 2 OR a = 51 OR a = 52) AND (b = ''1'' OR b = ''2'')'); estimated
| actual -----------+-------- - 197 | 200 + 200 | 200 Speaking of the
main disadvantages, we do not give the optimizer the opportunity to
generate a plan using BitmapScan, which can lead to the generation of a
suboptimal plan, but in the current approach the same thing happens [2].
And you mentioned about it before:

On 24.06.2024 18:28, Robert Haas wrote:
> I'm suspicious based that we should actually be postponing the
> transformation even further. If, for example, the transformation is
> advantageous for index scans and disadvantageous for bitmap scans, or
> the other way around, then this approach can't help much: it either
> does the transformation and all scan types are affected, or it doesn't
> do it and no scan types are affected. But if you decided for each scan
> whether to transform the quals, then you could handle that. Against
> that, there might be increased planning cost. But, perhaps that could
> be avoided somehow.

Andrei mentioned the problem, which might be caused by the
transformation on the later stage of optimization is updating references
to expressions in RestrictInfo [3] lists, because they can be used in
different parts during the formation of the query plan. As the practice
of Self-join removal [4] has shown, this can be expensive, but feasible.
By applying the transformation at the analysis stage [0], because no
links were created, so we did not encounter such problems, so this
approach was more suitable than the others.

If some things were not clear enough, let me know.

[0]
https://www.postgresql.org/message-id/attachment/156971/v21-0001-Transform-OR-clauses-to-ANY-expression.patch
[1]
https://www.postgresql.org/message-id/CAPpHfduah1PLzajBJFDmp7%2BMZuaWYpie2p%2BGsV0r03fcGghQ-g%40mail.gmail.com
[2]
https://www.postgresql.org/message-id/7d5aed92-d4cc-4b76-8ae0-051d182c9eec%40postgrespro.ru
[3]
https://www.postgresql.org/message-id/6850c306-4e9d-40b7-8096-1f3c7d29cd9e%40postgrespro.ru
[4] https://commitfest.postgresql.org/48/5043/

On 24.06.2024 18:28, Robert Haas wrote:
> The alternative that should be considered is not combining things if
> the types don't match. If we're going to combine such things, we need
> to be absolutely certain that type conversion cannot fail.

Peter,Robert,thanksforthe detaileddiscussion,I realizedthathereyou
needto lookcarefullyatthe patch. In general, it comes out, I need to pay
attention and highlight the cases where POLA violation occurs

On 24.06.2024 18:28, Robert Haas wrote:
> One reason is that it is extra work to convert things to a name and
> then back to an OID. It's got to be slower than using the OID you
> already have.
>
> The other reason is that it's error-prone. If somehow the second
> lookup doesn't produce the same OID as the first lookup, bad things
> will happen, possibly including security vulnerabilities. I see you've
> taken steps to avoid that, like nailing down the schema, and that's
> good, but it's not a good enough reason to do it like this. If we
> don't have a function that can construct the node we need with the OID
> rather than the name as an argument, we should invent one, not do this
> sort of thing.

I understood. I'll try to fix it.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2024-06-25 23:33:35 Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin
Previous Message Tom Lane 2024-06-25 23:11:24 Re: Should we document how column DEFAULT expressions work?