Re: POC, WIP: OR-clause support for indexes

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>
Cc: "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, pgsql-hackers(at)postgresql(dot)org, teodor(at)sigaev(dot)ru, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Subject: Re: POC, WIP: OR-clause support for indexes
Date: 2023-08-06 02:01:14
Message-ID: CAH2-WzkmLg0JPXkZ75b+ha_wJGoJk3QyZ6aUjA0Ba11rdPwdgw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 3, 2023 at 12:47 PM Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru> wrote:
> It's all right. I understand your position)

Okay, good to know. :-)

> I also agree to try to find other optimization cases and generalize them.

Good idea. Since the real goal is to "get a working flow of
information", the practical value of trying to get it working with
other clauses seems to be of secondary importance.

> To be honest, I tried to fix it many times by calling the function to
> calculate selectivity, and each time the result of the estimate did not
> change. I didn't have any problems in this part after moving the
> transformation to the parsing stage. I even tried to perform this
> transformation at the planning stage (to the preprocess_qual_conditions
> function), but I ran into the same problem there as well.
>
> To tell the truth, I think I'm ready to investigate this problem again
> (maybe I'll be able to see it differently or really find that I missed
> something in previous times).

The optimizer will itself do a limited form of "normalizing to CNF".
Are you familiar with extract_restriction_or_clauses(), from
orclauses.c? Comments above the function have an example of how this
can work:

* Although a join clause must reference multiple relations overall,
* an OR of ANDs clause might contain sub-clauses that reference just one
* relation and can be used to build a restriction clause for that rel.
* For example consider
* WHERE ((a.x = 42 AND b.y = 43) OR (a.x = 44 AND b.z = 45));
* We can transform this into
* WHERE ((a.x = 42 AND b.y = 43) OR (a.x = 44 AND b.z = 45))
* AND (a.x = 42 OR a.x = 44)
* AND (b.y = 43 OR b.z = 45);
* which allows the latter clauses to be applied during the scans of a and b,
* perhaps as index qualifications, and in any case reducing the number of
* rows arriving at the join. In essence this is a partial transformation to
* CNF (AND of ORs format). It is not complete, however, because we do not
* unravel the original OR --- doing so would usually bloat the qualification
* expression to little gain.

Of course this immediately makes me wonder: shouldn't your patch be
able to perform an additional transformation here? You know, by
transforming "a.x = 42 OR a.x = 44" into "a IN (42, 44)"? Although I
haven't checked for myself, I assume that this doesn't happen right
now, since your patch currently performs all of its transformations
during parsing.

I also noticed that the same comment block goes on to say something
about "clauselist_selectivity's inability to recognize redundant
conditions". Perhaps that is relevant to the problems you were having
with selectivity estimation, back when the code was in
preprocess_qual_conditions() instead? I have no reason to believe that
there should be any redundancy left behind by your transformation, so
this is just one possibility to consider.

Separately, the commit message of commit 25a9e54d2d says something
about how the planner builds RestrictInfos, which seems
possibly-relevant. That commit enhanced extended statistics for OR
clauses, so the relevant paragraph describes a limitation of extended
statistics with OR clauses specifically. I'm just guessing, but it
still seems like it might be relevant to the problem you ran into with
selectivity estimation. Another possibility to consider.

BTW, I sometimes use RR to help improve my understanding of the planner:

https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#Recording_Postgres_using_rr_Record_and_Replay_Framework

The planner has particularly complicated control flow, which has
unique challenges -- just knowing where to begin can be difficult
(unlike most other areas). I find that setting watchpoints to see when
and where the planner modifies state using RR is far more useful than
it would be with regular GDB. Once I record a query, I find that I can
"map out" what happens in the planner relatively easily.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message jacktby jacktby 2023-08-06 04:34:18 How to add a new operator for parser?
Previous Message Noah Misch 2023-08-05 23:08:47 Re: PG 16 draft release notes ready