From: | Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, pgsql-hackers(at)postgresql(dot)org, teodor(at)sigaev(dot)ru, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
Subject: | Re: POC, WIP: OR-clause support for indexes |
Date: | 2023-08-03 19:47:44 |
Message-ID: | bf504287-c3ac-e492-c6ea-31be60b2c92f@yandex.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 02.08.2023 21:58, Peter Geoghegan wrote:
> On Wed, Aug 2, 2023 at 8:58 AM Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru> wrote:
>> No, I haven't thought about it yet. I studied the example and it would
>> really be nice to add optimization here. I didn't notice any problems
>> with its implementation. I also have an obvious example with the "or"
>> operator, for example
>> , select * from multi_test, where (a, b ) = ( 1, 1 ) or (a, b ) = ( 2, 1
>> ) ...;
>>
>> Although I think such a case will be used less often.
> Right. As I said, I don't particularly care about the row constructor
> syntax -- it's not essential.
>
> In my experience patches like this one that ultimately don't succeed
> usually *don't* have specific problems that cannot be fixed. The real
> problem tends to be ambiguity about the general high level design. So
> more than anything else, ambiguity is the thing that you need to
> minimize to be successful here. This is the #1 practical problem, by
> far. This may be the only thing about your patch that I feel 100% sure
> of.
>
> In my experience it can actually be easier to expand the scope of a
> project, and to come up with a more general solution:
>
> https://en.wikipedia.org/wiki/Inventor%27s_paradox
>
> I'm not trying to make your work more difficult by expanding its
> scope. I'm actually trying to make your work *easier* by expanding its
> scope. I don't claim to know what the specific scope of your patch
> should be at all. Just that it might be possible to get a much clearer
> picture of what the ideal scope really is by *trying* to generalize it
> further -- that understanding is what we lack right now. Even if this
> exercise fails in some way, it won't really have been a failure. The
> reasons why it fails will still be interesting and practically
> relevant.
>
> As I explained to Jim, I am trying to put things in this area on a
> more rigorous footing. For example, I have said that the way that the
> nbtree code executes SAOP quals is equivalent to DNF. That is
> basically true, but it's also my own slightly optimistic
> interpretation of history and of the design. That's a good start, but
> it's not enough on its own.
>
> My interpretation might still be wrong in some subtle way, that I have
> yet to discover. That's really what I'm concerned about with your
> patch, too. I'm currently trying to solve a problem that I don't yet
> fully understand, so for me "getting a properly working flow of
> information" seems like a good practical exercise. I'm trying to
> generalize the design of my own patch as far as I can, to see what
> breaks, and why it breaks. My intuition is that this will help me with
> my own patch by forcing me to gain a truly rigorous understanding of
> the problem.
>
> My suggestion about generalizing your approach to cover RowCompareExpr
> cases is what I would do, if I were you, and this was my patch. That's
> almost exactly what I'm doing with my own patch already, in fact.
It's all right. I understand your position)
I also agree to try to find other optimization cases and generalize them.
I read the wiki article, and as I understand it, in such a situation we
see a difficult problem with finding expressions that need to be
converted into a logically correct expression and simplify execution for
the executor. For example, this is a ROW type. It can have a simpler
expression with AND and OR operations, besides we can exclude
duplicates. But some of these transformations may be incorrect or they
will have a more complex representation. We can try to find the
problematic expressions and try to combine them into groups and finally
find a solutions for each groups or, conversely, discover that the
existing transformation is uncorrected. If I understand correctly, we
should first start searching for "ROW" expressions (define a group for
them) and think about a solution for the group.
>> It seems to me the most difficult thing is to notice problematic cases
>> where the transformations are incorrect, but I think it can be implemented.
> Right. To be clear, I am sure that it won't be practical to come up
> with a 100% theoretically pure approach. If for no other reason than
> this: normalizing to CNF in all cases will run into problems with very
> complex predicates. It might even be computationally intractable
> (could just be very slow). So there is a clear need to keep
> theoretical purity in balance with practical considerations. There is
> a need for some kind of negotiation between those two things. Probably
> some set of heuristics will ultimately be required to keep costs and
> benefits in balance.
>
> I don't expect you to determine what set of heuristics will ultimately
> be required to determine when and how to perform CNF conversions, in
> the general case. But having at least some vague idea seems like it
> might increase confidence in your design.
I agree, but I think this will be the second step after solutions are
found.
> Do you think that this problem is just an accidental side-effect? It
> isn't necessarily the responsibility of your patch to fix such things.
> If it's even possible for selectivity estimates to change, then it's
> already certain that sometimes they'll be worse than before -- if only
> because of chance interactions. The optimizer is often right for the
> wrong reasons, and wrong for the right reasons -- we cannot really
> expect any patch to completely avoid problems like that.
To be honest, I tried to fix it many times by calling the function to
calculate selectivity, and each time the result of the estimate did not
change. I didn't have any problems in this part after moving the
transformation to the parsing stage. I even tried to perform this
transformation at the planning stage (to the preprocess_qual_conditions
function), but I ran into the same problem there as well.
To tell the truth, I think I'm ready to investigate this problem again
(maybe I'll be able to see it differently or really find that I missed
something in previous times).
--
Regards,
Alena Rybakina
Postgres Professional
From | Date | Subject | |
---|---|---|---|
Next Message | Daniel Gustafsson | 2023-08-03 20:35:11 | Re: [PoC] Implementation of distinct in Window Aggregates |
Previous Message | Tatsuo Ishii | 2023-08-03 19:25:52 | Re: Using defines for protocol characters |