Re: BUG #18643: EXPLAIN estimated rows mismatch

From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, ming wei tan <mingwei(dot)tc(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18643: EXPLAIN estimated rows mismatch
Date: 2024-10-03 06:15:10
Message-ID: 5abff03e-5eb5-482e-bbd5-809451ff24cb@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 10/3/24 11:40, Tom Lane wrote:
> Andrei Lepikhov <lepihov(at)gmail(dot)com> writes:
>> So, we already pass through the OR clauses. Why not to check semi-equal
>> clauses and remove duplicates even if not all clauses are such
>> duplicates? At least, it continually raises users' questions.
>
> It's difficult to justify spending extra planner cycles to optimize
> what are fundamentally stupidly-written queries. Who writes "X=X"
> in the first place? (Other than ORM authors who need to spend some
> time in a re-education camp.)
I don't oppose your arguments at first. I want to say that ORMs and any
APIs (I sometimes see data analysts who use an AI to request a database)
are already essential to development (at least in startups). It may not
be suitable to deal with such cases in the core, but it is too costly to
do it in an extension right now. In this sense 'silicon' users aren't
equal to more smart carbon-made ones. And here I see two options:
1. canonicalize_qual_hook - the most blunt approach. Of course, we will
need more hooks in the near future with this approach.
2. An 'ORM' GUC to enable multiple optimisations in the core,
'smoothing' users' mistakes.

It would be great to discuss other options.

> And it would not be a trivial number
> of extra cycles, either. As pointed out in the commit message
> David mentioned, it's basically free to make this improvement
> when we're looking at a potential EquivalenceClass clause.
> We've already paid the cost of checking that the operator is a btree
> equality operator, and we know that the clause is at top level of
> WHERE (else we couldn't fuzz over the difference between false and
> null results), and besides we have to check whether it's "X=X" because
> not doing so causes some semantic problems for the EquivalenceClass
> machinery. In the case of a random clause-underneath-OR, we would
> have to make a brand new check whether it's btree equality, and we
> would have to somehow track whether we had descended to an expression
> level where "false and null are known equivalent" is no longer true.
> So I really doubt that a case can be made that that is worth doing.
Thanks, this explanation is quite valuable for me.

--
regards, Andrei Lepikhov

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Abhishek Mittal 2024-10-03 09:49:46 Re: BUG #18624: Memory Leak Issue with PostgreSQL Connection During COPY Command Execution.
Previous Message Tom Lane 2024-10-03 04:40:02 Re: BUG #18643: EXPLAIN estimated rows mismatch