Re: allowing extensions to control planner behavior

From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: allowing extensions to control planner behavior
Date: 2024-10-23 08:51:02
Message-ID: deb87eba-d4e2-40ad-84e9-219a25516b2d@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 10/23/24 15:05, Robert Haas wrote:
> On Sat, Oct 19, 2024 at 6:00 AM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
>> Generally, a hash value doesn't 100% guarantee the uniqueness of a node
>> identification. Also, RelOptInfo corresponds to a subtree in the final
>> plan, and sometimes, it takes work to find which node in the partially
>> executed plan corresponds to this specific estimation on row number
>> during selectivity estimation. Remember parameterised paths - you should
>> attach some signature for each path. So, it is not fully strict method.
>> If you are interested, I can perhaps explain the method a little bit
>> more at some meetup.
>
> Yeah, I agree that this is not the best method. While it's true that
> you could get a false match in case of a hash value collision, IMHO
> the bigger problem is that it seems like an expensive way of
> determining something that we really should know already. If the user
> types the same query, mentioning the same relations, in the same
> order, with the same constructs around them, it's hard to believe that
> hashing is the cheapest way of matching up the old and new ones. I'm
> not sure exactly what we should do instead, but it feels like we more
> or less have this information during parsing and then we lose track of
> it as the query goes through the rewrite and planning phases.
Parse tree may be implemented with multiple execution plans. Even
clauses can be transformed during optimisation (Remember OR -> ANY).
Also, the cardinality of a middle-tree join depends on the inner and
outer subtrees. Because of that, having a hash on RelOptInfo's relids
and restrictions + hashes of child RelOptInfos and carrying it through
all other stages up to the end of execution is the most stable approach
I know.

--
regards, Andrei Lepikhov

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2024-10-23 09:07:36 Re: Remove unnecessary word in a comment
Previous Message Amit Kapila 2024-10-23 08:37:53 Re: Pgoutput not capturing the generated columns