Re: using extended statistics to improve join estimates

From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Andy Fan <zhihuifan1213(at)163(dot)com>
Cc: Julien Rouhaud <rjuju123(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: using extended statistics to improve join estimates
Date: 2024-09-03 12:58:05
Message-ID: a99e8522-4dd3-43e6-a1b7-144169bb315f@gmail.com
Lists: pgsql-hackers

On 17/6/2024 18:10, Tomas Vondra wrote:
> Let me quickly go through the original parts - most of this is already
> in the "review" patches, but it's better to quote the main points here
> to start a discussion. I'll omit some of the smaller suggestions, so
> please look at the 'review' patches.
>
>
> v20240617-0001-Estimate-joins-using-extended-statistics.patch
>
> - rewords a couple comments, particularly for statext_find_matching_mcv
>
> - a couple XXX comments about possibly stale/inaccurate comments
>
> v20240617-0054-clauselist_selectivity_hook.patch
>
> - I believe this does not work with the earlier patch that removed
> estimatedclaused bitmap from the "try" function.
This patch set is too big to digest at once - it is hard to come up with
examples and counterexamples for all of it. Could we get these two patches
into master and analyse further improvements based on that?

Some thoughts:
You remove varRelid. I have thought about replacing this value with
RelOptInfo, which would allow extensions (remember the selectivity hook) to
know about the underlying path tree.

The first patch is generally OK, and I vote for having it in master.
However, the most harmful case, and the one I see most reports about, is a
parameterised join on multiple ANDed clauses. In that case, we have a scan
filter like the one below:
x = $1 AND y = $2 AND ...
As far as I can see, the current patch doesn't resolve this issue.
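
To make the problem with such a filter concrete, here is a minimal sketch in Python (not planner code). It assumes a hypothetical table of 10,000 rows in which y always equals x, and it estimates the row count the way ANDed clauses are treated without multivariate statistics: by multiplying per-column selectivities as if the columns were independent. All names and the data set are invented for illustration, and concrete constants stand in for $1 and $2.

```python
from collections import Counter

# Hypothetical data: 10,000 rows, 100 distinct values, y is always equal to x.
N = 10_000
rows = [(i % 100, i % 100) for i in range(N)]


def eq_selectivity(values, const):
    """Fraction of values equal to const (a per-column estimate)."""
    return Counter(values)[const] / len(values)


def independent_estimate(rows, x_val, y_val):
    """Row estimate for x = x_val AND y = y_val assuming independent columns."""
    sel_x = eq_selectivity([r[0] for r in rows], x_val)
    sel_y = eq_selectivity([r[1] for r in rows], y_val)
    return sel_x * sel_y * len(rows)


def actual_rows(rows, x_val, y_val):
    """Number of rows that really satisfy both clauses."""
    return sum(1 for x, y in rows if x == x_val and y == y_val)


estimate = independent_estimate(rows, 7, 7)
actual = actual_rows(rows, 7, 7)
print(f"independence estimate: {estimate:.2f} rows, actual: {actual} rows")
```

With these assumptions the estimate is off by a factor of 100, and every additional correlated clause multiplies the error further, which is what makes a parameterised inner scan look far cheaper than it really is.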

--
regards, Andrei Lepikhov
