Re: using extended statistics to improve join estimates

From: Andy Fan <zhihuifan1213(at)163(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: using extended statistics to improve join estimates
Date: 2024-04-03 01:33:28
Message-ID: 87frw3cixj.fsf@163.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> writes:

> On 4/2/24 10:23, Andy Fan wrote:
>>
>>> On Wed, Mar 02, 2022 at 11:38:21AM -0600, Justin Pryzby wrote:
>>>> Rebased over 269b532ae and muted compiler warnings.
>>
>> Thank you Justin for the rebase!
>>
>> Hello Tomas,
>>
>> Thanks for the patch! Before I review the path at the code level, I want
>> to explain my understanding about this patch first.
>>
>
> If you want to work on this patch, that'd be cool. A review would be
> great, but if you want to maybe take over and try moving it forward,
> that'd be even better. I don't know when I'll have time to work on it
> again, but I'd promise to help you with working on it.

OK, I'd try to moving it forward.

>
>> Before this patch, we already use MCV information for the eqjoinsel, it
>> works as combine the MCV on the both sides to figure out the mcv_freq
>> and then treat the rest equally, but this doesn't work for MCV in
>> extended statistics, this patch fill this gap. Besides that, since
>> extended statistics means more than 1 columns are involved, if 1+
>> columns are Const based on RestrictInfo, we can use such information to
>> filter the MCVs we are interesting, that's really cool.
>>
>
> Yes, I think that's an accurate description of what the patch does.

Great to know that:)

>
>> I did some more testing, all of them are inner join so far, all of them
>> works amazing and I am suprised this patch didn't draw enough
>> attention.

> I think it didn't go forward for a bunch of reasons:
>
..
>
> 3) Uncertainty about how applicable the patch is in practice.
>
> I suppose it was some combination of these reasons, not sure.
>
> As for the "practicality" mentioned in (3), it's been a while since I
> worked on the patch so I don't recall the details, but I think I've been
> thinking mostly about "start join" queries, where a big "fact" table
> joins to small dimensions. And in that case the fact table may have a
> MCV, but the dimensions certainly don't have any (because the join
> happens on a PK).
>
> But maybe that's a wrong way to think about it - it was clearly useful
> to consider the case with (per-attribute) MCVs on both sides as worth
> special handling. So why not to do that for multi-column MCVs, right?

Yes, that's what my current understanding is.

There are some cases where there are 2+ clauses between two tables AND
the rows estimiation is bad AND the plan is not the best one. In such
sisuations, I'd think this patch probably be helpful. The current case
in hand is PG11, there is no MCV information for extended statistics, so
I even can't verify the patch here is useful or not manually. When I see
them next time in a newer version of PG, I can verity it manually to see
if the rows estimation can be better.

>> At for the code level, I reviewed them in the top-down manner and almost
>> 40% completed. Here are some findings just FYI. For efficiency purpose,
>> I provide each feedback with a individual commit, after all I want to
>> make sure my comment is practical and coding and testing is a good way
>> to archive that. I tried to make each of them as small as possible so
>> that you can reject or accept them convinently.
>>
>> 0001 is your patch, I just rebase them against the current master. 0006
>> is not much relevant with current patch, and I think it can be committed
>> individually if you are OK with that.
>>
>> Hope this kind of review is helpful.
>>
>
> Cool! There's obviously no chance to get this into v18, and I have stuff
> to do in this CF. But I'll take a look after that.

Good to know that. I will continue my work before that.

--
Best Regards
Andy Fan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sutou Kouhei 2024-04-03 01:36:12 Re: Commitfest Manager for March
Previous Message Thomas Munro 2024-04-03 00:31:11 Re: Streaming I/O, vectored I/O (WIP)