From: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
---|---|
To: | Andy Fan <zhihuifan1213(at)163(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com> |
Cc: | Julien Rouhaud <rjuju123(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: using extended statistics to improve join estimates |
Date: | 2024-04-02 18:22:55 |
Message-ID: | 2a9604a5-520d-48ee-b3c8-5f3342b607d3@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 4/2/24 10:23, Andy Fan wrote:
>
>> On Wed, Mar 02, 2022 at 11:38:21AM -0600, Justin Pryzby wrote:
>>> Rebased over 269b532ae and muted compiler warnings.
>
> Thank you Justin for the rebase!
>
> Hello Tomas,
>
> Thanks for the patch! Before I review the path at the code level, I want
> to explain my understanding about this patch first.
>
If you want to work on this patch, that'd be cool. A review would be
great, but if you want to maybe take over and try moving it forward,
that'd be even better. I don't know when I'll have time to work on it
again, but I'd promise to help you with working on it.
> Before this patch, we already use MCV information for the eqjoinsel, it
> works as combine the MCV on the both sides to figure out the mcv_freq
> and then treat the rest equally, but this doesn't work for MCV in
> extended statistics, this patch fill this gap. Besides that, since
> extended statistics means more than 1 columns are involved, if 1+
> columns are Const based on RestrictInfo, we can use such information to
> filter the MCVs we are interesting, that's really cool.
>
Yes, I think that's an accurate description of what the patch does.
> I did some more testing, all of them are inner join so far, all of them
> works amazing and I am suprised this patch didn't draw enough
> attention. I will test more after I go though the code.
>
I think it didn't go forward for a bunch of reasons:
1) I got distracted by something else requiring immediate attention, and
forgot about this patch.
2) I got stuck on some detail of the patch, unsure which of the possible
solutions to try first.
3) Uncertainty about how applicable the patch is in practice.
I suppose it was some combination of these reasons, not sure.
As for the "practicality" mentioned in (3), it's been a while since I
worked on the patch so I don't recall the details, but I think I've been
thinking mostly about "start join" queries, where a big "fact" table
joins to small dimensions. And in that case the fact table may have a
MCV, but the dimensions certainly don't have any (because the join
happens on a PK).
But maybe that's a wrong way to think about it - it was clearly useful
to consider the case with (per-attribute) MCVs on both sides as worth
special handling. So why not to do that for multi-column MCVs, right?
> At for the code level, I reviewed them in the top-down manner and almost
> 40% completed. Here are some findings just FYI. For efficiency purpose,
> I provide each feedback with a individual commit, after all I want to
> make sure my comment is practical and coding and testing is a good way
> to archive that. I tried to make each of them as small as possible so
> that you can reject or accept them convinently.
>
> 0001 is your patch, I just rebase them against the current master. 0006
> is not much relevant with current patch, and I think it can be committed
> individually if you are OK with that.
>
> Hope this kind of review is helpful.
>
Cool! There's obviously no chance to get this into v18, and I have stuff
to do in this CF. But I'll take a look after that.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Daniel Gustafsson | 2024-04-02 18:31:48 | Re: Fix resource leak (src/backend/libpq/be-secure-common.c) |
Previous Message | Ranier Vilela | 2024-04-02 18:13:19 | Fix resource leak (src/backend/libpq/be-secure-common.c) |