Quick Links

Re: using extended statistics to improve join estimates

From:	Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To:	Andy Fan <zhihuifan1213(at)163(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc:	Julien Rouhaud <rjuju123(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: using extended statistics to improve join estimates
Date:	2024-04-02 18:22:55
Message-ID:	2a9604a5-520d-48ee-b3c8-5f3342b607d3@enterprisedb.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 4/2/24 10:23, Andy Fan wrote:
>
>> On Wed, Mar 02, 2022 at 11:38:21AM -0600, Justin Pryzby wrote:
>>> Rebased over 269b532ae and muted compiler warnings.
>
> Thank you Justin for the rebase!
>
> Hello Tomas,
>
> Thanks for the patch! Before I review the path at the code level, I want
> to explain my understanding about this patch first.
>

If you want to work on this patch, that'd be cool. A review would be
great, but if you want to maybe take over and try moving it forward,
that'd be even better. I don't know when I'll have time to work on it
again, but I'd promise to help you with working on it.

> Before this patch, we already use MCV information for the eqjoinsel, it
> works as combine the MCV on the both sides to figure out the mcv_freq
> and then treat the rest equally, but this doesn't work for MCV in
> extended statistics, this patch fill this gap. Besides that, since
> extended statistics means more than 1 columns are involved, if 1+
> columns are Const based on RestrictInfo, we can use such information to
> filter the MCVs we are interesting, that's really cool.
>

Yes, I think that's an accurate description of what the patch does.

> I did some more testing, all of them are inner join so far, all of them
> works amazing and I am suprised this patch didn't draw enough
> attention. I will test more after I go though the code.
>

I think it didn't go forward for a bunch of reasons:

1) I got distracted by something else requiring immediate attention, and
forgot about this patch.

2) I got stuck on some detail of the patch, unsure which of the possible
solutions to try first.

3) Uncertainty about how applicable the patch is in practice.

I suppose it was some combination of these reasons, not sure.

As for the "practicality" mentioned in (3), it's been a while since I
worked on the patch so I don't recall the details, but I think I've been
thinking mostly about "start join" queries, where a big "fact" table
joins to small dimensions. And in that case the fact table may have a
MCV, but the dimensions certainly don't have any (because the join
happens on a PK).

But maybe that's a wrong way to think about it - it was clearly useful
to consider the case with (per-attribute) MCVs on both sides as worth
special handling. So why not to do that for multi-column MCVs, right?

> At for the code level, I reviewed them in the top-down manner and almost
> 40% completed. Here are some findings just FYI. For efficiency purpose,
> I provide each feedback with a individual commit, after all I want to
> make sure my comment is practical and coding and testing is a good way
> to archive that. I tried to make each of them as small as possible so
> that you can reject or accept them convinently.
>
> 0001 is your patch, I just rebase them against the current master. 0006
> is not much relevant with current patch, and I think it can be committed
> individually if you are OK with that.
>
> Hope this kind of review is helpful.
>

Cool! There's obviously no chance to get this into v18, and I have stuff
to do in this CF. But I'll take a look after that.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Re: using extended statistics to improve join estimates at 2024-04-02 08:23:45 from Andy Fan

Responses

Re: using extended statistics to improve join estimates at 2024-04-03 01:33:28 from Andy Fan
Re: using extended statistics to improve join estimates at 2024-05-20 08:31:38 from Andrei Lepikhov

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Daniel Gustafsson	2024-04-02 18:31:48	Re: Fix resource leak (src/backend/libpq/be-secure-common.c)
Previous Message	Ranier Vilela	2024-04-02 18:13:19	Fix resource leak (src/backend/libpq/be-secure-common.c)