Re: PoC/WIP: Extended statistics on expressions

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PoC/WIP: Extended statistics on expressions
Date: 2021-03-24 16:48:17
Message-ID: a7d17aec-234d-c7f2-bfdc-99c87767a507@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3/24/21 5:28 PM, Dean Rasheed wrote:
> On Wed, 24 Mar 2021 at 14:48, Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>
>> AFAIK the primary issue here is that the two places disagree. While
>> estimate_num_groups does this
>>
>> varnos = pull_varnos(root, (Node *) varshere);
>> if (bms_membership(varnos) == BMS_SINGLETON)
>> { ... }
>>
>> the add_unique_group_expr does this
>>
>> varnos = pull_varnos(root, (Node *) groupexpr);
>>
>> That is, one looks at the group expression, while the other look at vars
>> extracted from it by pull_var_clause(). Apparently for PlaceHolderVar
>> this can differ, causing the crash.
>>
>> So we need to change one of those places - my fix tweaked the second
>> place to also look at the vars, but maybe we should change the other
>> place? Or maybe it's not the right fix for PlaceHolderVars ...
>>
>
> I think that it doesn't make any difference which place is changed.
>
> This is a case of an expression with no stats. With your change,
> you'll get a single GroupExprInfo containing a list of
> VariableStatData's for each of it's Var's, whereas if you changed it
> the other way, you'd get a separate GroupExprInfo for each Var. But I
> think they'd both end up being treated the same by
> estimate_multivariate_ndistinct(), since there wouldn't be any stats
> matching the expression, only the individual Var's. Maybe changing the
> first place would be the more bulletproof fix though.
>

Yeah, I think that's true. I'll do a bit more research / experiments.

As for the changes proposed in the create_statistics, do we really want
to use univariate / multivariate there? Yes, the terms are correct, but
I'm not sure how many people looking at CREATE STATISTICS will
understand them.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2021-03-24 17:00:36 Re: Support for NSS as a libpq TLS backend
Previous Message Jacob Champion 2021-03-24 16:45:35 Re: Proposal: Save user's original authenticated identity for logging