From: | Andrei Lepikhov <lepihov(at)gmail(dot)com> |
---|---|
To: | Richard Guo <guofenglinux(at)gmail(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tomas(at)vondra(dot)me> |
Subject: | Re: ERROR: corrupt MVNDistinct entry |
Date: | 2024-12-25 16:14:56 |
Message-ID: | 79660add-4238-4616-903e-ba1607f79f6e@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 25/12/2024 16:36, Richard Guo wrote:
> On Wed, Dec 25, 2024 at 5:14 PM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
>> On Wed, Dec 25, 2024 at 11:34 AM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
>>> 2. It is ok for Vars. But what about expressions? We use equal() in
>>> distinct, MCV and dependencies modules. Do we need to remove nulls
>>> before using extended statistics as a general rule?
>>
>> AFAIU, the expressions in extended statistics are not decorated with
>> any nullingrels bits, are they?
>
> I've just realized that there may be other places with similar issues,
> not just in estimate_num_groups. For instance,
I'm pleased to see that you've grasped my initially unclear idea. Yeah,
it seems that all types of statistics may be lost because of varnullingrels.
> In v16 and later, the nullingrels within the expression "t2.a + t2.b"
> prevent it from being matched to the corresponding expression in
> extended statistics, forcing us to use DEFAULT_UNK_SEL(0.005).
>
> It seems that we need to strip out the nullingrels bits from
> expressions before matching them to extended statistics or
> expressional index columns in more places.
I think Tomas Vondra may have a decisive opinion in this place: we have
already discussed some approaches to calculate NULLs generated by RHS of
Left Join shortly.
Maybe we can commit a quick cure like the one provided in your patch,
but we should remember this example - it is not apparent to me how to
estimate a group of clauses in the case when part of Vars has
varnullingrels and part of them - doesn't.
Also, I think this is a good example that an explain analyse summary
could have some sort of extended statistics usage report. It can help to
clearly identify cases when extended statistics don't work, but should.
- something like already implemented in SQL Server.
--
regards, Andrei Lepikhov
From | Date | Subject | |
---|---|---|---|
Next Message | Vitaly Davydov | 2024-12-25 16:18:18 | Re: An improvement of ProcessTwoPhaseBuffer logic |
Previous Message | Heikki Linnakangas | 2024-12-25 16:11:34 | Re: adjust_limit_rows_costs algorithm |