Re: ERROR: corrupt MVNDistinct entry

From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tomas(at)vondra(dot)me>
Subject: Re: ERROR: corrupt MVNDistinct entry
Date: 2024-12-25 16:14:56
Message-ID: 79660add-4238-4616-903e-ba1607f79f6e@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 25/12/2024 16:36, Richard Guo wrote:
> On Wed, Dec 25, 2024 at 5:14 PM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
>> On Wed, Dec 25, 2024 at 11:34 AM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
>>> 2. It is ok for Vars. But what about expressions? We use equal() in
>>> distinct, MCV and dependencies modules. Do we need to remove nulls
>>> before using extended statistics as a general rule?
>>
>> AFAIU, the expressions in extended statistics are not decorated with
>> any nullingrels bits, are they?
>
> I've just realized that there may be other places with similar issues,
> not just in estimate_num_groups. For instance,
I'm pleased to see that you've grasped my initially unclear idea. Yeah,
it seems that all types of statistics may be lost because of varnullingrels.

> In v16 and later, the nullingrels within the expression "t2.a + t2.b"
> prevent it from being matched to the corresponding expression in
> extended statistics, forcing us to use DEFAULT_UNK_SEL(0.005).
>
> It seems that we need to strip out the nullingrels bits from
> expressions before matching them to extended statistics or
> expressional index columns in more places.
I think Tomas Vondra may have a decisive opinion in this place: we have
already discussed some approaches to calculate NULLs generated by RHS of
Left Join shortly.
Maybe we can commit a quick cure like the one provided in your patch,
but we should remember this example - it is not apparent to me how to
estimate a group of clauses in the case when part of Vars has
varnullingrels and part of them - doesn't.

Also, I think this is a good example that an explain analyse summary
could have some sort of extended statistics usage report. It can help to
clearly identify cases when extended statistics don't work, but should.
- something like already implemented in SQL Server.

--
regards, Andrei Lepikhov

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Vitaly Davydov 2024-12-25 16:18:18 Re: An improvement of ProcessTwoPhaseBuffer logic
Previous Message Heikki Linnakangas 2024-12-25 16:11:34 Re: adjust_limit_rows_costs algorithm