Re: ERROR: corrupt MVNDistinct entry

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Andrei Lepikhov <lepihov(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ERROR: corrupt MVNDistinct entry
Date: 2024-12-25 09:36:54
Message-ID: CAMbWs4-0zvk_K-0eZp238fu9GNOjdZWZ6-0f2d21EQRegU_D9A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 25, 2024 at 5:14 PM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> On Wed, Dec 25, 2024 at 11:34 AM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> > 2. It is ok for Vars. But what about expressions? We use equal() in
> > distinct, MCV and dependencies modules. Do we need to remove nulls
> > before using extended statistics as a general rule?
>
> AFAIU, the expressions in extended statistics are not decorated with
> any nullingrels bits, are they?

I've just realized that there may be other places with similar issues,
not just in estimate_num_groups. For instance,

-- after v16
explain (costs on)
select * from t t1 left join t t2 on true where (t2.a+t2.b) is null;
QUERY PLAN
--------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..15032.50 rows=5000 width=16)
Filter: ((t2.a + t2.b) IS NULL)
-> Seq Scan on t t1 (cost=0.00..15.00 rows=1000 width=8)
-> Materialize (cost=0.00..20.00 rows=1000 width=8)
-> Seq Scan on t t2 (cost=0.00..15.00 rows=1000 width=8)
(5 rows)

-- before v16
explain (costs on)
select * from t t1 left join t t2 on true where (t2.a+t2.b) is null;
QUERY PLAN
--------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..15032.50 rows=1 width=16)
Filter: ((t2.a + t2.b) IS NULL)
-> Seq Scan on t t1 (cost=0.00..15.00 rows=1000 width=8)
-> Materialize (cost=0.00..20.00 rows=1000 width=8)
-> Seq Scan on t t2 (cost=0.00..15.00 rows=1000 width=8)
(5 rows)

In v16 and later, the nullingrels within the expression "t2.a + t2.b"
prevent it from being matched to the corresponding expression in
extended statistics, forcing us to use DEFAULT_UNK_SEL(0.005).

It seems that we need to strip out the nullingrels bits from
expressions before matching them to extended statistics or
expressional index columns in more places.

Thanks
Richard

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Japin Li 2024-12-25 09:44:17 Fix typo in comment of compute_return_type()
Previous Message Michael Paquier 2024-12-25 09:28:52 Re: Pass ParseState as down to utility functions.