Quick Links

Re: ERROR: corrupt MVNDistinct entry

From:	Richard Guo <guofenglinux(at)gmail(dot)com>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ERROR: corrupt MVNDistinct entry
Date:	2024-12-25 07:49:30
Message-ID:	CAMbWs4_Qdhu+tX6KmbzRnRuJVpQs1T+wgVwsy2BFcY1EQgMQCg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Tue, Dec 24, 2024 at 5:00 PM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> It seems to me that when estimating the number of groups, we do not
> need to concern ourselves with the outer joins that could null the
> Vars/PHVs contained in the grouping expressions, and we should not
> count the same Var more than once.

Furthermore, accounting for nullingrels in the grouping expressions
could prevent us from matching a grouping expression to expressional
index columns or to the expressions in extended statistics, as these
expressions are not decorated with any nullingrels bits. As an
example, consider

create table t (a int, b int);
insert into t select i%10, i%10 from generate_series(1,1000)i;
create statistics s (ndistinct) on (a+b), (a-b) from t;
analyze;

-- after v16
explain (costs on)
select count(*) from t t1 left join t t2 on t1.a = t2.a group by t2.a+t2.b;
QUERY PLAN
--------------------------------------------------------------------------
HashAggregate (cost=1920.00..1921.25 rows=100 width=12)
Group Key: (t2.a + t2.b)
-> Hash Left Join (cost=27.50..1420.00 rows=100000 width=4)
Hash Cond: (t1.a = t2.a)
-> Seq Scan on t t1 (cost=0.00..15.00 rows=1000 width=4)
-> Hash (cost=15.00..15.00 rows=1000 width=8)
-> Seq Scan on t t2 (cost=0.00..15.00 rows=1000 width=8)
(7 rows)

-- before v16
explain (costs on)
select count(*) from t t1 left join t t2 on t1.a = t2.a group by t2.a+t2.b;
QUERY PLAN
--------------------------------------------------------------------------
HashAggregate (cost=1920.00..1920.12 rows=10 width=12)
Group Key: (t2.a + t2.b)
-> Hash Left Join (cost=27.50..1420.00 rows=100000 width=4)
Hash Cond: (t1.a = t2.a)
-> Seq Scan on t t1 (cost=0.00..15.00 rows=1000 width=4)
-> Hash (cost=15.00..15.00 rows=1000 width=8)
-> Seq Scan on t t2 (cost=0.00..15.00 rows=1000 width=8)
(7 rows)

Attached is a more formal patch to strip out all the nullingrels from
the grouping expressions before we estimate number of groups.

Thanks
Richard

Attachment	Content-Type	Size
v1-0001-Do-not-account-for-nullingrels-when-estimating-number-of-groups.patch	application/octet-stream	4.2 KB

In response to

ERROR: corrupt MVNDistinct entry at 2024-12-24 08:00:57 from Richard Guo

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Richard Guo	2024-12-25 08:14:53	Re: ERROR: corrupt MVNDistinct entry
Previous Message	Hayato Kuroda (Fujitsu)	2024-12-25 07:12:03	RE: Object identifier types in logical replication binary mode