Quick Links

Re: ERROR: corrupt MVNDistinct entry

From:	Richard Guo <guofenglinux(at)gmail(dot)com>
To:	Andrei Lepikhov <lepihov(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ERROR: corrupt MVNDistinct entry
Date:	2024-12-27 04:16:07
Message-ID:	CAMbWs4_euoGhR+ZEYAFLduVw1U+Ww3LY7LRpCZPqdxKg1AYy3Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, Dec 25, 2024 at 6:36 PM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> In v16 and later, the nullingrels within the expression "t2.a + t2.b"
> prevent it from being matched to the corresponding expression in
> extended statistics, forcing us to use DEFAULT_UNK_SEL(0.005).

Furthermore, even without extended statistics or expressional index
columns, the nullingrels can still cause issues with selectivity
estimates. They may cause examine_variable to fail in identifying a
valid RelOptInfo for an expression, making the relation size not
available, which is required in many cases. For instance,

create table t (a int, b int);
insert into t select i, i from generate_series(1,10)i;
analyze t;

-- in v16 and later
explain (costs on)
select * from t t1
left join t t2 on true
left join t t3 on t1.a = t3.a and t3.a < 8
where t1.a = coalesce(t2.a);
QUERY PLAN
-----------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..4.94 rows=1 width=24)
Join Filter: (t1.a = t3.a)
-> Nested Loop Left Join (cost=0.00..3.73 rows=1 width=16)
Filter: (t1.a = COALESCE(t2.a))
-> Seq Scan on t t1 (cost=0.00..1.10 rows=10 width=8)
-> Materialize (cost=0.00..1.15 rows=10 width=8)
-> Seq Scan on t t2 (cost=0.00..1.10 rows=10 width=8)
-> Seq Scan on t t3 (cost=0.00..1.12 rows=7 width=8)
Filter: (a < 8)
(9 rows)

-- in v15 and before
explain (costs on)
select * from t t1
left join t t2 on true
left join t t3 on t1.a = t3.a and t3.a < 8
where t1.a = coalesce(t2.a);
QUERY PLAN
-----------------------------------------------------------------------
Hash Left Join (cost=1.21..5.04 rows=10 width=24)
Hash Cond: (t1.a = t3.a)
-> Nested Loop Left Join (cost=0.00..3.73 rows=10 width=16)
Filter: (t1.a = COALESCE(t2.a))
-> Seq Scan on t t1 (cost=0.00..1.10 rows=10 width=8)
-> Materialize (cost=0.00..1.15 rows=10 width=8)
-> Seq Scan on t t2 (cost=0.00..1.10 rows=10 width=8)
-> Hash (cost=1.12..1.12 rows=7 width=8)
-> Seq Scan on t t3 (cost=0.00..1.12 rows=7 width=8)
Filter: (a < 8)
(10 rows)

In v16 and later, when calculating the join selectivity for "t1.a =
coalesce(t2.a)", eqjoinsel sets nd2 — the number of distinct values of
coalesce(t2.a) — to DEFAULT_NUM_DISTINCT (200) because the
corresponding RelOptInfo is not identifiable. This results in very
inaccurate join selectivity.

I'm wondering if we also need to strip out the nullingrels from the
expression in examine_variable(). I tried doing so and noticed a plan
diff in regression test join.sql.

@@ -2573,10 +2573,11 @@
-> Materialize
-> Seq Scan on int4_tbl t2
Filter: (f1 > 1)
- -> Seq Scan on int4_tbl t3
+ -> Materialize
+ -> Seq Scan on int4_tbl t3
-> Materialize
-> Seq Scan on int4_tbl t4
-(13 rows)
+(14 rows)

Thanks
Richard

In response to

Re: ERROR: corrupt MVNDistinct entry at 2024-12-25 09:36:54 from Richard Guo

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Michael Paquier	2024-12-27 04:42:22	Re: date_trunc invalid units with infinite value
Previous Message	vignesh C	2024-12-27 04:11:00	Re: Logical replication timeout