From: | Richard Guo <guofenglinux(at)gmail(dot)com> |
---|---|
To: | Andrei Lepikhov <lepihov(at)gmail(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: ERROR: corrupt MVNDistinct entry |
Date: | 2024-12-27 04:16:07 |
Message-ID: | CAMbWs4_euoGhR+ZEYAFLduVw1U+Ww3LY7LRpCZPqdxKg1AYy3Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Dec 25, 2024 at 6:36 PM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> In v16 and later, the nullingrels within the expression "t2.a + t2.b"
> prevent it from being matched to the corresponding expression in
> extended statistics, forcing us to use DEFAULT_UNK_SEL(0.005).
Furthermore, even without extended statistics or expressional index
columns, the nullingrels can still cause issues with selectivity
estimates. They may cause examine_variable to fail in identifying a
valid RelOptInfo for an expression, making the relation size not
available, which is required in many cases. For instance,
create table t (a int, b int);
insert into t select i, i from generate_series(1,10)i;
analyze t;
-- in v16 and later
explain (costs on)
select * from t t1
left join t t2 on true
left join t t3 on t1.a = t3.a and t3.a < 8
where t1.a = coalesce(t2.a);
QUERY PLAN
-----------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..4.94 rows=1 width=24)
Join Filter: (t1.a = t3.a)
-> Nested Loop Left Join (cost=0.00..3.73 rows=1 width=16)
Filter: (t1.a = COALESCE(t2.a))
-> Seq Scan on t t1 (cost=0.00..1.10 rows=10 width=8)
-> Materialize (cost=0.00..1.15 rows=10 width=8)
-> Seq Scan on t t2 (cost=0.00..1.10 rows=10 width=8)
-> Seq Scan on t t3 (cost=0.00..1.12 rows=7 width=8)
Filter: (a < 8)
(9 rows)
-- in v15 and before
explain (costs on)
select * from t t1
left join t t2 on true
left join t t3 on t1.a = t3.a and t3.a < 8
where t1.a = coalesce(t2.a);
QUERY PLAN
-----------------------------------------------------------------------
Hash Left Join (cost=1.21..5.04 rows=10 width=24)
Hash Cond: (t1.a = t3.a)
-> Nested Loop Left Join (cost=0.00..3.73 rows=10 width=16)
Filter: (t1.a = COALESCE(t2.a))
-> Seq Scan on t t1 (cost=0.00..1.10 rows=10 width=8)
-> Materialize (cost=0.00..1.15 rows=10 width=8)
-> Seq Scan on t t2 (cost=0.00..1.10 rows=10 width=8)
-> Hash (cost=1.12..1.12 rows=7 width=8)
-> Seq Scan on t t3 (cost=0.00..1.12 rows=7 width=8)
Filter: (a < 8)
(10 rows)
In v16 and later, when calculating the join selectivity for "t1.a =
coalesce(t2.a)", eqjoinsel sets nd2 — the number of distinct values of
coalesce(t2.a) — to DEFAULT_NUM_DISTINCT (200) because the
corresponding RelOptInfo is not identifiable. This results in very
inaccurate join selectivity.
I'm wondering if we also need to strip out the nullingrels from the
expression in examine_variable(). I tried doing so and noticed a plan
diff in regression test join.sql.
@@ -2573,10 +2573,11 @@
-> Materialize
-> Seq Scan on int4_tbl t2
Filter: (f1 > 1)
- -> Seq Scan on int4_tbl t3
+ -> Materialize
+ -> Seq Scan on int4_tbl t3
-> Materialize
-> Seq Scan on int4_tbl t4
-(13 rows)
+(14 rows)
Thanks
Richard
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2024-12-27 04:42:22 | Re: date_trunc invalid units with infinite value |
Previous Message | vignesh C | 2024-12-27 04:11:00 | Re: Logical replication timeout |