From: | Danil Anisimow <anisimow(dot)d(at)gmail(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Estimation rows of FULL JOIN |
Date: | 2023-12-20 17:31:51 |
Message-ID: | CABm2Ma680cVOW0qJmAME0wvWTAL9ZtQC4GH1f6OHRo5xRLL+1Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi.
In some cases, the planner underestimates FULL JOIN.
Example:
postgres=# CREATE TABLE t AS SELECT x AS a, null AS b FROM
generate_series(1, 10) x;
postgres=# ANALYZE;
postgres=# EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t t1 FULL JOIN t t2
ON t1.b = t2.b;
QUERY PLAN
-------------------------------------------------------------------------------------------
Hash Full Join (cost=1.23..2.37 rows=10 width=72) (actual rows=20 loops=1)
Hash Cond: (t1.b = t2.b)
-> Seq Scan on t t1 (cost=0.00..1.10 rows=10 width=36) (actual rows=10
loops=1)
-> Hash (cost=1.10..1.10 rows=10 width=36) (actual rows=10 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on t t2 (cost=0.00..1.10 rows=10 width=36) (actual
rows=10 loops=1)
Planning Time: 0.067 ms
Execution Time: 0.052 ms
(8 rows)
Are these simple changes enough to improve this situation?
diff --git a/src/backend/optimizer/path/costsize.c
b/src/backend/optimizer/path/costsize.c
index ef475d95a18..9cd43b778f3 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -5259,6 +5259,8 @@ calc_joinrel_size_estimate(PlannerInfo *root,
break;
case JOIN_FULL:
nrows = outer_rows * inner_rows * fkselec * jselec;
+ if (2 * nrows < outer_rows + inner_rows)
+ nrows = outer_rows + inner_rows - nrows;
if (nrows < outer_rows)
nrows = outer_rows;
if (nrows < inner_rows)
There is no error in the above case:
postgres=# EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t t1 FULL JOIN t t2
ON t1.b = t2.b;
QUERY PLAN
-------------------------------------------------------------------------------------------
Hash Full Join (cost=1.23..2.37 rows=20 width=72) (actual rows=20 loops=1)
Hash Cond: (t1.b = t2.b)
-> Seq Scan on t t1 (cost=0.00..1.10 rows=10 width=36) (actual rows=10
loops=1)
-> Hash (cost=1.10..1.10 rows=10 width=36) (actual rows=10 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on t t2 (cost=0.00..1.10 rows=10 width=36) (actual
rows=10 loops=1)
Planning Time: 0.069 ms
Execution Time: 0.065 ms
(8 rows)
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2023-12-20 17:39:05 | Re: Estimation rows of FULL JOIN |
Previous Message | Andrew Dunstan | 2023-12-20 17:22:42 | Re: Remove MSVC scripts from the tree |