From: | Łukasz Dąbek <sznurek(at)gmail(dot)com> |
---|---|
To: | pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Using b-tree index for >= condition when joining |
Date: | 2020-05-16 17:56:20 |
Message-ID: | CAE2Xe9tLfLxE_pthPLFerywyA4RxtGTwMCqKpGk6VqMYOj6Lfw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hello All!
I am having a problem with nudging postgres to choose a good plan for
a query involving a left join and an inequality constraint on a column
with b-tree index.
Let's say both tbl1 and tbl2 tables have date column with an index on
it. Queries like "SELECT * FROM tbl1 WHERE date >= CONSTANT" are using
index scan, as expected. Now let's define a view:
=# CREATE VIEW vw1 AS SELECT t1.date as date, t1.x as x, t2.y as y
FROM tbl1 t1 LEFT JOIN tbl2 t2 USING (date);
Query of the form "SELECT * FROM vw1 WHERE date = '2020-04-21'" is
using index scan on both tables:
=# EXPLAIN SELECT * FROM vw1 WHERE date = '2020-04-21';
QUERY PLAN
------------------------------------------------------------------------------------------------
Hash Right Join (cost=91208.02..112781024.50 rows=10000000000 width=12)
Hash Cond: (t2.date = t1.date)
-> Index Scan using tbl2_date_idx on tbl2 t2
(cost=0.43..188393.92 rows=100000 width=8)
Index Cond: (date = '2019-04-21'::date)
-> Hash (cost=89566.58..89566.58 rows=100000 width=8)
-> Bitmap Heap Scan on tbl1 t1 (cost=1875.43..89566.58
rows=100000 width=8)
Recheck Cond: (date = '2019-04-21'::date)
-> Bitmap Index Scan on tbl1_date_idx
(cost=0.00..1850.43 rows=100000 width=0)
Index Cond: (date = '2019-04-21'::date)
(I know the total number of rows estimated for this and next queries
is enormous, in reality there are more conditions on the join but I
want to keep the example small)
However when an inequality is used the query plan seems inefficient:
=# EXPLAIN SELECT * FROM vw1 WHERE date >= '2020-04-21';
QUERY PLAN
----------------------------------------------------------------------------------------------
Hash Left Join (cost=483538.43..4617954384.38 rows=410400000000 width=12)
Hash Cond: (t1.date = t2.date)
-> Index Scan using tbl1_date_idx on tbl1 t1
(cost=0.43..369147.38 rows=4104000 width=8)
Index Cond: (date >= '2019-04-21'::date)
-> Hash (cost=234163.00..234163.00 rows=15200000 width=8)
-> Seq Scan on tbl2 t2 (cost=0.00..234163.00 rows=15200000 width=8)
It looks like the inequality on date isn't pushed down below the left
join? I can get the plan I'd like to have by putting the same
constraint on the date column on the second table:
=# EXPLAIN SELECT * FROM tbl1 t1 LEFT JOIN tbl2 t2 USING (date) WHERE
t1.date >= '2019-04-21' AND t2.date >= '2019-04-21';
QUERY PLAN
--------------------------------------------------------------------------------------------------
Hash Join (cost=281625.87..1651822721.88 rows=112860000000 width=26)
Hash Cond: (t2.date = t1.date)
-> Index Scan using tbl2_date_idx on tbl2 t2
(cost=0.43..369784.44 rows=4180000 width=15)
Index Cond: (date >= '2019-04-21'::date)
-> Hash (cost=210285.43..210285.43 rows=4104000 width=15)
-> Bitmap Heap Scan on tbl1 t1 (cost=76822.43..210285.43
rows=4104000 width=15)
Recheck Cond: (date >= '2019-04-21'::date)
-> Bitmap Index Scan on tbl1_date_idx
(cost=0.00..75796.43 rows=4104000 width=0)
Index Cond: (date >= '2019-04-21'::date)
Is it possible to define a view vw2 such that queries of the form
"SELECT * FROM vw2 WHERE date >= CONSTANT" use the plan I pasted
above?
Thanks in advance for help,
Lukasz
From | Date | Subject | |
---|---|---|---|
Next Message | Christian Ramseyer | 2020-05-16 19:59:24 | Re: Removing Last field from CSV string |
Previous Message | Don Seiler | 2020-05-16 17:51:53 | Re: template0 needing vacuum freeze? |