Using b-tree index for >= condition when joining

From: Łukasz Dąbek <sznurek(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Using b-tree index for >= condition when joining
Date: 2020-05-16 17:56:20
Message-ID: CAE2Xe9tLfLxE_pthPLFerywyA4RxtGTwMCqKpGk6VqMYOj6Lfw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello All!

I am having a problem with nudging postgres to choose a good plan for
a query involving a left join and an inequality constraint on a column
with b-tree index.

Let's say both tbl1 and tbl2 tables have date column with an index on
it. Queries like "SELECT * FROM tbl1 WHERE date >= CONSTANT" are using
index scan, as expected. Now let's define a view:

=# CREATE VIEW vw1 AS SELECT t1.date as date, t1.x as x, t2.y as y
FROM tbl1 t1 LEFT JOIN tbl2 t2 USING (date);

Query of the form "SELECT * FROM vw1 WHERE date = '2020-04-21'" is
using index scan on both tables:

=# EXPLAIN SELECT * FROM vw1 WHERE date = '2020-04-21';
QUERY PLAN
------------------------------------------------------------------------------------------------
Hash Right Join (cost=91208.02..112781024.50 rows=10000000000 width=12)
Hash Cond: (t2.date = t1.date)
-> Index Scan using tbl2_date_idx on tbl2 t2
(cost=0.43..188393.92 rows=100000 width=8)
Index Cond: (date = '2019-04-21'::date)
-> Hash (cost=89566.58..89566.58 rows=100000 width=8)
-> Bitmap Heap Scan on tbl1 t1 (cost=1875.43..89566.58
rows=100000 width=8)
Recheck Cond: (date = '2019-04-21'::date)
-> Bitmap Index Scan on tbl1_date_idx
(cost=0.00..1850.43 rows=100000 width=0)
Index Cond: (date = '2019-04-21'::date)

(I know the total number of rows estimated for this and next queries
is enormous, in reality there are more conditions on the join but I
want to keep the example small)

However when an inequality is used the query plan seems inefficient:

=# EXPLAIN SELECT * FROM vw1 WHERE date >= '2020-04-21';
QUERY PLAN
----------------------------------------------------------------------------------------------
Hash Left Join (cost=483538.43..4617954384.38 rows=410400000000 width=12)
Hash Cond: (t1.date = t2.date)
-> Index Scan using tbl1_date_idx on tbl1 t1
(cost=0.43..369147.38 rows=4104000 width=8)
Index Cond: (date >= '2019-04-21'::date)
-> Hash (cost=234163.00..234163.00 rows=15200000 width=8)
-> Seq Scan on tbl2 t2 (cost=0.00..234163.00 rows=15200000 width=8)

It looks like the inequality on date isn't pushed down below the left
join? I can get the plan I'd like to have by putting the same
constraint on the date column on the second table:

=# EXPLAIN SELECT * FROM tbl1 t1 LEFT JOIN tbl2 t2 USING (date) WHERE
t1.date >= '2019-04-21' AND t2.date >= '2019-04-21';
QUERY PLAN
--------------------------------------------------------------------------------------------------
Hash Join (cost=281625.87..1651822721.88 rows=112860000000 width=26)
Hash Cond: (t2.date = t1.date)
-> Index Scan using tbl2_date_idx on tbl2 t2
(cost=0.43..369784.44 rows=4180000 width=15)
Index Cond: (date >= '2019-04-21'::date)
-> Hash (cost=210285.43..210285.43 rows=4104000 width=15)
-> Bitmap Heap Scan on tbl1 t1 (cost=76822.43..210285.43
rows=4104000 width=15)
Recheck Cond: (date >= '2019-04-21'::date)
-> Bitmap Index Scan on tbl1_date_idx
(cost=0.00..75796.43 rows=4104000 width=0)
Index Cond: (date >= '2019-04-21'::date)

Is it possible to define a view vw2 such that queries of the form
"SELECT * FROM vw2 WHERE date >= CONSTANT" use the plan I pasted
above?

Thanks in advance for help,
Lukasz

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Christian Ramseyer 2020-05-16 19:59:24 Re: Removing Last field from CSV string
Previous Message Don Seiler 2020-05-16 17:51:53 Re: template0 needing vacuum freeze?