From: | "Phil Endecott" <spam_from_pgsql_lists(at)chezphil(dot)org> |
---|---|
To: | "Postgres General" <pgsql-general(at)postgresql(dot)org> |
Cc: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: Filtering before join with date_trunc() |
Date: | 2018-10-16 02:58:29 |
Message-ID: | 1539658709144@dmwebmail.dmwebmail.chezphil.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Thanks all for the replies.
Tom Lane wrote:
> You're expecting too much.
That often seems to be the case.
> I think you're also expecting the system to deduce that it can apply an
> inequality on one join column to the other one. It doesn't; only equality
> constraints have any sort of transitivity logic.
>
> So you'll need to write out the BETWEEN separately for each table,
> and put it below the full join, which means you won't be able to
> use those nice views :-(
Here's an example:
create table t1 ("time" timestamptz, value1 numeric);
create index t1_time on t1("time");
\copy t1 from ......
create table t2 ("time" timestamptz, value2 numeric);
create index t2_time on t2("time");
\copy t2 from ......
explain select * from t1 join t2 using("time") where "time" between '2018-10-01' and '2018-10-02';
+------------------------------------------------------------------------------------------------+
| QUERY PLAN |
+------------------------------------------------------------------------------------------------+
| Hash Join (cost=12.99..101.03 rows=138 width=21) |
| Hash Cond: (t2."time" = t1."time") |
| -> Seq Scan on t2 (cost=0.00..70.11 rows=4411 width=15) |
| -> Hash (cost=11.18..11.18 rows=145 width=14) |
| -> Index Scan using t1_time on t1 (cost=0.28..11.18 rows=145 width=14) |
| Index Cond: (("time" >= '2018-10-01 00:00:00+00'::timestamp with time zone) AND .|
|.("time" <= '2018-10-02 00:00:00+00'::timestamp with time zone)) |
+------------------------------------------------------------------------------------------------+
explain with q1 as (select * from t1 where "time" between '2018-10-01' and '2018-10-02'), q2 as (select * from t2 where "time" between '2018-10-01' and '2018-10-02') select * from q1 join q2 using("time");
+------------------------------------------------------------------------------------------------+
| QUERY PLAN |
+------------------------------------------------------------------------------------------------+
| Hash Join (cost=26.60..31.41 rows=136 width=72) |
| Hash Cond: (q1."time" = q2."time") |
| CTE q1 |
| -> Index Scan using t1_time on t1 (cost=0.28..11.18 rows=145 width=14) |
| Index Cond: (("time" >= '2018-10-01 00:00:00+00'::timestamp with time zone) AND ("ti.|
|.me" <= '2018-10-02 00:00:00+00'::timestamp with time zone)) |
| CTE q2 |
| -> Index Scan using t2_time on t2 (cost=0.28..11.00 rows=136 width=15) |
| Index Cond: (("time" >= '2018-10-01 00:00:00+00'::timestamp with time zone) AND ("ti.|
|.me" <= '2018-10-02 00:00:00+00'::timestamp with time zone)) |
| -> CTE Scan on q1 (cost=0.00..2.90 rows=145 width=40) |
| -> Hash (cost=2.72..2.72 rows=136 width=40) |
| -> CTE Scan on q2 (cost=0.00..2.72 rows=136 width=40) |
+------------------------------------------------------------------------------------------------+
So.... as you say, even if I strip out all of the complexity of approximate
timestamps and missing values, it's never going to push the BETWEEN filter
down below the join. Even with just a few thousand rows I see a 5X speedup
with the second query with the explicit filtering below the join.
This is rather disappointing. Am I the only person who's ever wanted to do
this?
Regards, Phil.
From | Date | Subject | |
---|---|---|---|
Next Message | Ben Madin | 2018-10-16 04:01:15 | Re: Saving view turns SELECT * into field list |
Previous Message | Adrian Klaver | 2018-10-15 21:25:51 | Re: FATAL: terminating connection because protocol synchronization was lost |