Re: Filtering before join with date_trunc()

From: "Phil Endecott" <spam_from_pgsql_lists(at)chezphil(dot)org>
To: "Postgres General" <pgsql-general(at)postgresql(dot)org>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Filtering before join with date_trunc()
Date: 2018-10-16 02:58:29
Message-ID: 1539658709144@dmwebmail.dmwebmail.chezphil.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thanks all for the replies.

Tom Lane wrote:
> You're expecting too much.

That often seems to be the case.

> I think you're also expecting the system to deduce that it can apply an
> inequality on one join column to the other one. It doesn't; only equality
> constraints have any sort of transitivity logic.
>
> So you'll need to write out the BETWEEN separately for each table,
> and put it below the full join, which means you won't be able to
> use those nice views :-(

Here's an example:

create table t1 ("time" timestamptz, value1 numeric);
create index t1_time on t1("time");
\copy t1 from ......

create table t2 ("time" timestamptz, value2 numeric);
create index t2_time on t2("time");
\copy t2 from ......

explain select * from t1 join t2 using("time") where "time" between '2018-10-01' and '2018-10-02';
+------------------------------------------------------------------------------------------------+
| QUERY PLAN |
+------------------------------------------------------------------------------------------------+
| Hash Join (cost=12.99..101.03 rows=138 width=21) |
| Hash Cond: (t2."time" = t1."time") |
| -> Seq Scan on t2 (cost=0.00..70.11 rows=4411 width=15) |
| -> Hash (cost=11.18..11.18 rows=145 width=14) |
| -> Index Scan using t1_time on t1 (cost=0.28..11.18 rows=145 width=14) |
| Index Cond: (("time" >= '2018-10-01 00:00:00+00'::timestamp with time zone) AND .|
|.("time" <= '2018-10-02 00:00:00+00'::timestamp with time zone)) |
+------------------------------------------------------------------------------------------------+

explain with q1 as (select * from t1 where "time" between '2018-10-01' and '2018-10-02'), q2 as (select * from t2 where "time" between '2018-10-01' and '2018-10-02') select * from q1 join q2 using("time");
+------------------------------------------------------------------------------------------------+
| QUERY PLAN |
+------------------------------------------------------------------------------------------------+
| Hash Join (cost=26.60..31.41 rows=136 width=72) |
| Hash Cond: (q1."time" = q2."time") |
| CTE q1 |
| -> Index Scan using t1_time on t1 (cost=0.28..11.18 rows=145 width=14) |
| Index Cond: (("time" >= '2018-10-01 00:00:00+00'::timestamp with time zone) AND ("ti.|
|.me" <= '2018-10-02 00:00:00+00'::timestamp with time zone)) |
| CTE q2 |
| -> Index Scan using t2_time on t2 (cost=0.28..11.00 rows=136 width=15) |
| Index Cond: (("time" >= '2018-10-01 00:00:00+00'::timestamp with time zone) AND ("ti.|
|.me" <= '2018-10-02 00:00:00+00'::timestamp with time zone)) |
| -> CTE Scan on q1 (cost=0.00..2.90 rows=145 width=40) |
| -> Hash (cost=2.72..2.72 rows=136 width=40) |
| -> CTE Scan on q2 (cost=0.00..2.72 rows=136 width=40) |
+------------------------------------------------------------------------------------------------+

So.... as you say, even if I strip out all of the complexity of approximate
timestamps and missing values, it's never going to push the BETWEEN filter
down below the join. Even with just a few thousand rows I see a 5X speedup
with the second query with the explicit filtering below the join.

This is rather disappointing. Am I the only person who's ever wanted to do
this?

Regards, Phil.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Ben Madin 2018-10-16 04:01:15 Re: Saving view turns SELECT * into field list
Previous Message Adrian Klaver 2018-10-15 21:25:51 Re: FATAL: terminating connection because protocol synchronization was lost