Filtering before join with date_trunc()

From: "Phil Endecott" <spam_from_pgsql_lists(at)chezphil(dot)org>
To: "Postgres General" <pgsql-general(at)postgresql(dot)org>
Subject: Filtering before join with date_trunc()
Date: 2018-10-15 15:57:41
Message-ID: 1539619061614@dmwebmail.dmwebmail.chezphil.org
Lists: pgsql-general

Dear Experts,

I have a few tables with "raw" timestamps like this:

+-------------------------------+----------+
| time                          | pressure |
+-------------------------------+----------+
| 2018-09-14 00:00:07.148378+00 |  1007.52 |
| 2018-09-14 00:10:07.147506+00 |  1007.43 |
| 2018-09-14 00:20:07.147533+00 |  1007.28 |
+-------------------------------+----------+

For each of these tables I have a view which truncates the timestamp
to the minute, and ensures there is only one row per minute:

SELECT date_trunc('minute'::text, tbl."time") AS "time",
       max(tbl.pressure) AS pressure
FROM tbl
GROUP BY (date_trunc('minute'::text, tbl."time"))
ORDER BY (date_trunc('minute'::text, tbl."time"));
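
(For concreteness, each of these views is created along the following
lines; the view name "pressures" and the raw table name
"raw_pressures" stand in for my real names:)

CREATE VIEW pressures AS
SELECT date_trunc('minute'::text, tbl."time") AS "time",
       max(tbl.pressure) AS pressure
FROM raw_pressures AS tbl
GROUP BY date_trunc('minute'::text, tbl."time")
ORDER BY date_trunc('minute'::text, tbl."time");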

I then join these views on the truncated time:

SELECT COALESCE(rain."time", pressures."time", temperatures."time") AS "time",
       rain.rain,
       pressures.pressure,
       temperatures.temperature
FROM rain
FULL JOIN pressures USING ("time")
FULL JOIN temperatures USING ("time");

+------------------------+------+----------+-------------+
| time                   | rain | pressure | temperature |
+------------------------+------+----------+-------------+
| 2018-09-14 00:00:00+00 |    0 |  1007.52 |      11.349 |
| 2018-09-14 00:10:00+00 |    0 |  1007.43 |     11.2317 |
| 2018-09-14 00:20:00+00 |    0 |  1007.28 |     11.2317 |
+------------------------+------+----------+-------------+

The COALESCE for time and the full joins are needed because any of
the source tables may be missing a row for a given minute.
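
(The "readings" view queried below is essentially just this join
wrapped up:)

CREATE VIEW readings AS
SELECT COALESCE(rain."time", pressures."time", temperatures."time") AS "time",
       rain.rain,
       pressures.pressure,
       temperatures.temperature
FROM rain
FULL JOIN pressures USING ("time")
FULL JOIN temperatures USING ("time");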

Now I'd like to find the values for a particular short time period:

SELECT * FROM readings
WHERE "time" BETWEEN '2018-10-01T00:00:00' AND '2018-10-01T24:00:00'

This works, but it is inefficient: it seems to compute all the
rounded data, join all of it, and only then filter on the time
period, presumably because the planner cannot push the filter down
through the full joins and the date_trunc() grouping. Ideally it
would filter the raw data first, and then need to truncate and join
far fewer rows.
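
What I have in mind is something like the following for each table,
applying the time filter to the raw data before truncating and
aggregating (a sketch, using the same stand-in names as above):

SELECT date_trunc('minute', tbl."time") AS "time",
       max(tbl.pressure) AS pressure
FROM raw_pressures AS tbl
WHERE tbl."time" >= '2018-10-01T00:00:00'
  AND tbl."time" <  '2018-10-02T00:00:00'
GROUP BY date_trunc('minute', tbl."time");

Here the WHERE clause can use an ordinary index on the raw "time"
column, and only the matching rows need to be truncated and joined.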

It would not be difficult for me to round the timestamps when inserting
the data, and also ensure that there is only one row for each minute.
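
Something like this, assuming a unique constraint on the rounded
"time" column (the table name "pressures_1min" is again a stand-in):

INSERT INTO pressures_1min ("time", pressure)
VALUES (date_trunc('minute', now()), 1007.52)
ON CONFLICT ("time")
-- keep the max() semantics of the view when two readings land in
-- the same minute:
DO UPDATE SET pressure = GREATEST(pressures_1min.pressure, EXCLUDED.pressure);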
But I've done some experiments, and even if I remove all the rounding
and replace the full joins with regular joins, it still does a
sequential scan on at least one of the tables:

 Nested Loop  (cost=12.95..144.99 rows=135 width=20)
   Join Filter: (x_rain."time" = x_pressures."time")
   ->  Hash Join  (cost=12.67..97.83 rows=135 width=24)
         Hash Cond: (x_temperatures."time" = x_rain."time")
         ->  Seq Scan on x_temperatures  (cost=0.00..67.50 rows=4350 width=12)
         ->  Hash  (cost=10.98..10.98 rows=135 width=12)
               ->  Index Scan using x_rain_by_time on x_rain  (cost=0.28..10.98 rows=135 width=12)
                     Index Cond: (("time" >= '2018-10-01 00:00:00+00'::timestamp with time zone) AND ("time" <= '2018-10-02 00:00:00+00'::timestamp with time zone))
   ->  Index Scan using x_pressures_by_time on x_pressures  (cost=0.28..0.34 rows=1 width=12)
         Index Cond: ("time" = x_temperatures."time")

Maybe that is because the tables are currently relatively small (a
few thousand rows) and the planner believes that sequential scans
are faster. (I have sometimes wished for an "explain" variant that
tells me what query plan it would choose if all the tables were
100X larger.)
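
(The nearest approximation I know of is to penalise sequential scans
for the session and re-run EXPLAIN; it is not the same as having
larger tables, but it does show the plan the planner would otherwise
fall back to:)

SET enable_seqscan = off;  -- planner avoids seq scans where possible

EXPLAIN
SELECT * FROM readings
WHERE "time" BETWEEN '2018-10-01T00:00:00' AND '2018-10-01T24:00:00';

RESET enable_seqscan;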

Is there anything I can do to make this more efficient when the tables
are larger?

Thanks for any suggestions.

Regards, Phil.
