Re: Filtering before join with date_trunc()

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Phil Endecott <spam_from_pgsql_lists(at)chezphil(dot)org>, Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Filtering before join with date_trunc()
Date: 2018-10-15 16:30:09
Message-ID: fcab9054-e2b3-fc08-3f57-80c7e0bff297@aklaver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 10/15/18 8:57 AM, Phil Endecott wrote:
> Dear Experts,
>
> I have a few tables with "raw" timestamsps like this:
>
> +-------------------------------+----------+
> | time | pressure |
> +-------------------------------+----------+
> | 2018-09-14 00:00:07.148378+00 | 1007.52 |
> | 2018-09-14 00:10:07.147506+00 | 1007.43 |
> | 2018-09-14 00:20:07.147533+00 | 1007.28 |
> +-------------------------------+----------+
>
> For each of these tables I have a view which rounds the timestamp
> to the nearest minute, and ensures there is only one row per minute:
>
> SELECT date_trunc('minute'::text, tbl."time") AS "time",
> max(tbl.pressure) AS pressure
> FROM tbl
> GROUP BY (date_trunc('minute'::text, tbl."time"))
> ORDER BY (date_trunc('minute'::text, tbl."time"));
>
> I then join these tables on the rounded time:
>
> SELECT COALESCE(rain."time", pressures."time", temperatures."time") AS "time",
> rain.rain,
> pressures.pressure,
> temperatures.temperature
> FROM rain
> FULL JOIN pressures USING ("time")
> FULL JOIN temperatures USING ("time");
>
> +------------------------+------+----------+-------------+
> | time | rain | pressure | temperature |
> +------------------------+------+----------+-------------+
> | 2018-09-14 00:00:00+00 | 0 | 1007.52 | 11.349 |
> | 2018-09-14 00:10:00+00 | 0 | 1007.43 | 11.2317 |
> | 2018-09-14 00:20:00+00 | 0 | 1007.28 | 11.2317 |
> +------------------------+------+----------+-------------+
>
> The COALESCE for time and the full joins are needed because some
> columns may be missing for some minutes.
>
> Now I'd like to find the values for a particular short time period:
>
> SELECT * FROM readings
> WHERE "time" BETWEEN '2018-10-01T00:00:00' AND '2018-10-01T24:00:00'

Is readings a table or view?

If view is the SELECT COALESCE ... query the view query?

>
> This works, but it is inefficient; it seems to create all the rounded
> data, do the join on all of it, and then filter on the time period.
> Ideally it would filter the raw data, and then need to round and join
> far fewer rows.
>
> It would not be difficult for me to round the timestamps when inserting
> the data, and also ensure that there is only one row for each minute.
> But I've done some experiments and even if I remove all the rounding and
> replace the full joins with regular joins, it still does sequential
> scans on at least one of the tables:
>
> Nested Loop (cost=12.95..144.99 rows=135 width=20)
> Join Filter: (x_rain."time" = x_pressures."time")
> -> Hash Join (cost=12.67..97.83 rows=135 width=24)
> Hash Cond: (x_temperatures."time" = x_rain."time")
> -> Seq Scan on x_temperatures (cost=0.00..67.50 rows=4350 width=12)
> -> Hash (cost=10.98..10.98 rows=135 width=12)
> -> Index Scan using x_rain_by_time on x_rain (cost=0.28..10.98 rows=135 width=12)
> Index Cond: (("time" >= '2018-10-01 00:00:00+00'::timestamp with time zone) AND ("time" <= '2018-10-02 00:00:00+00'::timestamp with time zone))
> -> Index Scan using x_pressures_by_time on x_pressures (cost=0.28..0.34 rows=1 width=12)
> Index Cond: ("time" = x_temperatures."time")
>
> Maybe that is because the tables are currently relatively small (a
> few thousands rows) and it believes that sequential scans are faster.
> (I have sometimes wished for an "explain" variant that tells me what
> query plan it would choose if all the tables were 100X larger.)
>
> Is there anything I can do to make this more efficient when the tables
> are larger?
>
>
> Thanks for any suggestions.
>
>
> Regards, Phil.
>
>
>

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2018-10-15 16:45:52 Re: Filtering before join with date_trunc()
Previous Message Adrian Klaver 2018-10-15 16:22:55 Re: Regarding varchar max length in postgres