Re: Filtering before join with date_trunc()

From: Francisco Olarte <folarte(at)peoplecall(dot)com>
To: Phil Endecott <spam_from_pgsql_lists(at)chezphil(dot)org>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Filtering before join with date_trunc()
Date: 2018-10-15 16:22:24
Message-ID: CA+bJJbyBWWM9zKoRrybMMRKUkAU=9svqs+CvJYg2-Bwmh+qdBw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi Phil:

On Mon, Oct 15, 2018 at 5:57 PM, Phil Endecott
<spam_from_pgsql_lists(at)chezphil(dot)org> wrote:
...
> For each of these tables I have a view which rounds the timestamp
> to the nearest minute, and ensures there is only one row per minute:
...
> I then join these tables on the rounded time:
....
> Now I'd like to find the values for a particular short time period:

For what I propose I assume the SHORT time is really short....

...
> This works, but it is inefficient; it seems to create all the rounded
> data, do the join on all of it, and then filter on the time period.
> Ideally it would filter the raw data, and then need to round and join
> far fewer rows.

I think you are expecting too much, I mean, you are expecting the
server to know it can expand your time-period into a (maybe) bigger
one which covers the original data and push that condition down. In my
experience this is normally too much.

...

> Is there anything I can do to make this more efficient when the tables
> are larger?

If your periods are really short you could try to replace the time
condition on the views to a time condition in the tables and do the
rounding and grouping afterwards. I mean, use a half-open interval to
catch the relevant chunks of the tables and then join the short
results.

I think with a trio of CTE selecting with a time interval on the WHERE
and doing the date_trunc()/MAX() group by you should be able to do
three index scans producing short results which can then be
full-joined and coalesced. If you want the interval from $A to $B (
rounded to minutes ), do something like....

WITH pressures AS (
SELECT date_trunc('minute'::text, tbl."time") AS "time",
max(tbl.pressure) AS pressure
FROM tbl
-- Chop the relevant time..
WHERE time >= $A and time < $B + '1 minute'::interval
-- There may be easier ways to make the above condition if you are
generating the text, but always use half-open
GROUP BY 1 ), -- Easier to type, and no order-by here ( and I
normally label order by in views as a code smell )....
,
yada, yada....-- repeat for temperatures, rain

SELECT
SELECT COALESCE(rain."time", pressures."time", temperatures."time") AS "time",
rain.rain,
pressures.pressure,
temperatures.temperature
FROM rain
FULL JOIN pressures USING ("time")
FULL JOIN temperatures USING ("time")
ORDER BY 1; -- ORDER BY GOES HERE, I suspect your example got ordered
by chance, not by dessign.

( use other names, I just used the view names for C&P, lazy me ).

Francisco Olarte.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2018-10-15 16:22:55 Re: Regarding varchar max length in postgres
Previous Message Adrian Klaver 2018-10-15 16:18:17 Re: Regarding varchar max length in postgres