Re: Better way to compute moving averages?

From: David G Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: Better way to compute moving averages?
Date: 2015-01-24 00:50:27
Message-ID: 1422060627285-5835295.post@n5.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Jason Aleksi wrote
> I've been asked compute various moving averages of end of day sales by
> store. I can do this for all rows with no problem (same query without
> the WHERE clause towards the end). That query took 10-15 minutes to run
> over approx 3.4 million rows. I'm sure they will want this information
> to be added to the daily end of day reports. I can run the query below
> (excluding the WHERE clause) but it takes almost as long to run one day
> as it does the entire dataset. It looks like when I do the inner
> select, it is still running over the entire dataset. I have added a
> "WHERE eod_ts > CURRENT_TIMESTAMP - INTERVAL '365 days'" (as below) to
> the inner query, which allows the query to run between 1-2 minutes.
>
> Question 1) This seems to work, but was curious if there is a better way.

Possibly though I'm not sure you've given enough info...

Regardless, are you positive you need "FOLLOWING" instead of "PRECEDING"
frames?

> Question 2) Is there a way to specify a date, instead of using current
> date and current_timestamp, as a variable and use that in the query? I
> know I can do that in my Java program
> using variables, but wasn't sure if there was a way to do this with a
> function or stored procedure?

Have you read about functions in the documentation?

CREATE FUNCTION echo_text(input_var text)
RETURNS text
AS $$
SELECT input_var;
$$
LANGUAGE sql
STRICT
;

> It looks like when I do the inner
> select, it is still running over the entire dataset.
> I have added a
> "WHERE eod_ts > CURRENT_TIMESTAMP - INTERVAL '365 days'" (as below) to
> the inner query, which allows the query to run between 1-2 minutes.
> [...]
> FROM end_of_day_data
> WHERE eod_ts > CURRENT_TIMESTAMP - INTERVAL '260 days'
> GROUP BY store_id, eod_ts, eod_sales
> ORDER BY ticker_id, eod_ts
> ) as t1
> WHERE t1.eod_ts = current_date
> );

You expected differently? GROUP BY (in particular, not only) prevents any
kind of push-down of quals.

Try something like (not tested or tried - just having an idea this moment):

WITH scope_of_source_with_aging AS (
SELECT eod_ts,
first(eod_ts) OVER () AS ref_eod,
eod_ts - first(eod_ts) OVER (ORDER BY eod_ts) AS eod_age
FROM end_of_day_data WHERE eod_ts >= current_timestamp - interval '206 days'
)
SELECT ref_eod,
avg(CASE WHEN eod_age BETWEEN 0 AND 7 THEN ... ELSE NULL END) AS
average_7day
[, ...]
FROM scope_of_source_with_aging
GROUP BY ref_eod
;

The first, and most important part, is to limit your source data to only the
desired time period. The rest of the above simply tries to limit how much
work needs to be done. Since window function, while great, are fairly slow
if you can minimize their use - in this case by limiting them to simply
calculating and age - it should help performance.

The concept above should be applicable to your situation generally but you
will need to clarify your needs and then modify the WHERE and window clauses
as appropriate.

I also don't recall if there is a native "first" function though you can
either write one yourself or possibly add an appropriate range clause that
will accomplish the same goal.

David J.

--
View this message in context: http://postgresql.nabble.com/Better-way-to-compute-moving-averages-tp5835291p5835295.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Jason Aleski 2015-01-24 02:36:49 Re: Better way to compute moving averages?
Previous Message Jason Aleski 2015-01-23 22:58:51 Better way to compute moving averages?