Better way to compute moving averages?

From: Jason Aleski <jason(dot)aleski(at)gmail(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Better way to compute moving averages?
Date: 2015-01-23 22:58:51
Message-ID: 54C2D22B.8050605@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

I've been asked compute various moving averages of end of day sales by
store. I can do this for all rows with no problem (same query without
the WHERE clause towards the end). That query took 10-15 minutes to run
over approx 3.4 million rows. I'm sure they will want this information
to be added to the daily end of day reports. I can run the query below
(excluding the WHERE clause) but it takes almost as long to run one day
as it does the entire dataset. It looks like when I do the inner
select, it is still running over the entire dataset. I have added a
"WHERE eod_ts > CURRENT_TIMESTAMP - INTERVAL '365 days'" (as below) to
the inner query, which allows the query to run between 1-2 minutes.

Question 1) This seems to work, but was curious if there is a better way.

Question 2) Is there a way to specify a date, instead of using current
date and current_timestamp, as a variable and use that in the query? I
know I can do that in my Java program
using variables, but wasn't sure if there was a way to do this with a
function or stored procedure?

INSERT INTO historical_data_avg (store_id, date, avg7sales, avg14sales,
avg30sales, avg60sales, avg90sales, avg180sales) (
SELECT t1.store_id, t1.eod_ts, t1.avg5sales, t1.avg10sales,
t1.avg20sales, t1.avg50sales, t1.avg100sales, t1.avg180sales FROM (
SELECT
store_id,
eod_ts,
avg(eod_sales) OVER (PARTITION BY store_id ORDER BY eod_ts DESC
ROWS BETWEEN CURRENT ROW AND 4 FOLLOWING) AS avg5sales,
avg(eod_sales) OVER (PARTITION BY store_id ORDER BY eod_ts DESC
ROWS BETWEEN CURRENT ROW AND 9 FOLLOWING) AS avg10sales,
avg(eod_sales) OVER (PARTITION BY store_id ORDER BY eod_ts DESC
ROWS BETWEEN CURRENT ROW AND 19 FOLLOWING) AS avg20sales,
avg(eod_sales) OVER (PARTITION BY store_id ORDER BY eod_ts DESC
ROWS BETWEEN CURRENT ROW AND 49 FOLLOWING) AS avg50sales,
avg(eod_sales) OVER (PARTITION BY store_id ORDER BY eod_ts DESC
ROWS BETWEEN CURRENT ROW AND 99 FOLLOWING) AS avg100sales,
avg(eod_sales) OVER (PARTITION BY store_id ORDER BY eod_ts DESC
ROWS BETWEEN CURRENT ROW AND 179 FOLLOWING) AS avg200sales
FROM end_of_day_data
WHERE eod_ts > CURRENT_TIMESTAMP - INTERVAL '260 days'
GROUP BY store_id, eod_ts, eod_sales
ORDER BY ticker_id, eod_ts
) as t1
WHERE t1.eod_ts = current_date
);

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message David G Johnston 2015-01-24 00:50:27 Re: Better way to compute moving averages?
Previous Message ktm@rice.edu 2015-01-23 20:06:57 Re: [SQL] commit inside a function failing