From: | "Marc G(dot) Fournier" <scrappy(at)postgresql(dot)org> |
---|---|
To: | Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org> |
Cc: | "Marc G(dot) Fournier" <scrappy(at)postgresql(dot)org>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: *very* slow query to summarize data for a month ... |
Date: | 2003-11-11 18:38:04 |
Message-ID: | 20031111134925.L56037@ganymede.hub.org |
Lists: pgsql-performance
On Tue, 11 Nov 2003, Dennis Bjorklund wrote:
> On Mon, 10 Nov 2003, Marc G. Fournier wrote:
>
> >
> > explain analyze SELECT ts.company_id, company_name, SUM(ts.bytes) AS total_traffic
> > FROM company c, traffic_logs ts
> > WHERE c.company_id = ts.company_id
> > AND month_trunc(ts.runtime) = '2003-10-01'
> > GROUP BY company_name,ts.company_id;
>
> What if you do
>
> ts.runtime >= '2003-10-01' AND ts.runtime < '2003-11-01'
>
> and add an index like (runtime, company_name, company_id)?
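Spelled out, that suggestion amounts to something like the following (the
index name here is made up, and since company_name lives in the company
table rather than traffic_logs, an index on traffic_logs can only cover
runtime and company_id):

  -- hypothetical name; covers the range condition plus the grouping column
  CREATE INDEX tl_runtime_company ON traffic_logs (runtime, company_id);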
Good thought, but even the *simplest* possible query, with no table joins,
is painfully slow:
explain analyze SELECT ts.company_id, SUM(ts.bytes) AS total_traffic
FROM traffic_logs ts
WHERE month_trunc(ts.runtime) = '2003-10-01'
GROUP BY ts.company_id;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=31630.84..31693.05 rows=829 width=16) (actual time=14862.71..26552.39 rows=144 loops=1)
-> Group (cost=31630.84..31672.31 rows=8295 width=16) (actual time=9634.28..20967.07 rows=462198 loops=1)
-> Sort (cost=31630.84..31651.57 rows=8295 width=16) (actual time=9634.24..12838.73 rows=462198 loops=1)
Sort Key: company_id
-> Index Scan using tl_month on traffic_logs ts (cost=0.00..31090.93 rows=8295 width=16) (actual time=0.26..6043.35 rows=462198 loops=1)
Index Cond: (month_trunc(runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
Total runtime: 26659.35 msec
(7 rows)
-OR-
explain analyze SELECT ts.company_id, SUM(ts.bytes) AS total_traffic
FROM traffic_logs ts
WHERE ts.runtime >= '2003-10-01' AND ts.runtime < '2003-11-01'
GROUP BY ts.company_id;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=81044.53..84424.21 rows=45062 width=16) (actual time=13307.52..29274.66 rows=144 loops=1)
-> Group (cost=81044.53..83297.65 rows=450625 width=16) (actual time=10809.02..-673265.13 rows=462198 loops=1)
-> Sort (cost=81044.53..82171.09 rows=450625 width=16) (actual time=10808.99..14069.79 rows=462198 loops=1)
Sort Key: company_id
-> Seq Scan on traffic_logs ts (cost=0.00..38727.35 rows=450625 width=16) (actual time=0.07..6801.92 rows=462198 loops=1)
Filter: ((runtime >= '2003-10-01 00:00:00'::timestamp without time zone) AND (runtime < '2003-11-01 00:00:00'::timestamp without time zone))
Total runtime: 29385.97 msec
(7 rows)
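One thing worth noting in both plans above: the estimate for the
month_trunc() index scan (rows=8295) is wildly off against the actual
462198 rows, while the plain range condition estimates 450625, which is
quite close -- likely because a functional index in this era carries no
statistics of its own. Re-analyzing, and possibly raising the statistics
target on runtime, should at least keep the plain-column estimates honest
(the target value below is just an example):

  -- 100 is an arbitrary example target; ANALYZE refreshes the histogram
  ALTER TABLE traffic_logs ALTER COLUMN runtime SET STATISTICS 100;
  ANALYZE traffic_logs;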
As a side note, even a straight scan for the records, with no
SUM()/GROUP BY involved, still takes >8k msec with the month_trunc() index:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Index Scan using tl_month on traffic_logs ts (cost=0.00..31096.36 rows=8297 width=16) (actual time=0.96..5432.93 rows=462198 loops=1)
Index Cond: (month_trunc(runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
Total runtime: 8092.88 msec
(3 rows)
and without the index, >15k msec:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Seq Scan on traffic_logs ts (cost=0.00..38719.55 rows=8297 width=16) (actual time=0.11..11354.45 rows=462198 loops=1)
Filter: (date_trunc('month'::text, runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
Total runtime: 15353.57 msec
(3 rows)
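For reference, tl_month is a functional index; judging from the Index Cond
in the plans above and the date_trunc() filter in the last one,
month_trunc() is presumably just an IMMUTABLE SQL wrapper around
date_trunc('month', ...), roughly:

  -- presumed definition, reconstructed from the plan output
  CREATE FUNCTION month_trunc(timestamp without time zone)
  RETURNS timestamp without time zone
  AS 'SELECT date_trunc(''month'', $1)'
  LANGUAGE sql IMMUTABLE;

  CREATE INDEX tl_month ON traffic_logs (month_trunc(runtime));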
so the GROUP BY is adding to the overall time, but even without it, it's
still taking a helluva long time ...
I'm going to modify my load script so that it dumps monthly totals to
traffic_logs, and the 'details' to a per-client schema.traffic_logs table
... I don't need the 'per day' totals at the top level at all, only speed
... the 'per day' totals are only required at the 'per client' level, and
moving the 'per day' data into a client schema will shrink the top-level
table significantly ...
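A minimal sketch of that rollup (the summary table name is hypothetical;
the load script would refresh it once per load, so the monthly report
never has to scan the ~460k detail rows):

  -- hypothetical summary table, one row per company per month
  CREATE TABLE traffic_monthly AS
    SELECT company_id,
           month_trunc(runtime) AS month,
           SUM(bytes) AS total_traffic
      FROM traffic_logs
     GROUP BY company_id, month_trunc(runtime);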
If it wasn't for trying to pull in that 'whole month' summary, it would be
fine :(