Re: *very* slow query to summarize data for a month ...

From: "Patrick Hatcher" <PHatcher(at)macys(dot)com>
To: scrappy(at)postgresql(dot)org
Cc: pgsql-performance(at)postgresql(dot)org, pgsql-performance-owner(at)postgresql(dot)org
Subject: Re: *very* slow query to summarize data for a month ...
Date: 2003-11-10 20:31:08
Message-ID: OF43AE737E.7548A814-ON88256DDA.00707536-88256DDA.00715E91@fds.com
Lists: pgsql-performance


Do you have an index on ts.bytes? Josh had suggested this, and after I put
one on my summed fields I saw a speed increase. I can't remember which
article it was that Josh wrote about index usage, but maybe he'll chime
in and supply the URL for it.
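
For example, something along these lines (the index name here is just my
example, not necessarily what you'd want to call it):

-- index on the column being summed, per Josh's suggestion
CREATE INDEX traffic_logs_bytes_idx ON traffic_logs (bytes);
-- refresh planner statistics so the new index gets considered
ANALYZE traffic_logs;
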
hth

Patrick Hatcher


"Marc G. Fournier"
<scrappy(at)postgresql
.org> To
Sent by: pgsql-performance(at)postgresql(dot)org
pgsql-performance-o cc
wner(at)postgresql(dot)org
Subject
[PERFORM] *very* slow query to
11/10/2003 12:18 PM summarize data for a month ...





Table structure is simple:

CREATE TABLE traffic_logs (
    company_id bigint,
    ip_id bigint,
    port integer,
    bytes bigint,
    runtime timestamp without time zone
);

runtime is 'day of month' ...

I need to summarize the month, per company, with a query like this:

explain analyze SELECT ts.company_id, company_name, SUM(ts.bytes) AS total_traffic
FROM company c, traffic_logs ts
WHERE c.company_id = ts.company_id
AND month_trunc(ts.runtime) = '2003-10-01'
GROUP BY company_name, ts.company_id;

and the explain looks like:
                                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=32000.94..32083.07 rows=821 width=41) (actual time=32983.36..47586.17 rows=144 loops=1)
   ->  Group  (cost=32000.94..32062.54 rows=8213 width=41) (actual time=32957.40..42817.88 rows=462198 loops=1)
         ->  Sort  (cost=32000.94..32021.47 rows=8213 width=41) (actual time=32957.38..36261.31 rows=462198 loops=1)
               Sort Key: c.company_name, ts.company_id
               ->  Merge Join  (cost=31321.45..31466.92 rows=8213 width=41) (actual time=13983.07..22642.14 rows=462198 loops=1)
                     Merge Cond: ("outer".company_id = "inner".company_id)
                     ->  Sort  (cost=24.41..25.29 rows=352 width=25) (actual time=5.52..7.40 rows=348 loops=1)
                           Sort Key: c.company_id
                           ->  Seq Scan on company c  (cost=0.00..9.52 rows=352 width=25) (actual time=0.02..2.78 rows=352 loops=1)
                     ->  Sort  (cost=31297.04..31317.57 rows=8213 width=16) (actual time=13977.49..16794.41 rows=462198 loops=1)
                           Sort Key: ts.company_id
                           ->  Index Scan using tl_month on traffic_logs ts  (cost=0.00..30763.02 rows=8213 width=16) (actual time=0.29..5562.25 rows=462198 loops=1)
                                 Index Cond: (month_trunc(runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 47587.82 msec
(14 rows)
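
(For reference, tl_month is a functional index on month_trunc(runtime);
the definitions look roughly like this -- I'm sketching from memory, so
the exact declarations may differ slightly:)

CREATE FUNCTION month_trunc(timestamp without time zone)
RETURNS timestamp without time zone AS
    'SELECT date_trunc(''month'', $1);'
LANGUAGE sql IMMUTABLE;

CREATE INDEX tl_month ON traffic_logs (month_trunc(runtime));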

The problem is that we're only dealing with a few months' worth of data, so
I don't think there is much of a way to 'improve performance' on this, but I
figured I'd ask quickly before I do something rash ...

Note that without the month_trunc() index, the Total runtime more than
doubles:

                                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=39578.63..39660.76 rows=821 width=41) (actual time=87805.47..101251.35 rows=144 loops=1)
   ->  Group  (cost=39578.63..39640.23 rows=8213 width=41) (actual time=87779.56..96824.56 rows=462198 loops=1)
         ->  Sort  (cost=39578.63..39599.17 rows=8213 width=41) (actual time=87779.52..90781.48 rows=462198 loops=1)
               Sort Key: c.company_name, ts.company_id
               ->  Merge Join  (cost=38899.14..39044.62 rows=8213 width=41) (actual time=64073.98..72783.68 rows=462198 loops=1)
                     Merge Cond: ("outer".company_id = "inner".company_id)
                     ->  Sort  (cost=24.41..25.29 rows=352 width=25) (actual time=64.66..66.55 rows=348 loops=1)
                           Sort Key: c.company_id
                           ->  Seq Scan on company c  (cost=0.00..9.52 rows=352 width=25) (actual time=1.76..61.70 rows=352 loops=1)
                     ->  Sort  (cost=38874.73..38895.27 rows=8213 width=16) (actual time=64009.26..66860.71 rows=462198 loops=1)
                           Sort Key: ts.company_id
                           ->  Seq Scan on traffic_logs ts  (cost=0.00..38340.72 rows=8213 width=16) (actual time=5.02..-645982.04 rows=462198 loops=1)
                                 Filter: (date_trunc('month'::text, runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 101277.17 msec
(14 rows)

