From: | Dave Steinberg <dave-dated-1101824919(dot)46cd20(at)redterror(dot)net> |
---|---|
To: | pgsql-sql(at)postgresql(dot)org |
Subject: | PG7.4.5: query not using index on date column |
Date: | 2004-11-23 14:28:16 |
Message-ID: | 20041123092816.51d7aa5e.dave@redterror.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
Hi Folks, I was hoping someone could help me to improve the performance of a query I've got that insists on doing a seq. scan on a large table. I'm trying to do some reporting based on my spam logs which I've partly digested and stored in a table. Here are the particulars:
The messages table:
Column | Type | Modifiers
-------------------+-----------------------+--------------
message_id | integer | not null default nextval('spamreport.messages_message_id_seq'::text)
received_date | date | not null
score | numeric | not null
user_threshold | numeric | not null
raw_email_address | character varying(64) | not null
processing_time | numeric | not null
size | integer | not null
fuzzed_address | character varying(64) | not null
domain | character varying(64) | not null
Indexes:
"messages_pkey" primary key, btree (message_id)
"domain_idx" btree ("domain")
"fuzzy_idx" btree (fuzzed_address)
"received_date_idx" btree (received_date)
And here's the primary query I run, along with explain analyze output:
>> explain analyze SELECT * FROM ( SELECT
domain,
count(*) as count,
max(score) as max_score,
avg(score) as average_score,
stddev(score) as stddev_score,
sum(CASE WHEN score > user_threshold THEN 1 ELSE 0 END) as spam_count,
avg(processing_time) as average_time,
avg(size) as average_size
FROM messages
WHERE received_date BETWEEN '2004-11-01' AND '2004-11-30'
GROUP BY domain ) AS aggs
ORDER BY count DESC;
QUERY PLAN
-------------------------------------------------------------------
Sort (cost=30303.51..30303.60 rows=35 width=226) (actual time=29869.716..29869.883 rows=69 loops=1)
Sort Key: count
-> Subquery Scan aggs (cost=30301.56..30302.61 rows=35 width=226) (actual time=29861.705..29869.240 rows=69 loops=1)
-> HashAggregate (cost=30301.56..30302.26 rows=35 width=54) (actual time=29861.681..29868.261 rows=69 loops=1)
-> Seq Scan on messages (cost=0.00..21573.04 rows=436426 width=54) (actual time=5.523..6304.657 rows=462931 loops=1)
Filter: ((received_date >= '2004-11-01'::date) AND (received_date <= '2004-11-30'::date))
Total runtime: 29870.437 ms
This database gets vacuumed nightly. Also, the query plan stays the same even if I restrict the received_date column down to a single day.
Thanks in advance,
--
Dave Steinberg
http://www.geekisp.com/
From | Date | Subject | |
---|---|---|---|
Next Message | Vishal Kashyap @ [SaiHertz] | 2004-11-23 18:39:29 | Re: Image Insert Doubt |
Previous Message | Premsun Choltanwanich | 2004-11-23 09:45:21 | Re: Image Insert Postgresql DB |