From: | "Guy Rouillier" <guyr(at)masergy(dot)com> |
---|---|
To: | "PostgreSQL General" <pgsql-general(at)postgresql(dot)org> |
Subject: | Select distinct sorting all rows 8.0rc1 |
Date: | 2004-12-05 06:07:39 |
Message-ID: | CC1CF380F4D70844B01D45982E671B2348E46A@mtxexch01.add0.masergy.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
System is 4-way Opteron 844, 16 GB memory, SCSI. This is a trial run
for converting an Oracle DB, so system is not optimized. I have
shared_buffers = 50000
work_mem = 65536
A table has about 65 million rows (data collection system.) It has a
primary key, no other indexes, no OIDs. The primary key contains 5
columns, of which service_id is the first (i.e., higher order.) I've
run a simple ANALYZE on this table. Trying to find the unique
service_id values, I did the following. If I'm reading this right, the
sequential scan is passing all 65 million retrieved values onto the
sort, which understandably takes a long time.
This query won't be run very often, if ever, in production, so I don't
want to add additional indexes to support it. Our known queries run
very fast, about 1.2 seconds the first time through, and 20 msecs on
repeats. Can I run ANALYZE in a different way so that queries like this
can be completed in a shorter amount of time? I'm trying to anticipate
ad-hoc queries the user community might come up with.
Since there are such a small number or result rows compared to the
number of total rows, perhaps using a hash table to record unique values
would avoid passing all 65 million rows to the sort. Thanks for all
suggestions.
estat=> explain analyze select distinct(service_id) from
five_min_stats_200408;
QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
----
Unique (cost=13578354.70..13894902.76 rows=726 width=12) (actual
time=1227906.271..1282110.055 rows=879 loops=1)
-> Sort (cost=13578354.70..13736628.73 rows=63309612 width=12)
(actual time=1227906.266..1255961.318 rows=63359396 loops=1)
Sort Key: service_id
-> Seq Scan on five_min_stats_200408 (cost=0.00..1668170.12
rows=63309612 width=12) (actual time=0.061..80398.222 rows=63359396
loops=1)
Total runtime: 1284212.556 ms
(5 rows)
Time: 1284213.359 ms
--
Guy Rouillier
From | Date | Subject | |
---|---|---|---|
Next Message | John Hansen | 2004-12-05 07:46:01 | Re: List archives search function broken |
Previous Message | Stephan Szabo | 2004-12-05 05:44:17 | Re: Trigger problem |