From: | Michael Contzen <mcontzen(at)dohle(dot)com> |
---|---|
To: | pgsql-sql(at)postgresql(dot)org |
Subject: | Re: How slow is DISTINCT? |
Date: | 2002-04-09 10:15:47 |
Message-ID: | 3CB2BF53.77D41E79@dohle.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
Hello,
our problem to 'select distinct' is similar: We have a big table of
about 320.000.000 rows of an datawarehouse application. 55 GB, yes, it
works fine!
Createing an index on it quite fast (nearly as fast than oracle on the
same data and same machine) 1GHz P4, 4GB-Ram -> 30 Minutes.
Then we tried 'select distinct one_field' which would result to about
200.000 different values. Postgres needed 6 hours while Oracle managed
it in about 30 minutes.
Looking into the pgsql_tmp directory of this db while doing this
selection showed me a lot of tempfiles nearly as big as the table.
Does postgres sort the whole table without projection to one column an
performs a unique on this whole table?
This would explain the big amount of disk usage in pgsql_tmp and the big
amount of time.
The statement, something could be wrong with the data, is not very
useful: This is data of our electronic cash-desks. Unfortunately our
customers buy every day nearly the same articles - therefor the
repeatition of data :-)
Kind regards
M.Contzen
Developer
Dohle Systemberatung
Germany
Some facts of our test:
Table "warenausg_ges"
Column | Type | Modifiers
-----------+---------------+-----------
ean | numeric(13,0) |
menge | numeric(13,3) |
lvkumsatz | numeric(15,3) |
vkumsatz | numeric(15,3) |
ekumsatz | numeric(15,3) |
rabatt | numeric(12,0) |
kdnr | numeric(10,0) |
artnr | numeric(10,0) |
lfnr | numeric(10,0) |
wg | integer |
aktion | character(1) |
datum | date |
status | integer |
Indexes: warenausg_ges_inx
Index "warenausg_ges_inx"
Column | Type
--------+------
datum | date
btree
explain select distinct artnr from warenausg_ges;
NOTICE: QUERY PLAN:
Unique (cost=224522801.22..225315849.86 rows=31721946 width=12)
-> Sort (cost=224522801.22..224522801.22 rows=317219456 width=12)
-> Seq Scan on warenausg_ges (cost=0.00..165793667.00
rows=317219456 width=12)
From | Date | Subject | |
---|---|---|---|
Next Message | Søren Vainio | 2002-04-09 11:28:38 | Re: Scadinavian characters in regular expressions |
Previous Message | Andreas Joseph Krogh | 2002-04-09 09:53:29 | Re: Scadinavian characters in regular expressions |