From: | "Christopher Kings-Lynne" <chriskl(at)familyhealth(dot)com(dot)au> |
---|---|
To: | "Michael Contzen" <mcontzen(at)dohle(dot)com>, <pgsql-sql(at)postgresql(dot)org> |
Subject: | Re: How slow is DISTINCT? |
Date: | 2002-04-16 01:53:51 |
Message-ID: | GNELIHDDFBOCMGBFGEFOIECFCCAA.chriskl@familyhealth.com.au |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
What happens if you do...
select artnr from warenausg_ges group by artnr;
or even
select distinct (artnr) from warenausg_ges;
or
select distinct on (artnr) from warenausg_ges;
...sort of ideas...
Another idea is to maintain a table of the unique values in the column.
Create a trigger on the main table to maintian the list of unique values...
Chris
> -----Original Message-----
> From: pgsql-sql-owner(at)postgresql(dot)org
> [mailto:pgsql-sql-owner(at)postgresql(dot)org]On Behalf Of Michael Contzen
> Sent: Tuesday, 9 April 2002 6:16 PM
> To: pgsql-sql(at)postgresql(dot)org
> Subject: Re: [SQL] How slow is DISTINCT?
>
>
> Hello,
>
> our problem to 'select distinct' is similar: We have a big table of
> about 320.000.000 rows of an datawarehouse application. 55 GB, yes, it
> works fine!
>
> Createing an index on it quite fast (nearly as fast than oracle on the
> same data and same machine) 1GHz P4, 4GB-Ram -> 30 Minutes.
>
> Then we tried 'select distinct one_field' which would result to about
> 200.000 different values. Postgres needed 6 hours while Oracle managed
> it in about 30 minutes.
>
> Looking into the pgsql_tmp directory of this db while doing this
> selection showed me a lot of tempfiles nearly as big as the table.
>
> Does postgres sort the whole table without projection to one column an
> performs a unique on this whole table?
> This would explain the big amount of disk usage in pgsql_tmp and the big
> amount of time.
>
> The statement, something could be wrong with the data, is not very
> useful: This is data of our electronic cash-desks. Unfortunately our
> customers buy every day nearly the same articles - therefor the
> repeatition of data :-)
>
>
> Kind regards
>
> M.Contzen
> Developer
> Dohle Systemberatung
> Germany
>
>
> Some facts of our test:
>
> Table "warenausg_ges"
> Column | Type | Modifiers
> -----------+---------------+-----------
> ean | numeric(13,0) |
> menge | numeric(13,3) |
> lvkumsatz | numeric(15,3) |
> vkumsatz | numeric(15,3) |
> ekumsatz | numeric(15,3) |
> rabatt | numeric(12,0) |
> kdnr | numeric(10,0) |
> artnr | numeric(10,0) |
> lfnr | numeric(10,0) |
> wg | integer |
> aktion | character(1) |
> datum | date |
> status | integer |
> Indexes: warenausg_ges_inx
>
> Index "warenausg_ges_inx"
> Column | Type
> --------+------
> datum | date
> btree
>
> explain select distinct artnr from warenausg_ges;
> NOTICE: QUERY PLAN:
>
> Unique (cost=224522801.22..225315849.86 rows=31721946 width=12)
> -> Sort (cost=224522801.22..224522801.22 rows=317219456 width=12)
> -> Seq Scan on warenausg_ges (cost=0.00..165793667.00
> rows=317219456 width=12)
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
>
From | Date | Subject | |
---|---|---|---|
Next Message | Josh Berkus | 2002-04-16 02:51:11 | Re: please advise on column data type |
Previous Message | jack | 2002-04-16 01:25:04 | please advise on column data type |