| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | Greg Stark <gsstark(at)mit(dot)edu> |
| Cc: | Andres Freund <andres(at)anarazel(dot)de>, pgsql-general(at)postgresql(dot)org, Ben Harper <rogojin(at)gmail(dot)com> |
| Subject: | Re: SELECT DISTINCT very slow |
| Date: | 2009-07-10 00:50:40 |
| Message-ID: | 10703.1247187040@sss.pgh.pa.us |
| Lists: | pgsql-general |
Greg Stark <gsstark(at)mit(dot)edu> writes:
> Not really. The OP doesn't say how wide the record rows are but unless
> they're very wide it wouldn't pay to use an index for this even if you
> didn't have to access the heap also. It's going to be faster to scan
> the whole heap and either sort or use a hash. Currently there aren't
> many cases where a btree with 6,000 copies of 111 distinct keys is
> going to be useful.
It was 600,000, not 6,000 ... so a skip-scan might be worth the trouble,
but as you say we haven't done it.
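A skip-scan over such a btree can also be emulated by hand with a recursive CTE that probes the index once per distinct key (the so-called loose index scan). A rough sketch, assuming a hypothetical call_records table with a btree index on caller_id; note that WITH RECURSIVE itself is only available from 8.4 on:

```sql
-- Hypothetical table: ~600,000 rows, only ~111 distinct caller_id values.
-- CREATE INDEX call_records_caller_id_idx ON call_records (caller_id);

WITH RECURSIVE distinct_keys(caller_id) AS (
    -- smallest key: a single index probe
    (SELECT caller_id FROM call_records ORDER BY caller_id LIMIT 1)
    UNION ALL
    -- next key strictly greater than the previous one: one probe per step
    SELECT (SELECT c.caller_id
              FROM call_records c
             WHERE c.caller_id > d.caller_id
             ORDER BY c.caller_id
             LIMIT 1)
      FROM distinct_keys d
     WHERE d.caller_id IS NOT NULL
)
SELECT caller_id FROM distinct_keys WHERE caller_id IS NOT NULL;
```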
In any case I think the real issue is that the OP is probably using a
pre-8.4 release which will always do SELECT DISTINCT via sort-and-unique.
Hash aggregation would be a whole lot faster for these numbers, even
if not exactly instantaneous. He could update to 8.4, or go over to
using GROUP BY as was recommended upthread.
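Concretely, the GROUP BY rewrite looks like the sketch below; the table and column names (call_records, caller_id) are made up for illustration:

```sql
-- Hypothetical names: call_records(caller_id), ~600,000 rows, ~111 distinct values.

-- Pre-8.4 this form is always planned as Sort + Unique over the whole table:
SELECT DISTINCT caller_id FROM call_records;

-- Same result, but the planner may use hash aggregation instead:
SELECT caller_id FROM call_records GROUP BY caller_id;

-- EXPLAIN should show HashAggregate for the GROUP BY form
-- (and, on 8.4 or later, for the DISTINCT form as well):
EXPLAIN SELECT caller_id FROM call_records GROUP BY caller_id;
```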
regards, tom lane