Quick Links

Re: SELECT DISTINCT very slow

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-general(at)postgresql(dot)org, Ben Harper <rogojin(at)gmail(dot)com>
Subject:	Re: SELECT DISTINCT very slow
Date:	2009-07-10 00:50:40
Message-ID:	10703.1247187040@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

Greg Stark <gsstark(at)mit(dot)edu> writes:
> Not really. The OP doesn't say how wide the record rows are but unless
> they're very wide it wouldn't pay to use an index for this even if you
> didn't have to access the heap also. It's going to be faster to scan
> the whole heap and either sort or use a hash. Currently there aren't
> many cases where a btree with 6,000 copies of 111 distinct keys is
> going to be useful.

It was 600,000 not 6,000 ... so a skip-scan might be worth the trouble,
but as you say we haven't done it.

In any case I think the real issue is that the OP is probably using a
pre-8.4 release which will always do SELECT DISTINCT via sort-and-unique.
Hash aggregation would be a whole lot faster for these numbers, even
if not exactly instantaneous. He could update to 8.4, or go over to
using GROUP BY as was recommended upthread.

regards, tom lane

In response to

Re: SELECT DISTINCT very slow at 2009-07-10 00:36:38 from Greg Stark

Responses

Re: SELECT DISTINCT very slow at 2009-07-10 12:41:05 from Ben Harper

Browse pgsql-general by date

	From	Date	Subject
Next Message	serafin segador	2009-07-10 01:00:24	Re: pg_dump PostgreSQL 8.4
Previous Message	Greg Stark	2009-07-10 00:36:38	Re: SELECT DISTINCT very slow