Re: SELECT DISTINCT very slow

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-general(at)postgresql(dot)org, Ben Harper <rogojin(at)gmail(dot)com>
Subject: Re: SELECT DISTINCT very slow
Date: 2009-07-10 00:50:40
Message-ID: 10703.1247187040@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Greg Stark <gsstark(at)mit(dot)edu> writes:
> Not really. The OP doesn't say how wide the record rows are but unless
> they're very wide it wouldn't pay to use an index for this even if you
> didn't have to access the heap also. It's going to be faster to scan
> the whole heap and either sort or use a hash. Currently there aren't
> many cases where a btree with 6,000 copies of 111 distinct keys is
> going to be useful.

It was 600,000 not 6,000 ... so a skip-scan might be worth the trouble,
but as you say we haven't done it.

In any case I think the real issue is that the OP is probably using a
pre-8.4 release which will always do SELECT DISTINCT via sort-and-unique.
Hash aggregation would be a whole lot faster for these numbers, even
if not exactly instantaneous. He could update to 8.4, or go over to
using GROUP BY as was recommended upthread.

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message serafin segador 2009-07-10 01:00:24 Re: pg_dump PostgreSQL 8.4
Previous Message Greg Stark 2009-07-10 00:36:38 Re: SELECT DISTINCT very slow