Quick Links

Re: SELECT DISTINCT very slow

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	pgsql-general(at)postgresql(dot)org, Ben Harper <rogojin(at)gmail(dot)com>
Subject:	Re: SELECT DISTINCT very slow
Date:	2009-07-10 00:36:38
Message-ID:	407d949e0907091736y5855d537g9259ead0df2bf42f@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On Thu, Jul 9, 2009 at 4:47 PM, Andres Freund<andres(at)anarazel(dot)de> wrote:
> AFAIK the primary cause is that indexes in pg do not store visibility
> information.

Not really. The OP doesn't say how wide the record rows are but unless
they're very wide it wouldn't pay to use an index for this even if you
didn't have to access the heap also. It's going to be faster to scan
the whole heap and either sort or use a hash. Currently there aren't
many cases where a btree with 6,000 copies of 111 distinct keys is
going to be useful.

Arguably the missing feature here is skip-scans where we scan the
index but only pull out one record for each distinct value. I'm not
sure there's anything particularly stopping Postgres from being able
to do them, but it might be a lot of code for a narrow use case.

--
greg
http://mit.edu/~gsstark/resume.pdf

In response to

Re: SELECT DISTINCT very slow at 2009-07-09 15:47:38 from Andres Freund

Responses

Re: SELECT DISTINCT very slow at 2009-07-10 00:50:40 from Tom Lane
Re: SELECT DISTINCT very slow at 2009-07-10 17:17:08 from Jeff Davis

Browse pgsql-general by date

	From	Date	Subject
Next Message	Tom Lane	2009-07-10 00:50:40	Re: SELECT DISTINCT very slow
Previous Message	Scott Marlowe	2009-07-10 00:28:16	Re: Database storage