Re: select distinct and index usage

From:	Gregory Stark <stark(at)enterprisedb(dot)com>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "David Wilson" <david(dot)t(dot)wilson(at)gmail(dot)com>, "Alban Hertroys" <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl>, <pgsql-general(at)postgresql(dot)org>
Subject:	Re: select distinct and index usage
Date:	2008-04-08 11:37:29
Message-ID:	878wzo60hy.fsf@oxford.xeocode.com
Lists:	pgsql-general

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>> Tom Lane escribió:
>>> What I think you'll find, though, is that once you do force an indexscan
>>> to be picked it'll be slower. Full-table index scans are typically
>>> worse than seqscan+sort, unintuitive though that may sound.

The original poster's implicit expectation is that an index scan would be
faster because it shouldn't have to visit every tuple. Once it's found a tuple
with a particular value it should be able to use the index to skip to the next
key value.

I thought our DISTINCT index scan did do that, but it still has to read the
index leaf pages sequentially. It doesn't back-track up the tree structure to
re-find the next key.
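
To make the difference concrete, here is a rough sketch in Python, not PostgreSQL code. It treats a sorted list as a stand-in for the index leaf entries and a binary search as a stand-in for re-descending the tree; the function names and the counters are made up for illustration only.

```python
from bisect import bisect_right


def distinct_by_sequential_scan(sorted_keys):
    """Read every entry in order and emit a key each time it changes.

    Models walking the leaf pages sequentially: work is proportional
    to the number of entries, not the number of distinct keys.
    """
    out = []
    visited = 0
    for key in sorted_keys:
        visited += 1
        if not out or out[-1] != key:
            out.append(key)
    return out, visited


def distinct_by_skip(sorted_keys):
    """After emitting a key, search for the first entry greater than it.

    Models re-finding the next key from the top of the tree: one
    search per distinct key instead of one step per entry.
    """
    out = []
    searches = 0
    pos = 0
    while pos < len(sorted_keys):
        key = sorted_keys[pos]
        out.append(key)
        # Jump past every remaining duplicate of `key`.
        pos = bisect_right(sorted_keys, key, pos)
        searches += 1
    return out, searches


if __name__ == "__main__":
    keys = [1] * 1000 + [2] * 1000 + [3] * 1000
    print(distinct_by_sequential_scan(keys))
    print(distinct_by_skip(keys))
```

With few distinct values and many duplicates the skipping version does far fewer steps, which is the case the original poster had in mind. When nearly every key is distinct it does a search per entry and loses to the plain sequential walk.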

>> Hmm, should we switch the CLUSTER code to do that?
>
> It's been suggested before, but I'm not sure. The case where an
> indexscan can win is where the table is roughly in index order already.
> So if you think about periodic CLUSTER to maintain table ordering,
> I suspect you'd want the indexscan implementation for all but maybe
> the first time.

I think we would push a query through the planner to choose the best plan
based on the statistics. I'm not sure how this would play with the visibility
rules -- iirc not all scan types can be used with all visibility modes. And
also I'm not sure how Heikki's MVCC-safe cluster would work if it can't be
sure in what order it's scanning the heap.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!

In response to

Re: select distinct and index usage at 2008-04-08 02:30:31 from Tom Lane

Responses

Re: select distinct and index usage at 2008-04-08 12:48:40 from Alvaro Herrera
