Re: select distinct and index usage

From: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
To: "Alban Hertroys" <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl>, "David Wilson" <david(dot)t(dot)wilson(at)gmail(dot)com>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: select distinct and index usage
Date: 2008-04-08 02:43:37
Message-ID: F0238EBA67824444BC1CB4700960CB48051100F6@dmpeints002.isotach.com
Lists: pgsql-general

Alban Hertroys wrote
> Something that might help you, but I'm not sure whether it might hurt
> the performance of other queries, is to cluster that table on
> val_datestamp_idx. That way the records are already (mostly) sorted
> on disk in the order of the datestamps, which seems to be the brunt
> of above query plan.

I have a question about this suggestion, in relation to what the cost estimation calculation does, or could possibly do:
If there are 4000 distinct values in the index, scattered randomly among 75 million rows, then you might be able to check the visibility of all those index values by reading fewer disk pages than if the table were clustered by that index.
As an example, say there are 50 rows per page. At a minimum, you could be very lucky and determine that they were all visible by reading only 80 data pages (4000 / 50). More likely you'd be able to determine that by reading a few hundred pages. If the table were clustered by an index on that field, you'd have to read 4000 pages, because each value would then occupy roughly 18,750 consecutive rows (75 million / 4000), so nearly every page would hold rows for just one value.
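
To make the arithmetic above concrete, here is a rough back-of-envelope sketch in Python. It is not how the PostgreSQL planner estimates costs; the function names are made up for illustration, and the "typical" figure assumes values are spread uniformly at random and that pages are read in no particular order (a coupon-collector estimate), which is my assumption rather than anything stated in the thread.

```python
import math

# Figures taken from the example in the message.
TOTAL_ROWS = 75_000_000
DISTINCT_VALUES = 4_000
ROWS_PER_PAGE = 50


def best_case_pages_random(distinct_values, rows_per_page):
    """Luckiest case for a randomly ordered table: every row on every
    page read carries a value that has not been seen yet."""
    return math.ceil(distinct_values / rows_per_page)


def expected_pages_random(distinct_values, rows_per_page):
    """Rough typical case, ASSUMING uniformly scattered values and pages
    read at random: coupon-collector estimate n * H_n row reads,
    converted to pages."""
    harmonic = sum(1.0 / k for k in range(1, distinct_values + 1))
    return distinct_values * harmonic / rows_per_page


def pages_clustered(total_rows, distinct_values, rows_per_page):
    """Clustered table, using the message's simplification of one page
    per distinct value (ignores pages that straddle two adjacent
    values), capped at the total number of pages in the table."""
    total_pages = math.ceil(total_rows / rows_per_page)
    return min(distinct_values, total_pages)


if __name__ == "__main__":
    print("best case, random order:",
          best_case_pages_random(DISTINCT_VALUES, ROWS_PER_PAGE))
    print("rough typical case, random order:",
          round(expected_pages_random(DISTINCT_VALUES, ROWS_PER_PAGE)))
    print("clustered:",
          pages_clustered(TOTAL_ROWS, DISTINCT_VALUES, ROWS_PER_PAGE))
```

Under these assumptions the best case and the clustered case reproduce the 80 and 4000 pages mentioned above, and the typical case comes out in the hundreds of pages, in line with the "few hundred" guess.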

Is this question completely unrelated to PostgreSQL implementation reality, or something worth considering?

Regards,
Stephen Denne.

