From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Josh Berkus <josh(at)agliodbs(dot)com> |
Cc: | "Loftis, Charles E" <charles(dot)loftis(at)eds(dot)com>, pgsql-novice(at)postgresql(dot)org |
Subject: | Re: SQL - Indexing for performance on uniquness check... |
Date: | 2004-07-19 04:05:48 |
Message-ID: | 8589.1090209948@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-novice |
Josh Berkus <josh(at)agliodbs(dot)com> writes:
> Charles,
>> Sample query to return non-uniqueness
>> SELECT A1, A2, A3, ..., An
>> FROM Table
>> GROUP BY A1, A2, A3, ..., An
>> HAVING Count(*)>1
> In order for it to be even possible to use an index (a hashaggregate
> operation, actually) on this table, you'd have to include *all* of the GROUP
> BY columns in a single, multi-column index.
> However, it would be unlikely for PG to use any kind of an index in the
> operation above, because of the number of columns, the unlikelyness of
> grouping (i.e. there will only be a minority of rows with count(*) > 1) and
> the fact that you're running this against the whole table. So any
> kind of an index is liable to be useless.
Yeah. If you are not expecting a huge number of groups, I think that it
would be more interesting to try a HashAggregate plan than a sort/group
plan. For this you need 7.4 or later and a sort_mem setting large
enough to cover whatever the planner estimates the hashtable size to be.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2004-07-19 04:23:21 | Re: FOR-IN-EXECUTE, why fail? |
Previous Message | Scott Marlowe | 2004-07-18 22:39:12 | Re: Calling Functions and Stored Procedures |