From: | William Garrison <postgres(at)mobydisk(dot)com> |
---|---|
To: | Adrian Klaver <aklaver(at)comcast(dot)net> |
Cc: | Postgres General List <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Simple query not using index: why? |
Date: | 2008-09-03 19:55:17 |
Message-ID: | 48BEEBA5.3030108@mobydisk.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Can't it just scan the index to get that? I assumed the index had links
to every fileid in the table. In my over-simplified imagination, the
table looks like this:
ctid|fileid|column|column|column|column
ctid|fileid|column|column|column|column
ctid|fileid|column|column|column|column
ctid|fileid|column|column|column|column
etc.
While the index looks like
fileid|ctid
fileid|ctid
fileid|ctid
fileid|ctid
...
So I expected scanning the index was faster, and still had everything it
needed to do the count. Or perhaps it was because I said COUNT(*) so it
needs to look at the other columns in the table? I really just wanted
the number of "hits" not the number of records with distinct values or
anything like that. My understanding was that COUNT(*) did that, and
didn't really look at the columns themselves.
Adrian Klaver wrote:
> -------------- Original message ----------------------
> From: William Garrison <postgres(at)mobydisk(dot)com>
>
>> I am looking for records with duplicate keys, so I am running this query:
>>
>> SELECT
>> fileid, COUNT(*)
>> FROM
>> file
>> GROUP BY
>> fileid
>> HAVING
>> COUNT(*)>1
>>
>> The table has an index on fileid (non-unique index) so I am surprised
>> that postgres is doing a table scan. This database is >15GB, and there
>> are a number of fairly large string columns in the table. I am very
>> surprised that scanning the index is not faster than scanning the
>> table. Any thoughts on that? Is scanning the table faster than
>> scanning the index? Is there a reason that it needs anything other than
>> the index?
>>
>>
>
> I may be missing something, but it would have to scan the entire table to get all the occurrences of each fileid in order to do the count(*).
>
>
>
> --
> Adrian Klaver
> aklaver(at)comcast(dot)net
>
>
>
From | Date | Subject | |
---|---|---|---|
Next Message | Joshua Drake | 2008-09-03 20:01:30 | Re: Simple query not using index: why? |
Previous Message | Tony Caduto | 2008-09-03 19:54:52 | Re: SELECT INTO returns incorrect values |