From: | "David Wilson" <david(dot)t(dot)wilson(at)gmail(dot)com> |
---|---|
To: | "Alban Hertroys" <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: select distinct and index usage |
Date: | 2008-04-07 18:27:42 |
Message-ID: | e7f9235d0804071127n5cd08096qec8653c905550f1e@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Mon, Apr 7, 2008 at 1:42 PM, Alban Hertroys
<dalroi(at)solfertje(dot)student(dot)utwente(dot)nl> wrote:
>
> Have you tried this query with enable_seqscan=off? If my guess is right
> (and the planners, in that case) it'd be even slower.
set enable_seqscan=off;
explain select distinct datestamp from vals;
QUERY PLAN
----------------------------------------------------------------------------------------------
Unique (cost=115003047.47..115380004.83 rows=4263 width=4)
-> Sort (cost=115003047.47..115191526.15 rows=75391472 width=4)
Sort Key: datestamp
-> Seq Scan on vals (cost=100000000.00..101531261.72
rows=75391472 width=4)
It appears to be doing a sequential scan regardless of the set, as if
it doesn't believe it can use the index for some reason
>
> Something that might help you, but I'm not sure whether it might hurt the
> performance of other queries, is to cluster that table on val_datestamp_idx.
> That way the records are already (mostly) sorted on disk in the order of the
> datestamps, which seems to be the brunt of above query plan.
That's a good thought. I'll give that a try this evening when the DB
has some downtime and see what happens.
>
> There seems to be quite a bit of overlap in your index definitions. From my
> experience this can confuse the planner.
>
> I suggest you combine them, but not knowing your data... Maybe rewriting
> your UNIQUE constraint to (val_dur, datestamp, eid, sid) would be enough to
> replace all those other indexes.
> If not, it's probably better to have one index per column, so that the
> planner is free to combine them as it sees fit. That'd result in a bitmap
> index scan, btw.
I can take a look at the other indices again, but those are all in
place for specific other queries that generally involve some set of
a=1, b=2, c=3, datestamp>5 type of where-clause and were created
specifically in response to sequential scans showing up in other
queries (and had the proper effect of fixing them!)
>
> I'm not a postgresql tuning expert (I know my way around though), other
> people can explain you way better than I can. Bruce Momjian for example:
> http://www.linuxjournal.com/article/4791
I'll take a look at that, thanks.
> That calculation doesn't look familiar to me, I'm more used to:
> select pg_size_pretty(pg_relation_size('...'));
>
> You can put the name of any relation in there, be it tables, indexes, etc.
>
> 11GB is pretty large for an index on an integer column, especially with
> only 75M rows: that's 146 bytes/row in your index. Maybe your index got
> bloated somehow? I think it should be about a tenth of that.
pg_total_relation_size('..') gives the number of bytes for the table +
all associated indices; pg_relation_size('..') gives for just the
table. The difference between the two should be total bytes take up by
the 5 total indices (11 total index cols), giving a
back-of-the-envelope estimation of 1gb for the size of the datestamp
index. I am fairly certain that I didn't give pg 1gb to fit the index
in memory, so I'll try upping its total available memory tonight and
see if that doesn't improve things.
I appreciate the responses so far! I'm used to several minutes for
some of the complex queries on this DB, but 12.5 minutes for a select
distinct just seems wrong. :)
--
- David T. Wilson
Princeton Satellite Systems
david(dot)t(dot)wilson(at)gmail(dot)com
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2008-04-07 18:31:13 | Re: Subtracting Two Intervals |
Previous Message | Terry Lee Tucker | 2008-04-07 18:11:26 | Subtracting Two Intervals |