From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Jon Nelson <jnelson+pgsql(at)jamponi(dot)net> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: select distinct uses index scan vs full table scan |
Date: | 2011-12-13 19:57:57 |
Message-ID: | 9562.1323806277@sss.pgh.pa.us |
Lists: | pgsql-performance |
Jon Nelson <jnelson+pgsql(at)jamponi(dot)net> writes:
> I've got a 5GB table with about 12 million rows.
> Recently, I had to select the distinct values from just one column.
> The planner chose an index scan. The query took almost an hour.
> When I forced index scan off, the query took 90 seconds (full table scan).
Usually, we hear complaints about the opposite. Are you using
nondefault cost settings?
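
A quick way to answer that, as a sketch: the pg_settings view reports where each value came from, so any planner cost parameter whose source is not 'default' has been changed somewhere. The list of names below is a selection of the usual cost-related settings, not an exhaustive one.

```sql
-- Show the common planner cost settings and where each value comes from.
-- A "source" other than 'default' means the setting was overridden
-- (configuration file, database, role, or session).
SELECT name, setting, source
FROM pg_settings
WHERE name IN ('seq_page_cost', 'random_page_cost', 'cpu_tuple_cost',
               'cpu_index_tuple_cost', 'cpu_operator_cost',
               'effective_cache_size')
ORDER BY name;
```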
> The planner estimated 70,000 unique values when, in fact, there are 12
> million (the value for this row is *almost* but not quite unique).
> What's more, despite bumping the statistics on that column up to 1000
> and re-analyzing, the planner now thinks that there are 300,000 unique
> values.
Accurate ndistinct estimates are hard, but that wouldn't have much of
anything to do with this particular choice, AFAICS.
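
For reference, a sketch of how to see what the planner currently believes and to compare the two plans side by side. The table and column names (big_table, almost_unique_col) are placeholders, not taken from the original report.

```sql
-- Current estimate stored by ANALYZE. A positive value is an absolute
-- count of distinct values; a negative value is a fraction of the row
-- count, negated (-1 means every row is distinct).
SELECT n_distinct
FROM pg_stats
WHERE tablename = 'big_table'
  AND attname = 'almost_unique_col';

-- Plan and timing with default planner behavior.
EXPLAIN ANALYZE SELECT DISTINCT almost_unique_col FROM big_table;

-- Same query with index scans discouraged for this session only.
SET enable_indexscan = off;
EXPLAIN ANALYZE SELECT DISTINCT almost_unique_col FROM big_table;
RESET enable_indexscan;
```

Posting both EXPLAIN ANALYZE outputs makes it much easier to see which cost estimate is off.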
> How can I tell the planner that a given column is much more unique
> than, apparently, it thinks it is?
9.0 and up have ALTER TABLE ... ALTER COLUMN ... SET (n_distinct = ...).
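
A minimal sketch of that override, again with placeholder names (big_table, almost_unique_col). The value -1 is an assumption that fits a column where essentially every row is distinct; for an almost-unique column a value slightly closer to zero, such as -0.99, may be a better match.

```sql
-- Tell the planner to treat the column as fully distinct.
-- Negative values are interpreted as a fraction of the row count.
ALTER TABLE big_table
    ALTER COLUMN almost_unique_col SET (n_distinct = -1);

-- The override is only picked up the next time the table is analyzed.
ANALYZE big_table;

-- To go back to the estimate computed by ANALYZE:
-- ALTER TABLE big_table ALTER COLUMN almost_unique_col RESET (n_distinct);
```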
regards, tom lane