From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Query Optimizer makes a poor choice
Date: 2011-11-30 23:41:15
Message-ID: 4ED6BF1B.2080604@fuzzy.cz
Lists: pgsql-general
On 30.11.2011 23:22, Tyler Hains wrote:
>>> I haven't had a chance to experiment with the SET STATISTICS, but
>>> that got me going on something interesting...
>>>
>>> Do these statistics look right?
>>>
>>> # SELECT attname, n_distinct, most_common_vals, histogram_bounds FROM
>>> pg_stats WHERE tablename = 'cards';
>>>
>> ...
>>> "card_set_id" 905
>>> "{5201,3203,3169,5679,5143,5204,5655,4322,5236,4513}"
>>> "{4,3080,3896,4349,4701,5179,5445,5706,6003,6361,6784}"
>>
>> This looks promising, because n_distinct is low enough that you can
>> cover almost all values with statistics.
>> Raising the statistics target and running ANALYZE should help.
>> (NOTE NOTE NOTE: assuming that the distribution is even)
>>
>>
>> ...
>> but one thing we see for sure is that you have not tuned your
>> PostgreSQL instance :-)
>> I would recommend pgtune (pgfoundry.org/projects/pgtune/);
>> it covers the most important settings, *including*
>> default_statistics_target.
>>
>>
>>
>> Filip
>>
>
> I just tried the set statistics on our test system with essentially the
> same end result.
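(For reference, a minimal sketch of the commands that implies; the
cards table and card_set_id column are assumed from the pg_stats
output quoted above, and 1000 is just an illustrative target:)
========================================================================
-- raise the per-column statistics target, then refresh the stats
-- (table/column names assumed from the pg_stats output above)
ALTER TABLE cards ALTER COLUMN card_set_id SET STATISTICS 1000;
ANALYZE cards;
========================================================================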
Can you describe the problem in a bit more detail? Maybe you're hitting
the same issue as the OP, because with this (very simple) test case it
works just fine.
========================================================================
create table test_tab (id int primary key, val int, txtval text);
-- 10M rows; each val value (0..9999) appears ~1000 times
insert into test_tab select i, mod(i, 10000), md5(i::text) from
generate_series(1,10000000) s(i);
create index test_tab_idx on test_tab (val);
analyze test_tab;
========================================================================
The table is about 730MB, the indexes are about 214MB each.
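For reference, one way to check those sizes yourself (test_tab_pkey is
the index name PostgreSQL picks by default for the primary key):
========================================================================
SELECT pg_size_pretty(pg_relation_size('test_tab'));       -- table heap
SELECT pg_size_pretty(pg_relation_size('test_tab_pkey'));  -- PK index
SELECT pg_size_pretty(pg_relation_size('test_tab_idx'));   -- val index
========================================================================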
========================================================================
explain analyze select * from test_tab where val = 500 order by id;
1st execution (not cached): http://explain.depesz.com/s/1VQ (7786 ms)
2nd execution (cached): http://explain.depesz.com/s/cnt (1 ms)
explain analyze select * from test_tab where val = 500 order by id limit 1;
1st execution (not cached): http://explain.depesz.com/s/nlE (66 ms)
2nd execution (cached): http://explain.depesz.com/s/WNa (0.08 ms)
========================================================================
So in both cases the LIMIT (with an index scan) is faster. Sure, there
may be cases where this does not work that well: maybe the data is not
cached, maybe there's some other issue.
But it clearly is not true that LIMIT is evil and should be avoided.
Tomas