From: | Maxim Boguk <mboguk(at)masterhost(dot)ru> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Postgresql selecting strange index for simple query |
Date: | 2009-02-26 21:12:19 |
Message-ID: | 49A705B3.9080408@masterhost.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Tom Lane wrote:
> Maxim Boguk <mboguk(at)masterhost(dot)ru> writes:
>> So i have two theory (just waving hands ofcourse):
>> 1)integer owerflow somewhere in cost calculation
>
> Costs are floats, and in any case you're not showing costs anywhere near
> the integer overflow limit...
>
>> 2)floating rounding errors (because cost very close in wrong situations: 254918.19 vs 259709.09)
>
> The planner is intentionally set up to consider costs within a percent
> or so of each other as being effectively equal. If the estimated costs
> are that close then it doesn't surprise me if it sometimes picks the
> "wrong" plan. The real question is why are the estimates so close?
> They should not be, since AFAICS you are talking about a situation
> where we'd have to scan all of the multicol index versus only about
> a fifth of the single-col one.
Ok i exploring more:
just one thing:
hh=# SHOW default_statistics_target ;
default_statistics_target
---------------------------
10
(1 row)
(btw increase statistic to 1000 do not fix situation).
I try simplify test case and:
Now use sequential user_id, and truncate last_change_time to date:
SELECT nextval('test_seq') as user_id,last_change_time::date ,rpad('a',500,'b') as f1 into test_table from resume;
ANALYZE test_table;
SELECT count(*) from test_table;
CREATE INDEX right_idx on test_table(last_change_time);
CREATE INDEX wrong_idx on test_table(user_id, last_change_time);
EXPLAIN ANALYZE SELECT 1 from test_table where last_change_time > '2009-01-10 00:00:00';
Index Scan using wrong_idx on test_table (cost=0.00..182623.51 rows=316522 width=0) (actual time=0.056..534.620 rows=382671 loops=1)
Index Cond: (last_change_time > '2009-01-10'::date)
DROP INDEX wrong_idx;
EXPLAIN ANALYZE SELECT 1 from test_table where last_change_time > '2009-01-10 00:00:00';
Index Scan using right_idx on test_table (cost=0.00..221765.19 rows=316522 width=0) (actual time=0.023..346.213 rows=382671 loops=1)
Index Cond: (last_change_time > '2009-01-10'::date)
Full index scan over wrong index cost reasonable lower then 1/17 of single column index (182623 vs 221765)!
So just last_change_time still cannot be generated... but:
hh=# SELECT count(distinct last_change_time) from test_table;
count
-------
2133
(1 row)
And statistic values for last_change_time is:
hh=# SELECT * from pg_stats where tablename='test_table' and attname='last_change_time';
-[ RECORD 1 ]-----+---------------------------------------------------------------------------------------------------------------------------
schemaname | public
tablename | test_table
attname | last_change_time
null_frac | 0
avg_width | 4
n_distinct | 1211
most_common_vals | {2004-02-27,2009-01-26,2008-03-11,2009-01-27,2004-01-15,2008-10-13,2009-01-19,2009-01-22,2009-01-21,2008-09-29}
most_common_freqs | {0.00833333,0.00766667,0.00733333,0.007,0.00666667,0.00533333,0.00533333,0.00533333,0.005,0.00466667}
histogram_bounds | {2003-03-31,2005-08-26,2006-10-04,2007-04-29,2007-09-27,2008-01-24,2008-05-01,2008-07-29,2008-10-01,2008-11-27,2009-01-25}
correlation | 0.261512
I think it is all what planner can use when choose plan... because user_id is unique sequential values.
regargs, Maxim Boguk
From | Date | Subject | |
---|---|---|---|
Next Message | Tim Uckun | 2009-02-26 21:19:10 | Re: speaking of 8.4... |
Previous Message | Adrian Klaver | 2009-02-26 21:03:24 | Re: Connection refused (0x0000274D/10061). |