From: | "Michael G(dot) Martin" <michael(at)vpmonline(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Ron Mayer <ron(at)intervideo(dot)com>, pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: Inconsistant use of index. |
Date: | 2002-03-26 18:10:10 |
Message-ID: | 3CA0B982.2050503@vpmonline.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
I had an issue where my index was not always used on a very large table.
The issue came down to the data distribution and not pulling in enough
of a random sample to get an accurate estimate ( I think the default max
value was around 3000 sample rows ( 300 * 10 default_samples -- see
analyze.c ) rows. I fixed the issue by following Tom's advice and
increased the statistics count on my table to pull in 300000 rows (1000
samples *300). I had to play with the value, re-analyze, and check the
stats in the pg_stats table until most_common_freqs on some values were
all fairily close. The explain plan still shows me a cost and row value
way above what is physically in the table, but at least my indexes were
being used.
alter table table_name alter symbol_name set statistics 1000;
--Michael
Tom Lane wrote:
>Ron Mayer <ron(at)intervideo(dot)com> writes:
>
>> Once some of my tables started getting pretty large, PostgreSQL
>>suddenly stopped using indexes when I use expressions like "col = value"
>>decreasing performance by 20X.
>>
>
>Hmm. The EXPLAIN shows that the planner is not doing too badly at
>estimating the number of rows involved:
>
>>logs2=# explain analyze select count(*) from fact where dat='2002-03-01';
>>NOTICE: QUERY PLAN:
>>
>
>>Aggregate (cost=375631.14..375631.14 rows=1 width=0) (actual
>>time=76689.42..76689.42 rows=1 loops=1)
>> -> Seq Scan on fact (cost=0.00..375101.72 rows=211765 width=0) (actual
>>time=20330.96..76391.94 rows=180295 loops=1)
>>Total runtime: 76707.92 msec
>>
>
>212K estimate for 180K real is not bad at all. So the problem is in the
>cost models not the initial row count estimation.
>
>If you force an indexscan via "set enable_seqscan to off", what does
>EXPLAIN ANALYZE report?
>
>Also, what do you get from
> select * from pg_stats where tablename = 'fact';
>I'm particularly interested in the correlation estimate for the dat
>column. (Would you happen to have an idea whether the data has been
>inserted more-or-less in dat order?)
>
> regards, tom lane
>
>---------------------------(end of broadcast)---------------------------
>TIP 3: if posting/reading through Usenet, please send an appropriate
>subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
>message can get through to the mailing list cleanly
>
From | Date | Subject | |
---|---|---|---|
Next Message | Ron Mayer | 2002-03-26 18:11:43 | Re: Inconsistant use of index. |
Previous Message | Michael Beckstette | 2002-03-26 18:03:50 | Re: postmaster crash |