From: | Ron Mayer <ron(at)intervideo(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | <pgsql-bugs(at)postgresql(dot)org>, Ron Mayer <ron(at)intervideo(dot)com> |
Subject: | Re: Inconsistant use of index. |
Date: | 2002-04-03 19:14:27 |
Message-ID: | Pine.LNX.4.33.0204031108250.24107-100000@ron |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On Tue, 26 Mar 2002, Tom Lane wrote:
> Ron Mayer <ron(at)intervideo(dot)com> writes:
> >> I'm particularly interested in the correlation estimate for the dat
> >> column. (Would you happen to have an idea whether the data has been
> >> inserted more-or-less in dat order?)
>
> > I beleve much of February was loaded first, then we back-filled January,
> > and daily I've been adding March's results. I don't believe the index-usage
> > stopped when we did the january fill... something happend a few days ago after
> > a pretty routine daily load.
>
> The correlation estimate for dat is pretty low (0.086088), which I think
> reflects the fact that on a macro level your data is not very orderly
> (Feb/Jan/Mar). However, if it's been loaded on a daily basis then the
> records for any single day will be together --- which is why the indexed
> probe for a single day is so fast. I don't see any way that we can
> expect the system to model this effect with only one ordering-correlation
> number :-( ... so a proper fix will have to wait for some future release
> when we can think about having more extensive stats about ordering.
>
> In the meantime, it would be interesting to see if re-establishing the
> big-picture order correlation would persuade the planner to do the right
> thing. Could you do something like this:
>
> CREATE TABLE foo AS SELECT * FROM fact ORDER BY dat;
> TRUNCATE TABLE fact;
> INSERT INTO fact SELECT * FROM foo;
> DROP TABLE foo;
> VACUUM ANALYZE fact;
>
> (this should leave you in a state where pg_stats shows correlation 1.0
> for fact.dat) and then see what you get from EXPLAIN?
>
> regards, tom lane
I did quite a bit more playing with this, and no matter what the
correlation was (1, -0.001), it never seemed to have any effect
at all on the execution plan.
Should it? With a high correlation the index scan is a much better choice.
Ron
---
--- create the table with a correlation of "1".
---
logs2=# CREATE TABLE fact_by_dat AS SELECT * FROM fact ORDER BY dat;
SELECT
logs2=# CREATE INDEX fact_by_dat__dat ON fact_by_dat(dat);
CREATE
logs2=# vacuum analyze fact_by_dat;
VACUUM
logs2=# select correlation from pg_stats where tablename='fact_by_dat' and attname='dat';
correlation
-------------
1
(1 row)
---
--- Still does the "Seq Scan"
---
logs2=# explain analyze select count(*) from fact_by_dat where dat='2002-03-01';
NOTICE: QUERY PLAN:
Aggregate (cost=380347.31..380347.31 rows=1 width=0) (actual time=77785.14..77785.14 rows=1 loops=1)
-> Seq Scan on fact (cost=0.00..379816.25 rows=212423 width=0) (actual time=20486.16..77420.05 rows=180295 loops=1)
Total runtime: 77785.28 msec
EXPLAIN
---
--- Disable Seq Scan... 30 times faster.
---
logs2=# set enable_seqscan to off;
SET VARIABLE
logs2=# explain analyze select count(*) from fact_by_dat where dat='2002-03-01';
NOTICE: QUERY PLAN:
Aggregate (cost=5502.57..5502.57 rows=1 width=0) (actual time=2671.20..2671.20 rows=1 loops=1)
-> Index Scan using fact_by_dat__dat on fact_by_dat (cost=0.00..4974.99 rows=211036 width=0)
(actual time=90.24..2339.64 rows=180295 loops=1)
Total runtime: 2693.87 msec
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2002-04-03 20:22:50 | Re: Inconsistant use of index. |
Previous Message | pgsql-bugs | 2002-04-03 17:47:27 | Bug #628: move works incorrectly on cursors using GiST indexes |