From: | Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Ron Mayer <ron(at)intervideo(dot)com>, <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: Does "correlation" mislead the optimizer on large |
Date: | 2003-01-24 16:27:03 |
Message-ID: | 20030124081405.W30842-100000@megazone23.bigpanda.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On Fri, 24 Jan 2003, Tom Lane wrote:
> Ron Mayer <ron(at)intervideo(dot)com> writes:
> > In particular, if I have a large table t with columns 'a','b','c', etc,
> > and I cluster the table as follows:
> > create table t_ordered as select * from t order by a,b;
> > vacuum analyze t_ordered;
> > Column "b" will (correctly) get a very low "correlation" in
> > the pg_stats table -- but I think the optimizer would do better
> > assuming a high correlation because similar 'b' values are still
> > grouped closely on the same disk pages.
>
> How would that be? They'll be separated by the stride of 'a'.
I think it's a clumping effect.
For example, I made a table (ordered) with 20 values of a, 50 values of b
(each showing up in each a) and 100 values of c (not used, just means 100
rows for each (a,b) combination. It's got 541 pages it looks like. Analyze
sets the correlation to about 0.08 on the table and so a query like:
select * from test1 where b=1; prefers a sequence scan (1791 vs 2231)
while the index scan actually performs about 5 times better.
I guess the reason is that in general, the index scan *really* is reading
something on the order of 40 pages rather than the much larger estimate
(I'd guess something on the order of say 300-400? I'm not sure how to
find that except by trying to reverse engineer the estimate number),
because pretty much each value of a will probably have 1 or 2 pages with
b=1.
I'm not really sure how to measure that, however.
From | Date | Subject | |
---|---|---|---|
Next Message | Ron Johnson | 2003-01-24 18:46:49 | Re: Crash Recovery, pt 2 |
Previous Message | Andrew Sullivan | 2003-01-24 16:13:33 | Re: Having trouble with backups (was: Re: Crash Recovery) |