Re: Correlation in cost_index()

From: "Zeugswetter Andreas SB SD" <ZeugswetterA(at)spardat(dot)at>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Sean Chittenden" <sean(at)chittenden(dot)org>
Cc: "Manfred Koizar" <mkoi-pg(at)aon(dot)at>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Correlation in cost_index()
Date: 2003-08-11 12:58:10
Message-ID: 46C15C39FEB2C44BA555E356FBCD6FA4961FEC@m0114.s-mxs.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


> In both cases ANALYZE will calculate correlation 1.0 for column X,
> and something near zero for column Y. We would like to come out with
> index correlation 1.0 for the left-hand case and something much less
> (but, perhaps, not zero) for the right-hand case. I don't really see
> a way to do this without actually examining the multi-column ordering
> relationship during ANALYZE.

The only way the second column correlation will be irrelevant is if
the first column is already (nearly) unique (enough so, that the second
column wont scatter fetches enough to fill the buffer before seeing cache hits).
Thus I think when merging correlations you could take nunique into account.

corr = corr_1 * (corr_2 * ( 1 - nunique_1 / nrows))

But, I think one (new) correlation metric for the whole index (whole key) and the
data pages would actually be sufficient. This metric could imho always be used
instead of the per column correlations to calculate index cost. This holds true
as long as you walk an index range, and that is what it is all about, no ?

???
Andreas

Browse pgsql-hackers by date

  From Date Subject
Next Message The Hermit Hacker 2003-08-11 13:03:12 Re: Farewell
Previous Message Hiroshi Inoue 2003-08-11 10:07:44 Re: Farewell