From: | Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> |
---|---|
To: | Kristofer Munn <kmunn(at)munn(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Re: [HACKERS] Index Puzzle for you |
Date: | 1999-12-29 10:12:55 |
Message-ID: | 199912291012.FAA24890@candle.pha.pa.us |
Lists: | pgsql-hackers |
> Tom Lane wrote:
> > The thing that jumps out at me from this example is the much larger
> > estimate of returned rows in the second case. The planner is clearly
>
> Good catch! There were 296 possible issues in the table. One had 86,544
> articles associated with it. The next highest was 5,949. Then the
> numbers drop to 630, 506, 412, 184 and then the rest are all under 62.
> Out of curiosity, how does vacuum decide on the large estimate?
>
> The maximum is 86,544.
> The average row return for ixissue = x is 3412.
> The median is 25.
> The mode is 25.
>
> ixissue is the result of a sequence.
>
> Thanks for the heads up on this...
Here is the relevant comment from vacuum.c. It is not perfect, but it was
the best thing I could think of.
---------------------------------------------------------------------------
/*
* vc_attrstats() -- compute column statistics used by the optimizer
*
* We compute the column min, max, null and non-null counts.
* Plus we attempt to find the count of the value that occurs most
* frequently in each column. These figures are used to compute
* the selectivity of the column.
*
* We use a three-bucket cache to get the most frequent item.
* The 'guess' buckets count hits. A cache miss causes guess1
* to get the most hit 'guess' item in the most recent cycle, and
* the new item goes into guess2. Whenever the total count of hits
* of a 'guess' entry is larger than 'best', 'guess' becomes 'best'.
*
* This method works perfectly for columns with unique values, and columns
* with only two unique values, plus nulls.
*
* It becomes less perfect as the number of unique values increases and
* their distribution in the table becomes more random.
*
*/
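
To make the comment above concrete, here is a minimal Python sketch of one possible reading of the three-bucket idea. It is not the code in vacuum.c: the function name `most_frequent_estimate`, the bucket layout, and the exact eviction and promotion rules are assumptions made for illustration, and nulls are assumed to be counted elsewhere.

```python
def most_frequent_estimate(values):
    """Approximate the most frequent value in one pass with three buckets.

    Hypothetical illustration only; each bucket is [value, hit_count].
    """
    best = [None, 0]
    guess1 = [None, 0]
    guess2 = [None, 0]

    for v in values:
        # A hit on any occupied bucket just bumps that bucket's counter.
        for bucket in (best, guess1, guess2):
            if bucket[1] > 0 and bucket[0] == v:
                bucket[1] += 1
                break
        else:
            # Cache miss (assumed rule): keep the more-hit guess in guess1
            # and put the new item into guess2, discarding the loser.
            if guess2[1] > guess1[1]:
                guess1[:] = guess2
            guess2[:] = [v, 1]

        # Whenever a guess has more hits than best, it becomes best.
        for guess in (guess1, guess2):
            if guess[1] > best[1]:
                best[:], guess[:] = guess[:], best[:]

    return best[0], best[1]


if __name__ == "__main__":
    # Two distinct values: both stay cached, so the count is exact.
    print(most_frequent_estimate([1, 2, 1, 1, 2]))
```

In this sketch a value's counter only increases when that value is actually seen, so the reported count can fall short of the true count but never exceed it. The shortfall appears when a frequent value is evicted from a guess bucket between occurrences, which is more likely as the number of distinct values grows and their order becomes more random.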
--
Bruce Momjian | http://www.op.net/~candle
maillist(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026