From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Greg Stark <stark(at)mit(dot)edu> |
Cc: | "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Strange heuristic in analyze.c |
Date: | 2010-02-05 20:53:57 |
Message-ID: | 201002052053.o15KrvG09347@momjian.us |
Lists: | pgsql-hackers |
Greg Stark wrote:
> So I never realized the consequences of this little heuristic in
> analyze.c in the handling of very low cardinality columns where we
> want to just capture the complete list of values in the mcv and throw
> away the histogram:
>
>     else if (toowide_cnt == 0 && nmultiple == ndistinct)
>     {
>         /*
>          * Every value in the sample appeared more than once. Assume the
>          * column has just these values.
>          */
>         stats->stadistinct = ndistinct;
>     }
>
> The problem with this heuristic is that, if the table is small enough,
> you might expect that you can set the statistics target high, "sample"
> the entire table, and get a very accurate mcv covering all the values.
> However, if any of the values in the table appears only once, this
> heuristic will defeat you. The following code will then throw out of
> the mcv any value which isn't 25% more common than "average", leaving
> you with a histogram for those values, which often does very poorly if
> the values don't fit any pattern and are just discrete arbitrary
> values.
Do you want a C comment to document this problem?
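
For reference, the effect described above can be reproduced with a small standalone sketch. This is not the code from analyze.c: the helper names `all_values_repeated` and `mcv_values_kept` are hypothetical, the whole table is assumed to be in the sample (so the "average" is simply sample rows divided by distinct values seen), and only the 25% rule mentioned in the quoted text is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from analyze.c): true when every distinct
 * value in the sample was seen more than once, i.e. the condition
 * nmultiple == ndistinct from the quoted snippet.
 * counts[i] is the number of sample rows holding the i-th distinct value.
 */
static bool
all_values_repeated(const int *counts, size_t ndistinct)
{
    size_t nmultiple = 0;

    assert(counts != NULL || ndistinct == 0);
    for (size_t i = 0; i < ndistinct; i++)
    {
        if (counts[i] > 1)
            nmultiple++;
    }
    return nmultiple == ndistinct;
}

/*
 * Hypothetical helper: number of values that would stay in the MCV list
 * under the simplified rule described in the thread. Assumes the sample
 * is the entire table.
 */
static size_t
mcv_values_kept(const int *counts, size_t ndistinct)
{
    long   total = 0;
    size_t kept = 0;
    double threshold;

    /* Heuristic fires: the value list is assumed complete, keep all. */
    if (all_values_repeated(counts, ndistinct))
        return ndistinct;

    /* Otherwise at least one value occurs once, so ndistinct > 0 here. */
    for (size_t i = 0; i < ndistinct; i++)
        total += counts[i];

    /* Keep only values at least 25% more common than the average. */
    threshold = 1.25 * ((double) total / (double) ndistinct);
    for (size_t i = 0; i < ndistinct; i++)
    {
        if ((double) counts[i] >= threshold)
            kept++;
    }
    return kept;
}
```

With per-value counts of {50, 30, 18, 2} every value repeats, so all four stay in the list. Change the last two to {19, 1} and the single-occurrence value disables the shortcut: the threshold becomes 1.25 * 25 = 31.25 and only the first value survives, even though the whole table was sampled.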
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +