On Thu, Dec 12, 2013 at 3:56 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>
> Estimated grouping should, however, affect MCVs. In cases where we
> estimate that grouping levels are high, the expected % of observed
> values should be "discounted" somehow. That is, with total random
> distribution you have a 1:1 ratio between observed frequency of a value
> and assumed frequency. However, with highly grouped values, you might
> have a 2:1 ratio.

Cross-validation can help there, but it's costly.
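
To make that concrete, here is a rough sketch of what I mean, not anything taken from ANALYZE. It assumes the sample arrives as a list of blocks (each a list of column values). It picks the MCV candidates from one half of the blocks and checks their frequencies against the other half. The function name holdout_discount and its parameters are made up for illustration. A ratio well below 1.0 suggests the observed frequency overstates the real one and should be discounted.

```python
import random
from collections import Counter


def holdout_discount(blocks, top_k=3, seed=0):
    """Compare MCV frequencies between two halves of a block sample.

    Hypothetical helper: `blocks` is a list of sampled blocks, each a
    non-empty list of column values. Returns {value: ratio}, where
    ratio = (frequency in held-out half) / (frequency in selection half).
    """
    if len(blocks) < 2:
        raise ValueError("need at least two blocks to split")

    # Split at block granularity, so values that are physically grouped
    # in one block stay together on one side of the split.
    shuffled = list(blocks)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    select = [v for block in shuffled[:half] for v in block]
    holdout = [v for block in shuffled[half:] for v in block]

    select_counts = Counter(select)
    holdout_counts = Counter(holdout)

    ratios = {}
    # MCV candidates are chosen from the selection half only.
    for value, count in select_counts.most_common(top_k):
        select_freq = count / len(select)
        holdout_freq = holdout_counts[value] / len(holdout)
        ratios[value] = holdout_freq / select_freq
    return ratios
```

This is a single split. Averaging over several splits (proper k-fold) would be less noisy, and that repetition is where the cost comes from.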