From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
Cc: | Nathan Boley <npboley(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Collect frequency statistics for arrays |
Date: | 2012-01-17 10:33:22 |
Message-ID: | 20120117103322.GA19977@tornado.leadboat.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Jan 17, 2012 at 12:04:06PM +0400, Alexander Korotkov wrote:
> Thanks for your fixes to the patch. Them looks correct to me. I did some
> fixes in the patch. The proof of some concepts is still needed. I'm going
> to provide it in a few days.
Your further fixes look good. Could you also answer my question about the
header comment of mcelem_array_contained_selec()?
/*
* Estimate selectivity of "column <@ const" based on most common element
* statistics. Independent element occurrence would imply a particular
* distribution of distinct element counts among matching rows. Real data
* usually falsifies that assumption. For example, in a set of 1-element
* integer arrays having elements in the range [0;10], element occurrences are
* not independent. If they were, a sufficiently-large set would include all
* distinct element counts 0 through 11. We correct for this using the
* histogram of distinct element counts.
*
* In the "column @> const" and "column && const" cases, we usually have
* "const" with low summary frequency of elements (otherwise we have
* selectivity close to 0 or 1 correspondingly). That's why the effect of
* dependence related to distinct element counts distribution is negligible
* there. In the "column <@ const" case, summary frequency of elements is
* high (otherwise we have selectivity close to 0). That's why we should do
* correction due to array distinct element counts distribution.
*/
By "summary frequency of elements", do you mean literally P_0 + P_1 ... + P_N?
If so, I can follow the above argument for "column && const" and "column <@
const", but not for "column @> const". For "column @> const", selectivity
cannot exceed the smallest frequency among const elements. A number of
high-frequency elements will drive up the sum of the frequencies without
changing the true selectivity much at all.
Thanks,
nm
From | Date | Subject | |
---|---|---|---|
Next Message | Fujii Masao | 2012-01-17 10:38:23 | Re: Online base backup from the hot-standby |
Previous Message | Simon Riggs | 2012-01-17 10:31:47 | Re: WAL Restore process during recovery |