From: | Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Teodor Sigaev <teodor(at)sigaev(dot)ru> |
Cc: | Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Improvement of var_eq_non_const() |
Date: | 2025-03-21 13:09:21 |
Message-ID: | 3b22d2ac-e084-4a15-8068-ed1eb6938900@tantorlabs.com |
Lists: | pgsql-hackers |
On 20.02.2025 21:21, Tom Lane wrote:
> Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
>> I'd like to suggest to improve var_eq_non_const() by using knowledge of MCV and
>> estimate the selectivity as quadratic mean of non-null fraction divided by
>> number of distinct values (as it was before) and set of MCV selectivities.
> What's the statistical interpretation of this calculation (that is,
> the average MCV selectivity)? Maybe it's better, but without any
> context it seems like a pretty random thing to do. In particular,
> it seems like this could give radically different answers depending
> on how many MCVs we chose to store, and I'm not sure we could argue
> that the result gets more accurate with more MCVs stored.
>
> regards, tom lane
>
>
Hi,
The root mean square (quadratic mean) that Teodor implemented is not the
same as a plain arithmetic mean. The key difference is that the root mean
square is weighted toward the largest values in the distribution. The
further the data deviates from a uniform distribution, the less
representative a simple arithmetic mean becomes.
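
To make the difference concrete, here is a small sketch (not taken from
the patch; the frequency lists are made-up illustrative values) that
compares the two means on a uniform and a skewed set of frequencies with
the same sum:

```python
import math

def arithmetic_mean(values):
    # Plain average: every value contributes linearly.
    return sum(values) / len(values)

def quadratic_mean(values):
    # Root mean square: squaring gives large values more weight.
    return math.sqrt(sum(v * v for v in values) / len(values))

# Hypothetical frequency lists, both summing to 0.4.
uniform = [0.1, 0.1, 0.1, 0.1]
skewed = [0.37, 0.01, 0.01, 0.01]

for name, freqs in (("uniform", uniform), ("skewed", skewed)):
    print(name, arithmetic_mean(freqs), quadratic_mean(freqs))
```

For the uniform list both means coincide at 0.1. For the skewed list the
arithmetic mean stays at 0.1, while the root mean square rises to about
0.185, pulled up by the single large value.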
Teodor's idea seems quite useful to me because it ensures that
selectivity is now influenced by multiple significant values from the
MCV list, rather than just the single most frequent one. This should
lead to a more accurate selectivity estimate, better reflecting the
actual data distribution.
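
As a rough illustration of how I read the proposal quoted above (this is
my own sketch, not the patch itself; the function name and the example
statistics are hypothetical), the estimate would be the quadratic mean
over the old per-distinct-value estimate together with the stored MCV
frequencies:

```python
import math

def var_eq_non_const_sketch(nullfrac, ndistinct, mcv_freqs):
    """Hypothetical sketch of the proposed estimate, not PostgreSQL code."""
    # Previous estimate: non-null fraction spread evenly over distinct values.
    base = (1.0 - nullfrac) / ndistinct
    # Assumption: the base estimate and each MCV frequency are treated
    # as equally weighted samples for the quadratic mean.
    samples = [base] + list(mcv_freqs)
    return math.sqrt(sum(s * s for s in samples) / len(samples))

# Made-up statistics for illustration only.
mcv = [0.5, 0.2, 0.1]
base_only = var_eq_non_const_sketch(0.0, 100, [])
with_all_mcvs = var_eq_non_const_sketch(0.0, 100, mcv)
with_one_mcv = var_eq_non_const_sketch(0.0, 100, mcv[:1])
print(base_only, with_all_mcvs, with_one_mcv)
```

With no MCVs the sketch falls back to the old estimate. With MCVs the
result lies between the old estimate and the largest MCV frequency, and
it does change with the number of MCVs included, which is the
sensitivity Tom asked about.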
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.