From: | Quan Zongliang <quanzongliang(at)yeah(dot)net> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Incorrect estimation of HashJoin rows resulted from inaccurate small table statistics |
Date: | 2023-06-17 00:02:54 |
Message-ID: | 4660403b-8df9-2b44-2214-26b21f35b539@yeah.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2023/6/17 06:46, Tom Lane wrote:
> Quan Zongliang <quanzongliang(at)yeah(dot)net> writes:
>> Perhaps we should discard this (dups cnt > 1) restriction?
>
> That's not going to happen on the basis of one test case that you
> haven't even shown us. The implications of doing it are very unclear.
> In particular, I seem to recall that there are bits of logic that
> depend on the assumption that MCV entries always represent more than
> one row. The nmultiple calculation Tomas referred to may be failing
> because of that, but I'm worried about there being other places.
>
The statistics for the other table look like this:
stadistinct | 6
stanumbers1 | {0.50096667,0.49736667,0.0012}
stavalues1 | {v22,v23,v5}
The value that appears twice in the small table (v1 and v2) does not
appear here. The stadistinct's true value is 18 instead of 6 (three
values in the small table do not appear here).
When calculating the selectivity:
if (nd2 > sslot2->nvalues)
totalsel1 += unmatchfreq1 * otherfreq2 / (nd2 - sslot2->nvalues);
totalsel1 = 0
nd2 = 21
sslot2->nvalues = 2
unmatchfreq1 = 0.99990002016420476
otherfreq2 = 0.82608695328235626
result: totalsel1 = 0.043473913749706022
rows = 0.043473913749706022 * 23 * 2,000,000 = 1999800
> Basically, you're proposing a rather fundamental change in the rules
> by which Postgres has gathered statistics for decades. You need to
> bring some pretty substantial evidence to support that. The burden
> of proof is on you, not on the status quo.
>
> regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Joel Jacobson | 2023-06-17 00:38:23 | Re: Do we want a hashset type? |
Previous Message | Tom Lane | 2023-06-16 22:46:37 | Re: Incorrect estimation of HashJoin rows resulted from inaccurate small table statistics |