On 7/8/24 19:45, Andrei Lepikhov wrote:
> On 3/11/2023 23:43, Tomas Vondra wrote:
>> On 9/11/23 10:04, Lepikhov Andrei wrote:
>> * Determine bucketsize fraction and MCV frequency for the inner
>> * relation. We use the smallest bucketsize or MCV frequency estimated
>> * for any individual hashclause; this is undoubtedly conservative.
>>
>> I'm sure this may lead to inflated cost for "good" cases (where the
>> actual bucket size really is a product), which may push the optimizer to
>> use the less efficient/slower join method.
> Yes, It was contradictory idea, though.
>> IMHO the only principled way forward is to get a better ndistinct
>> estimate (which this implicitly does), perhaps by using extended
>> statistics. I haven't tried, but I guess it'd need to extract the
>> clauses for the inner side, and call estimate_num_groups() on it.
> And I've done it. Sorry for so long response. This patch employs of
> extended statistics for estimation of the HashJoin bucket_size. In
> addition, I describe the idea in more convenient form here [1].
> Obviously, it needs the only ndistinct to make a prediction that allows
> to reduce computational cost of this statistic.
Minor change to make cfbot happier.
--
regards, Andrei Lepikhov