Re: Incorrect estimation of HashJoin rows resulted from inaccurate small table statistics

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Quan Zongliang <quanzongliang(at)yeah(dot)net>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Incorrect estimation of HashJoin rows resulted from inaccurate small table statistics
Date: 2023-06-16 22:46:37
Message-ID: 2463029.1686955597@sss.pgh.pa.us
Lists: pgsql-hackers

Quan Zongliang <quanzongliang(at)yeah(dot)net> writes:
> Perhaps we should discard this (dups cnt > 1) restriction?

That's not going to happen on the basis of one test case that you
haven't even shown us. The implications of doing it are very unclear.
In particular, I seem to recall that there are bits of logic that
depend on the assumption that MCV entries always represent more than
one row. The nmultiple calculation Tomas referred to may be failing
because of that, but I'm worried about there being other places.
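
[Illustration, not part of the original message: a simplified Python
sketch of the rule under discussion, in which only values seen more than
once in the sample become MCV candidates. This is not the PostgreSQL
source. The function name mcv_candidates and its parameters are
hypothetical, and the real ANALYZE code applies further thresholds that
are omitted here.]

```python
from itertools import groupby


def mcv_candidates(sample, num_mcv):
    """Return (nmultiple, candidates) for a list of sampled values.

    nmultiple  -- number of distinct values that occur more than once
    candidates -- up to num_mcv (value, count) pairs, most frequent first

    Assumption: this mirrors only the "count > 1" restriction mentioned
    in the thread; it is an illustration, not the actual implementation.
    """
    # Sorting groups equal values into runs, so each run length is a count.
    runs = [(value, sum(1 for _ in group))
            for value, group in groupby(sorted(sample))]

    # The restriction under discussion: a value seen only once in the
    # sample is never treated as a "most common value".
    multiples = [(value, count) for value, count in runs if count > 1]
    nmultiple = len(multiples)

    # Keep the most frequent values; the sort is stable, so ties stay
    # in ascending value order.
    candidates = sorted(multiples, key=lambda vc: -vc[1])[:num_mcv]
    return nmultiple, candidates
```

With a small table whose sampled values are all distinct, this sketch
yields nmultiple == 0 and an empty candidate list, which is the situation
the thread is concerned with.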

Basically, you're proposing a rather fundamental change in the rules
by which Postgres has gathered statistics for decades. You need to
bring some pretty substantial evidence to support that. The burden
of proof is on you, not on the status quo.

regards, tom lane
