Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Nathan Boley" <npboley(at)gmail(dot)com>
Cc: "Jeff Davis" <pgsql(at)j-davis(dot)com>, "Zeugswetter Andreas OSB sIT" <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics
Date: 2008-06-10 20:33:17
Message-ID: 24109.1213129997@sss.pgh.pa.us
Lists: pgsql-hackers

"Nathan Boley" <npboley(at)gmail(dot)com> writes:
>>> If we query on values that aren't in the table, the planner will
>>> always overestimate the expected number of returned rows because it
>>> (implicitly) assumes that every query will return at least 1 record.
>>
>> That's intentional and should not be changed.

> Why? What if (somehow) we knew that there was a 90% chance that
> a query would return an empty result set on a big table with 20 non-MCV
> distinct values. Currently the planner would always choose a seq scan,
> where an index scan might be better.

(1) On what grounds do you assert the above?

(2) What makes you think that an estimate of zero rather than one row
would change the plan?

(In fact, I don't think the plan would change, in this case. The reason
for the clamp to 1 row is to avoid foolish results for join situations.)
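
To make the join argument concrete, here is a minimal sketch in Python rather than the planner's C. The names clamp_row_est and join_rows, and the simple product formula for join size, are illustrative assumptions and not a copy of the actual planner code. It only shows how an unclamped zero-row estimate propagates upward through a join.

```python
def clamp_row_est(nrows: float) -> float:
    """Round a row estimate to a whole number and force it to be at least 1.

    Illustrative stand-in for the planner's clamp; not the real implementation.
    """
    if nrows <= 1.0:
        return 1.0
    return float(round(nrows))


def join_rows(outer_rows: float, inner_rows: float,
              join_selectivity: float, clamp: bool = True) -> float:
    """Toy join-size estimate: outer * inner * selectivity (an assumed model)."""
    est = outer_rows * inner_rows * join_selectivity
    return clamp_row_est(est) if clamp else est


# Without a clamp, a zero estimate on one input wipes out the join estimate,
# no matter how large the other input is.
unclamped = join_rows(0.0, 1_000_000, 0.001, clamp=False)

# With the scan estimate clamped to 1 row first, the join estimate stays
# proportional to the size of the other input.
clamped = join_rows(clamp_row_est(0.0), 1_000_000, 0.001)
```

In this toy model the unclamped estimate is zero for every join level above the scan, so any plan built on top of it looks free regardless of how much work the other inputs imply. Clamping to one row keeps the estimates, and hence the cost comparisons, sensitive to the rest of the join tree.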

regards, tom lane
