Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics

From: "Nathan Boley" <npboley(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Jeff Davis" <pgsql(at)j-davis(dot)com>, "Zeugswetter Andreas OSB sIT" <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics
Date: 2008-06-10 21:31:01
Message-ID: 6fa3b6e20806101431u1e4395d7ob5d925d59eb45b20@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>> Why? What if ( somehow ) we knew that there was a 90% chance that
>> query would return an empty result set on a big table with 20 non-mcv
>> distinct values. Currently the planner would always choose a seq scan,
>> where an index scan might be better.
>
> (1) On what grounds do you assert the above?

For a table with 1000000 non-mcv rows, the planner estimates a result
set of cardinality 1000000/20 = 50000, not 1.

> (2) What makes you think that an estimate of zero rather than one row
> would change the plan?

I see where the confusion is coming from. When I said

>> What if ( somehow ) we knew that there was a 90%
>> chance that query would return an empty result set

I meant that the planner doesn't know that information. And how could it?

The estimate for ndistinct is an estimate for the number of distinct
values in the table, not an estimate for the number of distinct values
that will be queried for. My original point was that we sometimes
care about the distribution of what's being queried for and not just
what's in the table.

But this is all silly anyways: if this was really a concern you would
write a function

if values exist
return values
else return none

> (In fact, I don't think the plan would change, in this case. The reason
> for the clamp to 1 row is to avoid foolish results for join situations.)

Which makes sense. My point certainly wasn't, in any way, a criticism
of clamping selectivity to 1.

Cheers,
Nathan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2008-06-10 23:03:43 Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics
Previous Message Teodor Sigaev 2008-06-10 21:30:16 Re: Proposal: GiST constraints