Re: GIN indexscans versus equality selectivity estimation

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GIN indexscans versus equality selectivity estimation
Date: 2011-01-11 01:17:50
Message-ID: AANLkTimeZkKm=_Do-bv5okESiF+Vcjz-HWCAHGe2PaQa@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 10, 2011 at 10:25 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Sun, Jan 9, 2011 at 6:38 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> or we could hack eqsel() to bound the no-stats estimate to a bit less
>>> than 1.
>
>> This seems like a pretty sensible thing to do.  I can't immediately
>> imagine a situation in which 1.0 is a sensible selectivity estimate in
>> the no-stats case and 0.90 (say) is a major regression.
>
> After sleeping on it, that seems like my least favorite option.  It's
> basically a kluge, as is obvious because there's no principled way to
> choose what the bound is (or the minimum result from
> get_variable_numdistinct, if we were to hack it there).

Well, the general problem is that we have no reasonable way of
handling planning uncertainty. We have no way of throwing our hands
up in the air and saying "I really have no clue how many rows are
going to come out of that node"; as far as the rest of the planning
process is concerned, a selectivity estimate of 0.005 based on
<column> = <some MCV with a frequency of 0.005> is exactly identical
to one that results from a completely inscrutable equality condition.
So while I agree with you that there's no particular principled way to
choose the exact value, that doesn't strike me as a compelling
argument against fixing some value. ISTM that selectivity estimates
of exactly 0 and exactly 1 ought to be viewed with a healthy dose of
suspicion.
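
To make the "fix some value" idea concrete, here is a minimal sketch of
bounding a no-stats equality estimate. It is not PostgreSQL's eqsel();
the function name, the macro, and the 1/ndistinct fallback are
assumptions made for illustration, and the 0.90 cap is just the "(say)"
figure quoted above, not a recommended value.

```c
#include <assert.h>

/* Hypothetical cap, taken from the "0.90 (say)" figure in the thread. */
#define NOSTATS_EQ_SEL_CAP 0.90

/*
 * Hypothetical helper, not the real eqsel(): with no statistics,
 * estimate equality selectivity as 1/ndistinct, then bound the result
 * to (0, cap] so that it can never come out as exactly 1.
 */
double
nostats_eq_selectivity(double ndistinct, double cap)
{
    double sel;

    /* The cap must be a usable probability strictly below 1. */
    assert(cap > 0.0 && cap < 1.0);

    /* Treat a nonsensical distinct count as "one distinct value". */
    if (ndistinct < 1.0)
        ndistinct = 1.0;

    sel = 1.0 / ndistinct;

    /* Apply the upper bound; estimates already below it are unchanged. */
    return (sel > cap) ? cap : sel;
}
```

With one presumed distinct value the unbounded estimate would be 1.0
and the sketch returns the cap instead; with 200 presumed distinct
values the estimate stays at 1/200 and the cap has no effect.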

> I'm currently
> leaning to the idea of tweaking the logic in indxpath.c; in particular,
> why wouldn't it be a good idea to force consideration of the bitmap path
> if the index type hasn't got amgettuple?  If we don't, then we've
> completely wasted the effort spent up to that point inside
> find_usable_indexes.

I guess the obvious question is: why wouldn't it be a good idea to
force consideration of the bitmap path even if the index type DOES
have amgettuple?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
