Re: BETWEEN optimizer problems with single-value

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-performance(at)postgresql(dot)org, Andreas Kretschmer <akretschmer(at)spamfence(dot)net>
Subject: Re: BETWEEN optimizer problems with single-value
Date: 2006-03-16 11:53:52
Message-ID: 1142510032.3859.320.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On Wed, 2006-03-15 at 21:05 -0500, Tom Lane wrote:
> So we need a more accurate estimate for the boundary case.

Agreed.

> > Using 1.0e-10 isn't very useful... the selectivity for a range should
> > never be less than the selectivity for an equality, so we should simply
> > put in a test against one of the pseudo constants and use that as the
> > minimal value.
>
> That's easier said than done, because you'd first have to find the
> appropriate equality operator to use (ie, one having semantics that
> agree with the inequality operators).
...

Kevin: this is also the reason we can't simply transform the WHERE
clause into a more appropriate form...

> Possibly we could drop this code's reliance on seeing
> SCALARLTSEL/SCALARGTSEL as the estimators, and instead try to locate a
> common btree opclass for the operators --- which would then let us
> identify the right equality operator to use, and also let us distinguish
> > from >= etc. If we're trying to get the boundary cases right I
> suspect we have to account for that. I could see such an approach being
> tremendously slow though :-(, because we'd go looking for btree
> opclasses even for operators that have nothing to do with < or >.

Trying to get the information in the wrong place would be very
expensive, I agree. But preparing that information when we have access
to it and passing it through the plan would be much cheaper. Relating
op->opclass will be very useful in other places in planning, even if any
one case seems not to justify the work to record it. (This case feels
like deja vu, all over again.)

The operator and the opclass are only connected via an index access
method, but for a particular index each column has only one opclass. So
the opclass will have a 1-1 correspondence with the operator for *that*
plan only, realising that other plans might have different
correspondences. find_usable_indexes() or thereabouts could annotate a
restriction OpExpr with the opclass it will use.

Once we have the link, clauselist_selectivity() can trivially compare
opclasses for both OpExprs, then retrieve other information for that
opclass for various purposes.

Seems lots of work for such a corner case, but would be worth it if this
solves other problems as well.

Best Regards, Simon Riggs

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message William ZHANG 2006-03-16 14:38:36 Re: Bug report form: locale/encoding
Previous Message Simon Riggs 2006-03-16 10:19:44 Re: Pregunta sobre limitar el

Browse pgsql-performance by date

  From Date Subject
Next Message Guillaume Smet 2006-03-16 12:28:11 Re: PostgreSQL and Xeon MP
Previous Message Richard Huxton 2006-03-16 11:21:47 Re: PostgreSQL and Xeon MP