Quick Links

Re: BETWEEN optimizer problems with single-value

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-performance(at)postgresql(dot)org, Andreas Kretschmer <akretschmer(at)spamfence(dot)net>
Subject:	Re: BETWEEN optimizer problems with single-value
Date:	2006-03-16 02:05:24
Message-ID:	11428.1142474724@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers pgsql-performance

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
>> ISTM that when the BETWEEN constants match we end up in this part of
>> clauselist_selectivity()...

Yeah, I think you are right.

> so that the planner underestimates the cost of using "Cal_CalDate" so
> that it ends up the same as "Cal_CtofcNo", and then we pick
> "Cal_CalDate" because it was created first.

No, it doesn't end up the same --- but the difference is small enough to
be in the roundoff-error regime. The real issue here is that we're
effectively assuming that one row will be fetched from the index in both
cases, and this is clearly not the case for the Cal_CalDate index. So
we need a more accurate estimate for the boundary case.

> Using 1.0e-10 isn't very useful... the selectivity for a range should
> never be less than the selectivity for an equality, so we should simply
> put in a test against one of the pseudo constants and use that as the
> minimal value.

That's easier said than done, because you'd first have to find the
appropriate equality operator to use (ie, one having semantics that
agree with the inequality operators). Another point is that the above
statement is simply wrong, consider
calDate BETWEEN '2006-03-15' AND '2006-03-14'
for which an estimate of zero really is correct.

Possibly we could drop this code's reliance on seeing
SCALARLTSEL/SCALARGTSEL as the estimators, and instead try to locate a
common btree opclass for the operators --- which would then let us
identify the right equality operator to use, and also let us distinguish
> from >= etc. If we're trying to get the boundary cases right I
suspect we have to account for that. I could see such an approach being
tremendously slow though :-(, because we'd go looking for btree
opclasses even for operators that have nothing to do with < or >.

regards, tom lane

In response to

Re: BETWEEN optimizer problems with single-value at 2006-03-16 00:24:38 from Simon Riggs

Responses

Re: BETWEEN optimizer problems with single-value at 2006-03-16 11:53:52 from Simon Riggs

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2006-03-16 02:55:36	Re: Static build of psql with readline support
Previous Message	Christopher Kings-Lynne	2006-03-16 01:27:19	Static build of psql with readline support

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Guillaume Smet	2006-03-16 10:45:12	PostgreSQL and Xeon MP
Previous Message	Jan de Visser	2006-03-16 00:42:00	Re: Slow SELECTS after large update cycle