Re: [HACKERS] BETWEEN optimizer problems with single-value

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] BETWEEN optimizer problems with single-value
Date: 2006-03-15 23:34:33
Message-ID: 44185029.EE98.0025.0@wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

>>> On Wed, Mar 15, 2006 at 5:05 pm, in message
<1142463908(dot)3859(dot)188(dot)camel(at)localhost(dot)localdomain>, Simon Riggs
<simon(at)2ndquadrant(dot)com> wrote:
> On Wed, 2006- 03- 15 at 11:56 - 0600, Kevin Grittner wrote:
>
>> (One obvious way to fix it would be to
>> rewrite "BETWEEN a AND b" as "= a" when a is equal to b, but it
seems
>> like there is some underlying problem which should be fixed instead
(or
>> in addition to) this.
>
> That might work, but I'm not sure if that is in itself the problem
and
> it would be mostly wasted overhead in 99% of cases.

It sounds like we agree.

> The main issue appears to be that the planner chooses "Cal_CalDate"
> index rather than "Cal_CtofcNo" index when the BETWEEN values match.

Agreed.

> It seems that the cost of the first and third EXPLAINs is equal, yet
for
> some reason it chooses different indexes in each case. My
understanding
> was that it would pick the first index created if plan costs were
equal.
> Is that behaviour repeatable with each query?

It seems to be a consistent pattern, although strictly speaking our
evidence is anecdotal. We've got hundreds of known failures with the
BETWEEN variant on equal dates and no known successes. We have a few
dozen tests of the equality variant with 100% success in those tests.

> ISTM that if we have equal plan costs then we should be choosing the
> index for which we have more leading columns, since that is more
likely
> to lead to a more selective answer. But the plan selection is a
simple
> "pick the best, or if they're equal pick the best sort order".

> The selectivity seems the same in both - clamped to a minimum of 1
row,
> so changing that doesn't look like it would help.

The fact that it costs these as equivalent is surprising in itself, and
might be worth examining. This might be an example of something I
suggested a while ago -- that the rounding a row estimate to an integer
on the basis that "you can't read half a row" is not necessarily wise,
because you can have a 50% chance of reading a row versus a higher or
lower percentage.

-Kevin

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2006-03-16 00:07:06 Re: BETWEEN optimizer problems with single-value
Previous Message Simon Riggs 2006-03-15 23:05:08 Re: [HACKERS] BETWEEN optimizer problems with single-value range

Browse pgsql-performance by date

  From Date Subject
Next Message Simon Riggs 2006-03-16 00:07:06 Re: BETWEEN optimizer problems with single-value
Previous Message Simon Riggs 2006-03-15 23:21:27 Re: Slow SELECTS after large update cycle