From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Venky Kandaswamy <venky(at)adchemy(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>, "pgsql-sql(at)postgresql(dot)org" <pgsql-sql(at)postgresql(dot)org>
Subject: Re: [SQL] Curious problem of using BETWEEN with start and end being the same versus EQUALS '='
Date: 2013-01-15 22:30:18
Message-ID: 4583.1358289018@sss.pgh.pa.us
Lists: pgsql-general pgsql-sql

Venky Kandaswamy <venky(at)adchemy(dot)com> writes:
> On 9.1, I am running into a curious issue.
It's not very curious at all, or at least people on pgsql-performance
(the right list for this sort of question) would have figured it out
quickly. You're getting a crummy plan because of a crummy row estimate.
When you do this:
> WHERE a.date_id = 20120228
you get this:
> " -> Index Scan using alps_agg_date_id on bi2003.alps_agg a (cost=0.00..17870.00 rows=26292 width=1350) (actual time=0.047..142.383 rows=36132 loops=1)"
> " Output: a.date_id, a.page_group, a.page, a.int_alloc_type, a.componentset, a.adc_visit, upper((a.adc_visit)::text)"
> " Index Cond: (a.date_id = 20120228)"
> " Filter: ((a.page)::text = 'ddi_671'::text)"
26K estimated rows versus 36K actual isn't the greatest estimate in the
world, but it's plenty good enough. But when you do this:
> WHERE a.date_id BETWEEN 20120228 AND 20120228
you get this:
> " -> Index Scan using alps_agg_date_id on bi2003.alps_agg a (cost=0.00..10.12 rows=1 width=1350)"
> " Output: a.date_id, a.adc_visit, a.page_group, a.page, a.int_alloc_type, a.componentset, a.variation_tagset, a.page_instance"
> " Index Cond: ((a.date_id >= 20120228) AND (a.date_id <= 20120228))"
> " Filter: ((a.page)::text = 'ddi_671'::text)"
so the bogus estimate of only one row causes the planner to pick an
entirely different plan, which would probably be a great choice if there
were indeed only one such row, but with 36000 of them it's horrid.
The reason the row estimate is so crummy is that a zero-width interval
is an edge case for range estimates. We've seen this before, although
usually it's not quite this bad.
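
A quick way to see the edge case for yourself is to compare the planner's row estimates for the two equivalent forms side by side (table and column names taken from the query in this thread; the exact numbers will depend on your statistics):

```sql
-- Equality form: the planner uses the column's MCV/histogram stats
-- and arrives at a reasonable per-value estimate (~26K rows here).
EXPLAIN SELECT * FROM alps_agg WHERE date_id = 20120228;

-- Zero-width range form: the two inequality selectivities are combined,
-- and for a degenerate interval the product collapses toward one row.
EXPLAIN SELECT * FROM alps_agg WHERE date_id BETWEEN 20120228 AND 20120228;
```
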
There's been some talk of making the estimate for "x >= a AND x <= b"
always be at least as much as the estimate for "x = a", but this would
increase the cost of making the estimate by quite a bit, and make things
actually worse in some cases (in particular, if a > b then a nil
estimate is indeed the right thing).
You might look into whether queries formed like "date_id >= 20120228 AND
date_id < 20120229" give you more robust estimates at the edge cases.
BTW, I notice in your EXPLAIN results that the same range restriction
has been propagated to b.date_id:
> " -> Index Scan using event_agg_date_id on bi2003.event_agg b (cost=0.00..10.27 rows=1 width=1694)"
> " Output: b.date_id, b.vcset, b.eventcountset, b.eventvalueset"
> " Index Cond: ((b.date_id >= 20120228) AND (b.date_id <= 20120228))"
I'd expect that to happen automatically for a simple equality
constraint, but not for a range constraint. Did you do that manually
and not tell us about it?
regards, tom lane
Next Message: Venky Kandaswamy | 2013-01-15 22:44:50 | Re: [SQL] Curious problem of using BETWEEN with start and end being the same versus EQUALS '='
Previous Message: Tom Lane | 2013-01-15 21:32:48 | Re: 9.2 upgrade glitch with search_path