From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Venky Kandaswamy <venky(at)adchemy(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>, "pgsql-sql(at)postgresql(dot)org" <pgsql-sql(at)postgresql(dot)org>
Subject: Re: [SQL] Curious problem of using BETWEEN with start and end being the same versus EQUALS '='
Date: 2013-01-15 22:30:18
Message-ID: 4583.1358289018@sss.pgh.pa.us
Lists: pgsql-general pgsql-sql

Venky Kandaswamy <venky(at)adchemy(dot)com> writes:
> On 9.1, I am running into a curious issue.
It's not very curious at all, or at least people on pgsql-performance
(the right list for this sort of question) would have figured it out
quickly. You're getting a crummy plan because of a crummy row estimate.
When you do this:
> WHERE a.date_id = 20120228
you get this:
> " -> Index Scan using alps_agg_date_id on bi2003.alps_agg a (cost=0.00..17870.00 rows=26292 width=1350) (actual time=0.047..142.383 rows=36132 loops=1)"
> " Output: a.date_id, a.page_group, a.page, a.int_alloc_type, a.componentset, a.adc_visit, upper((a.adc_visit)::text)"
> " Index Cond: (a.date_id = 20120228)"
> " Filter: ((a.page)::text = 'ddi_671'::text)"
26K estimated rows versus 36K actual isn't the greatest estimate in the
world, but it's plenty good enough. But when you do this:
> WHERE a.date_id BETWEEN 20120228 AND 20120228
you get this:
> " -> Index Scan using alps_agg_date_id on bi2003.alps_agg a (cost=0.00..10.12 rows=1 width=1350)"
> " Output: a.date_id, a.adc_visit, a.page_group, a.page, a.int_alloc_type, a.componentset, a.variation_tagset, a.page_instance"
> " Index Cond: ((a.date_id >= 20120228) AND (a.date_id <= 20120228))"
> " Filter: ((a.page)::text = 'ddi_671'::text)"
so the bogus estimate of only one row causes the planner to pick an
entirely different plan, which would probably be a great choice if there
were indeed only one such row, but with 36000 of them it's horrid.
The reason the row estimate is so crummy is that a zero-width interval
is an edge case for range estimates. We've seen this before, although
usually it's not quite this bad.
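
A quick way to see the edge case for yourself is to compare the planner's row estimates for the two equivalent forms side by side (table and column names taken from the query in this thread; the exact numbers will depend on your statistics):

```sql
-- Equality form: the planner uses the column's MCV/histogram stats
-- and arrives at a reasonable per-value estimate (~26K rows here).
EXPLAIN SELECT * FROM alps_agg WHERE date_id = 20120228;

-- Zero-width range form: the two inequality selectivities are combined,
-- and for a degenerate interval the product collapses toward one row.
EXPLAIN SELECT * FROM alps_agg WHERE date_id BETWEEN 20120228 AND 20120228;
```
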
There's been some talk of making the estimate for "x >= a AND x <= b"
always be at least as much as the estimate for "x = a", but this would
increase the cost of making the estimate by quite a bit, and make things
actually worse in some cases (in particular, if a > b then a nil
estimate is indeed the right thing).
You might look into whether queries formed like "date_id >= 20120228 AND
date_id < 20120229" give you more robust estimates at the edge cases.
BTW, I notice in your EXPLAIN results that the same range restriction
has been propagated to b.date_id:
> " -> Index Scan using event_agg_date_id on bi2003.event_agg b (cost=0.00..10.27 rows=1 width=1694)"
> " Output: b.date_id, b.vcset, b.eventcountset, b.eventvalueset"
> " Index Cond: ((b.date_id >= 20120228) AND (b.date_id <= 20120228))"
I'd expect that to happen automatically for a simple equality
constraint, but not for a range constraint. Did you do that manually
and not tell us about it?
regards, tom lane
Next Message: Venky Kandaswamy | 2013-01-15 22:44:50 | Re: [SQL] Curious problem of using BETWEEN with start and end being the same versus EQUALS '='
Previous Message: Tom Lane | 2013-01-15 21:32:48 | Re: 9.2 upgrade glitch with search_path