From: | Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com> |
---|---|
To: | Keith Worthington <keithw(at)narrowpathinc(dot)com> |
Cc: | PostgreSQL Novice <pgsql-novice(at)postgresql(dot)org> |
Subject: | Re: Expensive where clause |
Date: | 2005-02-19 15:11:56 |
Message-ID: | 20050219063547.C73677@megazone.bigpanda.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-novice |
On Sat, 19 Feb 2005, Keith Worthington wrote:
> On Fri, 18 Feb 2005 21:55:29 -0800 (PST), Stephan Szabo wrote
> > On Fri, 18 Feb 2005, Keith Worthington wrote:
> >
> > > I have been working on a SQL statement that contains a WHERE
> > > clause of the form WHERE column1 > column2. The query runs
> > > pretty quickly (285ms) without the WHERE clause but slows to
> > > a relative crawl (5850ms) when it is included.
> > > Any suggestions on how to improve the performance would be
> > > greatly appreciated.
> >
> > Explain analyze output for the query with and without the clause
> > would probably be useful for analysis.
>
> Here is the explain analyze output with the WHERE
> clause commented out. This one actually ran slow. Usually it is only a few
> hundred ms without the WHERE clause.
That's probably just the instrumentation.
I'm not 100% sure why it's changing plans although I wonder if the costs
are just close enough that small changes are causing the plan change, but
I think it wouldn't pick a nested loop if it knew that it was grossly
underestimating the number of loops. It might be interesting to see how
the second query runs in explain analyze with enable_nestloop=off although
that'll likely make lower portions of the query more expensive.
One thing that jumps out at me is scans like this:
-> Index Scan using idx_tbl_item_item_type,
idx_tbl_item_item_type on tbl_item items (cost=0.00..84.16 rows=1
width=24)
(actual time=0.093..5.702 rows=566 loops=1)
Index Cond: (((item_type)::text =
'DIR'::text) OR ((item_type)::text = 'NET'::text))
Filter: ((NOT inactive) AND (item_class = 1))
This misestimation may be playing a part in why it thinks a nested loop is
a good plan. Is there a strong correlation between some item_types and
item_class or inactive? What does explain analyze on the following
queries show?
select * from tbl_item where (item_type='DIR OR item_type='NET');
select * from tbl_item where (item_type='DIR OR item_type='NET') AND NOT
inactive;
select * from tbl_item where (item_type='DIR OR item_type='NET') AND
item_class=1;
select * from tbl_item where (item_type='DIR OR item_type='NET') AND NOT
inactive AND item_class=1;
I'm not sure if you'd get any win from a partial index with WHERE NOT
inactive (or possibly both NOT inactive AND item_class=1 if you're almost
always limiting item_class to 1), but that might also be something to
check.
From | Date | Subject | |
---|---|---|---|
Next Message | A Gilmore | 2005-02-21 01:32:48 | Concatenate rows |
Previous Message | Keith Worthington | 2005-02-19 06:40:45 | Re: Expensive where clause |