From: Alexander Staubo <alex(at)bengler(dot)no>
To: David Wilson <david(dot)t(dot)wilson(at)gmail(dot)com>
Cc: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Bad plan for nested loop + limit
Date: 2009-02-27 20:18:42
Message-ID: 88daf38c0902271218p1931ec9dyfdc269bceab24190@mail.gmail.com
Lists: pgsql-performance
On Sun, Feb 15, 2009 at 5:45 PM, Alexander Staubo <alex(at)bengler(dot)no> wrote:
> On Sun, Feb 15, 2009 at 5:29 AM, David Wilson <david(dot)t(dot)wilson(at)gmail(dot)com> wrote:
>> On Sat, Feb 14, 2009 at 5:25 PM, Alexander Staubo <alex(at)bengler(dot)no> wrote:
>>>
>>> Output from "explain analyze":
>>>
>>> Limit  (cost=0.00..973.63 rows=4 width=48) (actual time=61.554..4039.704 rows=1 loops=1)
>>>   ->  Nested Loop  (cost=0.00..70101.65 rows=288 width=48) (actual time=61.552..4039.700 rows=1 loops=1)
>>>         ->  Nested Loop  (cost=0.00..68247.77 rows=297 width=52) (actual time=61.535..4039.682 rows=1 loops=1)
>>
>> Those estimates are pretty far off. Did you try increasing the
>> statistics target? Also, is the first query repeatable (that is, is it
>> already in cache when you do the test, or, alternatively, are all
>> queries *out* of cache when you test?)
All right, this query keeps coming back to bite me. If this part of the join:
... and section_items.sandbox_id = 16399
yields a sufficiently large number of matches, then performance goes
'boink', like so:
Limit  (cost=0.00..34.86 rows=4 width=48) (actual time=4348.696..4348.696 rows=0 loops=1)
  ->  Nested Loop  (cost=0.00..60521.56 rows=6944 width=48) (actual time=4348.695..4348.695 rows=0 loops=1)
        ->  Index Scan using index_event_occurrences_on_start_time on event_occurrences  (cost=0.00..11965.38 rows=145712 width=48) (actual time=0.093..138.029 rows=145108 loops=1)
              Index Cond: (start_time > '2009-02-27 18:01:14.739411+01'::timestamp with time zone)
        ->  Index Scan using index_section_items_on_subject_type_and_subject_id on section_items  (cost=0.00..0.32 rows=1 width=4) (actual time=0.029..0.029 rows=0 loops=145108)
              Index Cond: (((section_items.subject_type)::text = 'Event'::text) AND (section_items.subject_id = event_occurrences.event_id))
              Filter: (section_items.sandbox_id = 9)
Total runtime: 4348.777 ms
In this case:
# select count(*) from section_items where sandbox_id = 9;
 count
-------
  3126
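For reference, the query has roughly this shape, reconstructed from
the plan above (the select list and the order by are my guesses, since
they are not shown in the plan):

select event_occurrences.*
from event_occurrences
join section_items
  on section_items.subject_id = event_occurrences.event_id
 and section_items.subject_type = 'Event'
where section_items.sandbox_id = 9
  and event_occurrences.start_time > '2009-02-27 18:01:14.739411+01'
order by event_occurrences.start_time
limit 4;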
If I remove the start_time > ... clause, performance is fine. Upping
the statistics setting on any of the columns involved seems to have no
effect.
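(By "upping the statistics setting" I mean something along these
lines, with 1000 as an illustrative target:

alter table section_items alter column sandbox_id set statistics 1000;
alter table event_occurrences alter column start_time set statistics 1000;
analyze section_items;
analyze event_occurrences;

The plan comes out the same afterwards.)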
Is this a pathological border case, or is there something I can do to
*generally* make this query run fast? Keep in mind that the query
itself returns no rows at all. I want to avoid running an initial
"select count(...)" just to sidestep the bad plan. Suffice it to say,
having a web request take 5 seconds is asking too much of our users.
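For what it's worth, the classic workaround would be to hide the join
behind an "offset 0" subquery, which acts as an optimization fence and
keeps the planner from driving the whole query off the start_time
index (an untested sketch; the column list is a guess):

select *
from (
    select event_occurrences.*
    from event_occurrences
    join section_items
      on section_items.subject_id = event_occurrences.event_id
     and section_items.subject_type = 'Event'
    where section_items.sandbox_id = 9
      and event_occurrences.start_time > '2009-02-27 18:01:14.739411+01'
    offset 0
) as candidates
order by start_time
limit 4;

But I would much rather have the planner pick a sensible plan on its own.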
Alexander.