Re: slow joins?

From: Greg Williamson <gwilliamson39(at)yahoo(dot)com>
To:
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: slow joins?
Date: 2013-04-06 03:32:56
Message-ID: 1365219176.2155.YahooMailNeo@web125904.mail.ne1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Joe --

>________________________________
> From: Joe Van Dyk <joe(at)tanga(dot)com>
>To: Greg Williamson <gwilliamson39(at)yahoo(dot)com>
>Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
>Sent: Friday, April 5, 2013 7:56 PM
>Subject: Re: [PERFORM] slow joins?
>
>
>On Fri, Apr 5, 2013 at 6:54 PM, Greg Williamson <gwilliamson39(at)yahoo(dot)com> wrote:
>
>Joe --
>>
>>>________________________________
>>> From: Joe Van Dyk <joe(at)tanga(dot)com>
>>>To: pgsql-performance(at)postgresql(dot)org
>>>Sent: Friday, April 5, 2013 6:42 PM
>>>Subject: Re: [PERFORM] slow joins?
>>
>>>
>>>
>>>(https://gist.github.com/joevandyk/df0df703f3fda6d14ae1/raw/c15cae813913b7f8c35b24b467a0c732c0100d79/gistfile1.txt shows a non-wrapped version of the queries and plan)
>>>
>>>
>>>
>>>
>>>On Fri, Apr 5, 2013 at 6:38 PM, Joe Van Dyk <joe(at)tanga(dot)com> wrote:
>>>
>>>On 9.2.4, running two identical queries except for the value of a column in the WHERE clause. Postgres is picking very different query plans, the first is much slower than the second.
>>>>
>>>>
>>>>Any ideas on how I can speed this up?  I have btree indexes for all the columns used in the query.
>>>>
>>>>explain analyze                                                                                    
>>>>SELECT COUNT(*)                                                                                    
>>>>FROM purchased_items pi                                                                            
>>>>inner join line_items li on li.id = pi.line_item_id                                                
>>>>inner join products      on products.id = li.product_id                                            
>>>>WHERE products.drop_shipper_id = 221;
>>>>
>>>> Aggregate  (cost=193356.31..193356.32 rows=1 width=0) (actual time=2425.225..2425.225 rows=1 loops=1)
>>>>   ->  Hash Join  (cost=78864.43..193160.41 rows=78360 width=0) (actual time=726.612..2424.206 rows=8413 loops=1)
>>>>         Hash Cond: (pi.line_item_id = li.id)
>>>>         ->  Seq Scan on purchased_items pi  (cost=0.00..60912.39 rows=3724639 width=4) (actual time=0.008..616.812 rows=3724639 loops=1)
>>>>         ->  Hash  (cost=77937.19..77937.19 rows=56499 width=4) (actual time=726.231..726.231 rows=8178 loops=1)
>>>>               Buckets: 4096  Batches: 4  Memory Usage: 73kB
>>>>               ->  Hash Join  (cost=1684.33..77937.19 rows=56499 width=4) (actual time=1.270..723.222 rows=8178 loops=1)
>>>>                     Hash Cond: (li.product_id = products.id)
>>>>                     ->  Seq Scan on line_items li  (cost=0.00..65617.18 rows=2685518 width=8) (actual time=0.081..392.926 rows=2685499 loops=1)
>>>>                     ->  Hash  (cost=1676.60..1676.60 rows=618 width=4) (actual time=0.835..0.835 rows=618 loops=1)
>>>>                           Buckets: 1024  Batches: 1  Memory Usage: 22kB
>>>>                           ->  Bitmap Heap Scan on products  (cost=13.07..1676.60 rows=618 width=4) (actual time=0.185..0.752 rows=618 loops=1)
>>>>                                 Recheck Cond: (drop_shipper_id = 221)
>>>>                                 ->  Bitmap Index Scan on index_products_on_drop_shipper_id  (cost=0.00..12.92 rows=618 width=0) (actual time=0.125..0.125 rows=618 loops=1)
>>>>                                       Index Cond: (drop_shipper_id = 221)
>>>> Total runtime: 2425.302 ms
>>>>
>>>>
>>>>explain analyze                                                                                    
>>>>SELECT COUNT(*)                                                                                    
>>>>FROM purchased_items pi                                                                            
>>>>inner join line_items li on li.id = pi.line_item_id                                                
>>>>inner join products      on products.id = li.product_id                                            
>>>>WHERE products.drop_shipper_id = 2;                                                                
>>>>                                                                                                                     
>>>>
>>>> Aggregate  (cost=29260.40..29260.41 rows=1 width=0) (actual time=0.906..0.906 rows=1 loops=1)
>>>>   ->  Nested Loop  (cost=0.00..29254.38 rows=2409 width=0) (actual time=0.029..0.877 rows=172 loops=1)
>>>>         ->  Nested Loop  (cost=0.00..16011.70 rows=1737 width=4) (actual time=0.021..0.383 rows=167 loops=1)
>>>>               ->  Index Scan using index_products_on_drop_shipper_id on products  (cost=0.00..80.41 rows=19 width=4) (actual time=0.010..0.074 rows=70 loops=1)
>>>>                     Index Cond: (drop_shipper_id = 2)
>>>>               ->  Index Scan using index_line_items_on_product_id on line_items li  (cost=0.00..835.70 rows=279 width=8) (actual time=0.002..0.004 rows=2 loops=70)
>>>>                     Index Cond: (product_id = products.id)
>>>>         ->  Index Only Scan using purchased_items_line_item_id_idx on purchased_items pi  (cost=0.00..7.60 rows=2 width=4) (actual time=0.002..0.003 rows=1 loops=167)
>>>>               Index Cond: (line_item_id = li.id)
>>>>               Heap Fetches: 5
>>>> Total runtime: 0.955 ms
>>>>(11 rows)
>>>>
>>>
>>
>>
>>Does drop_shipper+id have a much larger number of rows which is making the scanner want to avoid an indexed scan or otherwise prefer a sequential scan on products and on line_items ?
>>
>
>
>Assuming you mean products.drop_shipper_id? There are more rows matched for the first one vs the second one. 
>70 products rows match drop_shipper_id=2, 618 match drop_shipper_id=221.

>What are the stats settings for these tables ?
>>
>
>
>Whatever the defaults are.

I mis-pasted the tables -- both line_items and purchased items are getting sequential scans for the relevant rows; it is possible that that there's enough difference to tip the planner to use sequential scans.

You might try increasing the stats being collected on those two tables, run analyze on all the tables in the query, and try it again.

GW

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Franck Routier 2013-04-06 08:02:39 Re: What happens between end of explain analyze and end of query execution ?
Previous Message Joe Van Dyk 2013-04-06 02:56:02 Re: slow joins?