| From: | Alex Reece <awreece(at)gmail(dot)com> | 
|---|---|
| To: | "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org> | 
| Subject: | Different plan chosen when in lateral subquery | 
| Date: | 2017-12-05 18:04:27 | 
| Message-ID: | CANywC6CivtmFAaYKSnV+6oFmQWi0Cq=ycwfTzV4TK4TxKptcgA@mail.gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-performance | 
I get very different plan chosen when my query is in a lateral subquery vs
standalone -- it doesn't use a key when joining on a table, instead opting
to do a hash join. Here is the query:
select distinct on (sub.entity_id, sub.note_id, sub.series_id)
       entity_id, note_id, series_id
from
(
select alloc.entity_id, alloc.note_id, alloc.series_id, alloc.amount,
inv.name
from public.portfolio_allocations alloc
JOIN contributions contrib on contrib.id = alloc.note_id
JOIN investments inv on inv.id = contrib.investment_id
where entity_id = '\x5787f132f50f7b03002cf835' and
alloc.allocated_on <= dates.date
) sub
And wrapped inside the lateral:
        explain analyze
        select *
        from generate_series('2017-03-14 20:59:59.999'::TIMESTAMPTZ,
    current_timestamp::TIMESTAMP + INTERVAL '1 day', '24 hours') dates,
        LATERAL (
         ... <SUB QUERY HERE> ...
        ) lat
Run by itself injecting a hard coded value for dates.date, I get the
expected plan which uses a key index on contributions:
      Unique  (cost=14.54..14.54 rows=2 width=39) (actual time=0.052..0.053
rows=2 loops=1)
         ->  Sort  (cost=14.54..14.54 rows=2 width=39) (actual
time=0.052..0.052 rows=2 loops=1)
               Sort Key: alloc.note_id, alloc.series_id
               Sort Method: quicksort  Memory: 25kB
               ->  Nested Loop  (cost=0.25..14.53 rows=2 width=39) (actual
time=0.030..0.042 rows=2      loops=1)
                     ->  Nested Loop  (cost=0.17..14.23 rows=2 width=52)
(actual time=0.022..0.028       rows=2 loops=1)
                           ->  Index Scan using
portfolio_allocations_entity_id_allocated_on_idx on
portfolio_allocations alloc  (cost=0.09..6.05 rows=2 width=39) (actual
 time=0.012..0.014
                                 Index Cond: ((entity_id =
'\x5787f132f50f7b03002cf835'::bytea) AND      (allocated_on <= '2017-03-14
20:59:59.999+00'::timestamp with time   zone))
                           ->  Index Scan using
contributions_id_accrue_from_idx on contributions     contrib
(cost=0.08..4.09 rows=1 width=26) (actual time=0.005..0.005    rows=1
loops=2)
                                 Index Cond: (id = alloc.note_id)
                     ->  Index Only Scan using investments_pkey on
investments inv  (     cost=0.08..0.15 rows=1 width=13) (actual
time=0.005..0.006 rows=1 loops=2)
                           Index Cond: (id = contrib.investment_id)
                           Heap Fetches: 2
       Planning time: 0.617 ms
       Execution time: 0.100 ms
      (15 rows)
But run in the lateral, it doesn't use the index:
       Nested Loop  (cost=14.54..24.55 rows=2000 width=47) (actual
time=0.085..0.219 rows=534      loops=1)
         ->  Function Scan on generate_series dates  (cost=0.00..3.00
rows=1000 width=8) (actual      time=0.031..0.043 rows=267 loops=1)
         ->  Materialize  (cost=14.54..14.55 rows=2 width=39) (actual
time=0.000..0.000 rows=2     loops=267)
               ->  Unique  (cost=14.54..14.54 rows=2 width=39) (actual
time=0.052..0.053 rows=2       loops=1)
                     ->  Sort  (cost=14.54..14.54 rows=2 width=39) (actual
time=0.051..0.052 rows=2      loops=1)
                           Sort Key: alloc.note_id, alloc.series_id
                           Sort Method: quicksort  Memory: 25kB
                           ->  Nested Loop  (cost=0.25..14.53 rows=2
width=39) (actual       time=0.029..0.041 rows=2 loops=1)
                                 ->  Nested Loop  (cost=0.17..14.23 rows=2
width=52) (actual       time=0.021..0.027 rows=2 loops=1)
                                       ->  Index Scan using
 portfolio_allocations_entity_id_allocated_on_idx on
portfolio_allocations alloc  (cost=0.09..6.05 rows=2     width=39) (actual
time=0
                                             Index Cond: ((entity_id =
 '\x5787f132f50f7b03002cf835'::bytea) AND (allocated_on      <= '2017-03-14
20:59:59.999+00'::timestamp with time    zone))
                                       ->  Index Scan using
contributions_id_accrue_from_idx on       contributions contrib
(cost=0.08..4.09 rows=1 width=26) (     actual time=0.005..0.005 rows=1 loo
                                             Index Cond: (id =
alloc.note_id)
                                 ->  Index Only Scan using investments_pkey
on investments inv  (     cost=0.08..0.15 rows=1 width=13) (actual
time=0.005..0.006 rows=1    loops=2)
                                       Index Cond: (id =
contrib.investment_id)
                                       Heap Fetches: 2
       Planning time: 0.718 ms
       Execution time: 0.296 ms
      (18 rows)
For reference, here are the indexes on the relevant tables:
Indexes:
    "portfolio_allocations_entity_id_allocated_on_idx" btree (entity_id,
allocated_on DESC)
    "portfolio_allocations_note_id_allocated_on_idx" btree (note_id,
allocated_on DESC)
    "portfolio_allocations_pnsa" btree (entity_id, note_id, series_id,
allocated_on DESC)
Indexes:
    "contributions_pkey" PRIMARY KEY, btree (id)
    "contributions_id_accrue_from_idx" btree (id,
events_earnings_accrue_from)
I have a few questions here:
  - Why doesn't it use the primary key index in either case?
  - Why isn't it choosing portfolio_allocations_pnsa, which seems like it
would prevent it from having to sort?
Best,
~Alex
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Alex Reece | 2017-12-05 18:08:35 | Re: Different plan chosen when in lateral subquery | 
| Previous Message | Alvaro Herrera | 2017-12-05 17:49:14 | Re: Extremely slow DELETE with cascade foreign keys |