Re: Different plan chosen when in lateral subquery

From: Alex Reece <awreece(at)gmail(dot)com>
To: "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Re: Different plan chosen when in lateral subquery
Date: 2017-12-05 18:08:35
Message-ID: CANywC6ADHY502_TQ0epf8GsQ8ieCaTgRXtsNAd-ujAEHYH+m9g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Weird, when I deleted an erroneous index it started picking a reasonable
plan. This now works as expected, for posterity here is the bad plan:

Nested Loop (cost=21281.50..21323812.82 rows=5621000 width=47) (actual
time=171.648..7233.298 rows=85615 loops=1)

-> Function Scan on generate_series dates (cost=0.00..3.00 rows=1000
width=8) (actual time=0.031..0.252 rows=267 loops=1)

-> Unique (cost=21281.50..21290.08 rows=5621 width=39) (actual
time=25.730..27.050 rows=321 loops=267)

-> Sort (cost=21281.50..21284.36 rows=5724 width=39) (actual
time=25.728..26.242 rows=6713 loops=267)

Sort Key: alloc.note_id, alloc.series_id

Sort Method: quicksort Memory: 2220kB

-> Nested Loop (cost=10775.92..21210.05 rows=5724
width=39) (actual time=1.663..21.938 rows=6713 loops=267)

-> Hash Join (cost=10775.83..20355.61 rows=5724
width=52) (actual time=1.657..5.980 rows=6713 loops=267)

Hash Cond: (alloc.note_id = contrib.id)

-> Bitmap Heap Scan on portfolio_allocations
alloc (cost=69.82..9628.13 rows=5724 width=39) (actual time=1.010..2.278
rows=6713 loops=267)

Recheck Cond: ((entity_id =
'\x5787f132f50f7b03002cf835'::bytea) AND (allocated_on <=
date(dates.dates)))

Heap Blocks: exact=118074

-> Bitmap Index Scan on
portfolio_allocations_entity_id_allocated_on_idx (cost=0.00..69.53
rows=5724 width=0) (actual time=0.956..0.956 rows=6713 lo

Index Cond: ((entity_id =
'\x5787f132f50f7b03002cf835'::bytea) AND (allocated_on <=
date(dates.dates)))

-> Hash (cost=9464.85..9464.85 rows=354617
width=26) (actual time=169.792..169.792 rows=354617 loops=1)

Buckets: 524288 Batches: 1 Memory Usage:
24296kB

-> Seq Scan on contributions contrib
(cost=0.00..9464.85
rows=354617 width=26) (actual time=0.007..83.246 rows=354617 loops=1)

-> Index Only Scan using investments_pkey on
investments inv (cost=0.08..0.15 rows=1 width=13) (actual
time=0.002..0.002 rows=1 loops=1792457)

Index Cond: (id = contrib.investment_id)

Heap Fetches: 1792457

Planning time: 0.721 ms

Execution time: 7236.507 ms

On Tue, Dec 5, 2017 at 10:04 AM Alex Reece <awreece(at)gmail(dot)com> wrote:

> I get very different plan chosen when my query is in a lateral subquery vs
> standalone -- it doesn't use a key when joining on a table, instead opting
> to do a hash join. Here is the query:
>
> select distinct on (sub.entity_id, sub.note_id, sub.series_id)
> entity_id, note_id, series_id
> from
> (
> select alloc.entity_id, alloc.note_id, alloc.series_id, alloc.amount,
> inv.name
> from public.portfolio_allocations alloc
> JOIN contributions contrib on contrib.id = alloc.note_id
> JOIN investments inv on inv.id = contrib.investment_id
> where entity_id = '\x5787f132f50f7b03002cf835' and
> alloc.allocated_on <= dates.date
> ) sub
>
> And wrapped inside the lateral:
>
> explain analyze
> select *
> from generate_series('2017-03-14 20:59:59.999'::TIMESTAMPTZ,
> current_timestamp::TIMESTAMP + INTERVAL '1 day', '24 hours') dates,
> LATERAL (
> ... <SUB QUERY HERE> ...
> ) lat
>
> Run by itself injecting a hard coded value for dates.date, I get the
> expected plan which uses a key index on contributions:
>
> Unique (cost=14.54..14.54 rows=2 width=39) (actual
> time=0.052..0.053 rows=2 loops=1)
> -> Sort (cost=14.54..14.54 rows=2 width=39) (actual
> time=0.052..0.052 rows=2 loops=1)
> Sort Key: alloc.note_id, alloc.series_id
> Sort Method: quicksort Memory: 25kB
> -> Nested Loop (cost=0.25..14.53 rows=2 width=39) (actual
> time=0.030..0.042 rows=2 loops=1)
> -> Nested Loop (cost=0.17..14.23 rows=2 width=52)
> (actual time=0.022..0.028 rows=2 loops=1)
> -> Index Scan using
> portfolio_allocations_entity_id_allocated_on_idx on
> portfolio_allocations alloc (cost=0.09..6.05 rows=2 width=39) (actual
> time=0.012..0.014
> Index Cond: ((entity_id =
> '\x5787f132f50f7b03002cf835'::bytea) AND (allocated_on <= '2017-03-14
> 20:59:59.999+00'::timestamp with time zone))
> -> Index Scan using
> contributions_id_accrue_from_idx on contributions contrib
> (cost=0.08..4.09 rows=1 width=26) (actual time=0.005..0.005 rows=1
> loops=2)
> Index Cond: (id = alloc.note_id)
> -> Index Only Scan using investments_pkey on
> investments inv ( cost=0.08..0.15 rows=1 width=13) (actual
> time=0.005..0.006 rows=1 loops=2)
> Index Cond: (id = contrib.investment_id)
> Heap Fetches: 2
> Planning time: 0.617 ms
> Execution time: 0.100 ms
> (15 rows)
>
> But run in the lateral, it doesn't use the index:
>
> Nested Loop (cost=14.54..24.55 rows=2000 width=47) (actual
> time=0.085..0.219 rows=534 loops=1)
> -> Function Scan on generate_series dates (cost=0.00..3.00
> rows=1000 width=8) (actual time=0.031..0.043 rows=267 loops=1)
> -> Materialize (cost=14.54..14.55 rows=2 width=39) (actual
> time=0.000..0.000 rows=2 loops=267)
> -> Unique (cost=14.54..14.54 rows=2 width=39) (actual
> time=0.052..0.053 rows=2 loops=1)
> -> Sort (cost=14.54..14.54 rows=2 width=39) (actual
> time=0.051..0.052 rows=2 loops=1)
> Sort Key: alloc.note_id, alloc.series_id
> Sort Method: quicksort Memory: 25kB
> -> Nested Loop (cost=0.25..14.53 rows=2
> width=39) (actual time=0.029..0.041 rows=2 loops=1)
> -> Nested Loop (cost=0.17..14.23 rows=2
> width=52) (actual time=0.021..0.027 rows=2 loops=1)
> -> Index Scan using
> portfolio_allocations_entity_id_allocated_on_idx on
> portfolio_allocations alloc (cost=0.09..6.05 rows=2 width=39) (actual
> time=0
> Index Cond: ((entity_id =
> '\x5787f132f50f7b03002cf835'::bytea) AND (allocated_on <= '2017-03-14
> 20:59:59.999+00'::timestamp with time zone))
> -> Index Scan using
> contributions_id_accrue_from_idx on contributions contrib
> (cost=0.08..4.09 rows=1 width=26) ( actual time=0.005..0.005 rows=1 loo
> Index Cond: (id =
> alloc.note_id)
> -> Index Only Scan using
> investments_pkey on investments inv ( cost=0.08..0.15 rows=1 width=13)
> (actual time=0.005..0.006 rows=1 loops=2)
> Index Cond: (id =
> contrib.investment_id)
> Heap Fetches: 2
> Planning time: 0.718 ms
> Execution time: 0.296 ms
> (18 rows)
>
> For reference, here are the indexes on the relevant tables:
>
> Indexes:
> "portfolio_allocations_entity_id_allocated_on_idx" btree (entity_id,
> allocated_on DESC)
> "portfolio_allocations_note_id_allocated_on_idx" btree (note_id,
> allocated_on DESC)
> "portfolio_allocations_pnsa" btree (entity_id, note_id, series_id,
> allocated_on DESC)
>
> Indexes:
> "contributions_pkey" PRIMARY KEY, btree (id)
> "contributions_id_accrue_from_idx" btree (id,
> events_earnings_accrue_from)
>
> I have a few questions here:
> - Why doesn't it use the primary key index in either case?
> - Why isn't it choosing portfolio_allocations_pnsa, which seems like it
> would prevent it from having to sort?
>
> Best,
> ~Alex
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Rodrigo Rosenfeld Rosas 2017-12-05 18:15:14 Re: Extremely slow DELETE with cascade foreign keys
Previous Message Alex Reece 2017-12-05 18:04:27 Different plan chosen when in lateral subquery