Re: Very poor performance with Nested Loop Anti Join

From: Andreas Joseph Krogh <andreas(at)visena(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Very poor performance with Nested Loop Anti Join
Date: 2016-08-02 11:30:13
Message-ID: VisenaEmail.36.dfb69f223c50e31c.1564a108885@tc7-visena
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

På tirsdag 02. august 2016 kl. 01:15:05, skrev Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us
<mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us>>:
Andreas Joseph Krogh <andreas(at)visena(dot)com> writes:
> This query performs terribly slow (~26 minutes,��1561346.597ms):

Seems like the key misestimation is on the inner antijoin:

>                ->  Hash Anti Join  (cost=654.21..4008.72 rows=1 width=8)
(actual time=9.016..40.672 rows=76174 loops=1)
>                      Hash Cond: (il.invoice_id = creditnote.credit_against)
>                      ->  Seq Scan on onp_crm_invoice_line il 
(cost=0.00..3062.01 rows=78001 width=8) (actual time=0.005..11.259 rows=78614
loops=1)
>                      ->  Hash  (cost=510.56..510.56 rows=11492 width=8)
(actual time=8.940..8.940 rows=372 loops=1)
>                            Buckets: 16384  Batches: 1  Memory Usage: 143kB
>                            ->  Seq Scan on onp_crm_invoice creditnote 
(cost=0.00..510.56 rows=11492 width=8) (actual time=0.014..7.882 rows=11507
loops=1)
>                                  Filter: ((sent_date <= '2016-06-27'::date)
AND ((status_key)::text = 'INVOICE_STATUS_INVOICED'::text))
>                                  Rows Removed by Filter: 149

If it realized that this produces 78k rows not 1, it'd likely do something
smarter at the outer antijoin.

I have no idea why that estimate's so far off though.  What PG version is
this?  Stats all up to date on these two tables?
 
Sorry for not providing PG-version, this is on 9.5.3.
All stats are up to date, or should be a I've analyzed all manually.

 
Are the rows excluded
by the filter condition on "creditnote" significantly different from the
rest of that table?
 
This happens also without the filter-cond:
 
explain analyze SELECT log.relation_id as company_id , sum(log.duration) AS
durationFROM onp_crm_activity_log log JOIN onp_crm_person logfor ON
logfor.onp_user_id =log.logged_for AND logfor.is_resource = FALSE WHERE 1 = 1
-- Filter out already invoiced AND NOT EXISTS( SELECT * FROM
onp_crm_calendarentry_invoice_membership cemJOIN onp_crm_invoice_line il ON
cem.invoice_line_id = il.idJOIN onp_crm_invoice inv ON il.invoice_id =
inv.entity_idWHERE cem.calendar_entry_id = log.id AND NOT EXISTS( SELECT * FROM
onp_crm_invoice creditnoteWHERE il.invoice_id = creditnote.credit_against ) )
GROUP BY log.relation_id ;
 
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=12049.35..12067.11 rows=1421 width=12) (actual time
=1386683.646..1386683.858rows=720 loops=1) Group Key: log.relation_id -> Nested
LoopAnti Join (cost=512.08..12039.32 rows=2006 width=12) (actual time
=395.017..1386576.756rows=93480 loops=1) Join Filter: (cem.calendar_entry_id =
log.id) Rows Removed by Join Filter: 12185913244 -> Hash Join (cost
=86.56..9757.61rows=2006 width=20) (actual time=0.165..366.778 rows=181872
loops=1) Hash Cond: (log.logged_for = logfor.onp_user_id) -> Seq Scan on
onp_crm_activity_loglog (cost=0.00..8957.45 rows=184945 width=24) (actual time
=0.003..256.862rows=182606 loops=1) -> Hash (cost=39.46..39.46 rows=3768 width=8
) (actualtime=0.132..0.132 rows=36 loops=1) Buckets: 4096 Batches: 1 Memory
Usage: 34kB -> Bitmap Heap Scan on onp_crm_person logfor (cost=3.69..39.46 rows=
3768width=8) (actual time=0.033..0.125 rows=36 loops=1) Recheck Cond:
(onp_user_idIS NOT NULL) Filter: (NOT is_resource) Rows Removed by Filter: 5
Heap Blocks: exact=10 -> Bitmap Index Scan on onp_crm_person_onp_id_idx (cost
=0.00..2.75rows=41 width=0) (actual time=0.017..0.017 rows=41 loops=1) ->
Materialize (cost=425.53..2251.62 rows=1 width=4) (actual time=0.000..2.544 rows
=67003 loops=181872) -> Nested Loop (cost=425.53..2251.61 rows=1 width=4)
(actualtime=3.283..320.057 rows=88511 loops=1) -> Nested Loop (cost
=425.24..2251.30rows=1 width=8) (actual time=3.241..154.783 rows=88511 loops=1)
->Hash Anti Join (cost=424.95..2250.75 rows=1 width=8) (actual time
=3.110..30.097rows=76281 loops=1) Hash Cond: (il.invoice_id =
creditnote.credit_against) ->Index Only Scan using
origo_invoice_line_id_invoice_idxon onp_crm_invoice_line il (cost=0.29..1530.95
rows=78707 width=8) (actual time=0.030..13.719 rows=78740 loops=1) Heap Fetches:
2967 -> Hash (cost=278.22..278.22 rows=11715 width=8) (actual time=3.003..3.003
rows=376 loops=1) Buckets: 16384 Batches: 1 Memory Usage: 143kB -> Index Only
Scanusing origo_invoice_credit_against_idx on onp_crm_invoice creditnote (cost
=0.29..278.22rows=11715 width=8) (actual time=0.042..2.082 rows=11692 loops=1)
Heap Fetches:1151 -> Index Only Scan using
onp_crm_calendarentry_invoice_invoice_line_id_calendar_entr_keyon
onp_crm_calendarentry_invoice_membership cem (cost=0.29..0.45 rows=9 width=8)
(actualtime=0.001..0.001 rows=1 loops=76281) Index Cond: (invoice_line_id =
il.id) Heap Fetches:4753 -> Index Only Scan using onp_crm_invoice_pkey on
onp_crm_invoice inv (cost=0.29..0.30 rows=1 width=8) (actual time=0.001..0.001
rows=1 loops=88511) Index Cond: (entity_id = il.invoice_id) Heap Fetches: 12084
Planningtime: 5.824 ms Execution time: 1386686.664 ms (35 rows)  
With set enable_nestloop to off;
 
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=13975.70..13993.46 rows=1421 width=12) (actual time
=338.185..338.341rows=720 loops=1) Group Key: log.relation_id -> Hash Anti Join
(cost=4265.19..13965.66 rows=2007 width=12) (actual time=147.696..318.314 rows=
93511loops=1) Hash Cond: (log.id = cem.calendar_entry_id) -> Hash Join (cost
=86.56..9761.69rows=2007 width=20) (actual time=0.166..127.604 rows=181915
loops=1) Hash Cond: (log.logged_for = logfor.onp_user_id) -> Seq Scan on
onp_crm_activity_loglog (cost=0.00..8961.23 rows=185023 width=24) (actual time
=0.006..84.093rows=182649 loops=1) -> Hash (cost=39.46..39.46 rows=3768 width=8
) (actualtime=0.123..0.123 rows=36 loops=1) Buckets: 4096 Batches: 1 Memory
Usage: 34kB -> Bitmap Heap Scan on onp_crm_person logfor (cost=3.69..39.46 rows=
3768width=8) (actual time=0.038..0.102 rows=36 loops=1) Recheck Cond:
(onp_user_idIS NOT NULL) Filter: (NOT is_resource) Rows Removed by Filter: 5
Heap Blocks: exact=10 -> Bitmap Index Scan on onp_crm_person_onp_id_idx (cost
=0.00..2.75rows=41 width=0) (actual time=0.019..0.019 rows=41 loops=1) -> Hash (
cost=4178.62..4178.62 rows=1 width=4) (actual time=147.497..147.497 rows=88523
loops=1) Buckets: 131072 (originally 1024) Batches: 1 (originally 1) Memory
Usage: 4137kB -> Hash Join (cost=2553.20..4178.62 rows=1 width=4) (actual time
=98.512..133.017rows=88523 loops=1) Hash Cond: (cem.invoice_line_id = il.id) ->
Seq Scanon onp_crm_calendarentry_invoice_membership cem (cost=0.00..1290.66 rows
=89266 width=8) (actual time=0.006..11.151 rows=89175 loops=1) -> Hash (cost
=2553.19..2553.19rows=1 width=4) (actual time=98.481..98.481 rows=76286 loops=1
) Buckets:131072 (originally 1024) Batches: 1 (originally 1) Memory Usage: 3706
kB -> MergeJoin (cost=2252.01..2553.19 rows=1 width=4) (actual time
=50.922..87.641rows=76286 loops=1) Merge Cond: (il.invoice_id = inv.entity_id)
-> Sort (cost=2251.73..2251.73 rows=1 width=8) (actual time=50.872..55.552 rows=
76286loops=1) Sort Key: il.invoice_id Sort Method: quicksort Memory: 6648kB ->
HashAnti Join (cost=425.91..2251.72 rows=1 width=8) (actual time=5.904..35.979
rows=76286 loops=1) Hash Cond: (il.invoice_id = creditnote.credit_against) ->
Index OnlyScan using origo_invoice_line_id_invoice_idx on onp_crm_invoice_line
il (cost=0.29..1530.95 rows=78707 width=8) (actual time=0.028..16.124 rows=78745
loops=1) Heap Fetches: 2972 -> Hash (cost=278.74..278.74 rows=11750 width=8)
(actualtime=5.792..5.792 rows=376 loops=1) Buckets: 16384 Batches: 1 Memory
Usage: 143kB -> Index Only Scan using origo_invoice_credit_against_idx on
onp_crm_invoice creditnote (cost=0.29..278.74 rows=11750 width=8) (actual time
=0.067..4.466rows=11694 loops=1) Heap Fetches: 1155 -> Index Only Scan using
onp_crm_invoice_pkeyon onp_crm_invoice inv (cost=0.29..272.09 rows=11750 width=8
) (actualtime=0.040..10.755 rows=76661 loops=1) Heap Fetches: 3840 Planning time
:3.762 ms Execution time: 339.634 ms (39 rows)
 
 

 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andreas(at)visena(dot)com <mailto:andreas(at)visena(dot)com>
www.visena.com <https://www.visena.com>
<https://www.visena.com>

 

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Luca Ferrari 2016-08-02 12:00:25 Re: Create language plperlu Error
Previous Message John R Pierce 2016-08-02 07:21:26 Re: Create language plperlu Error