Re: Analyzing performance regression from 9.2 to 9.6

From: Dave Peticolas <dave(at)krondo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Analyzing performance regression from 9.2 to 9.6
Date: 2017-09-10 04:41:51
Message-ID: CAPRbp07yGdroT_1Fv5R4rsrgW4QD7QmV-DQvUcFjEhfDsFy2Kw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

> Am I right in guessing that report_submission.id is a
> declared-not-null column, so that the join FROM "report_skilltype"
> should be understood as an anti-join?

You are exactly right.

> Any chance of whacking your ORM upside the head to the point where it
would emit that?

In this case it turns out to be pretty simple to do and it makes all the
difference.
Thank you so much, this has been very enlightening. New plans for 9.6 and
9.2 below.

Limit (cost=203873.66..203873.87 rows=86 width=66) (actual
time=2776.352..2776.352 rows=0 loops=1)
-> Sort (cost=203873.66..204380.31 rows=202660 width=66) (actual
time=2776.350..2776.350 rows=0 loops=1)
Sort Key: report_typereport.id DESC
Sort Method: quicksort Memory: 25kB
-> Hash Right Join (cost=157720.28..196348.63 rows=202660
width=66) (actual time=2776.343..2776.343 rows=0 loops=1)
Hash Cond: (report_typereport.skill_type_id =
report_skilltype.uuid)
-> Seq Scan on report_typereport (cost=0.00..34513.92
rows=824892 width=20) (never executed)
-> Hash (cost=155187.03..155187.03 rows=202660 width=62)
(actual time=2775.983..2775.983 rows=0 loops=1)
Buckets: 262144 Batches: 1 Memory Usage: 2048kB
-> Hash Join (cost=81562.64..155187.03 rows=202660
width=62) (actual time=2775.982..2775.982 rows=0 loops=1)
Hash Cond: (report_type.id =
report_skilltype.type_id)
-> Seq Scan on report_type
(cost=0.00..55456.48 rows=843731 width=4) (actual time=0.013..496.357
rows=847827 loops=1)
Filter: ((NOT is_current) AND (category <>
ALL ('{7,8}'::integer[])))
Rows Removed by Filter: 809331
-> Hash (cost=72311.12..72311.12 rows=398041
width=62) (actual time=1998.763..1998.763 rows=463108 loops=1)
Buckets: 524288 Batches: 2 Memory Usage:
25322kB
-> Hash Anti Join
(cost=16689.46..72311.12 rows=398041 width=62) (actual
time=252.111..1815.212 rows=463108 loops=1)
Hash Cond: (report_skilltype.uuid =
report_submission.skill_type_id)
-> Hash Join (cost=75.89..49200.31
rows=566266 width=62) (actual time=1.342..1164.145 rows=950561 loops=1)
Hash Cond:
(report_skilltype.skill_id = kbdata_skill.id)
-> Seq Scan on
report_skilltype (cost=13.51..40399.71 rows=818608 width=62) (actual
time=0.232..661.263 rows=1637183 loops=1)
Filter: (NOT (hashed
SubPlan 1))
Rows Removed by Filter:
33
SubPlan 1
-> Hash Join
(cost=1.05..13.07 rows=176 width=4) (actual time=0.064..0.206 rows=26
loops=1)
Hash Cond:
(u2.product_id = u3.id)
-> Seq Scan on
product_kbdata_skills u2 (cost=0.00..8.28 rows=528 width=8) (actual
time=0.003..0.083 rows=528 loops=1)
-> Hash
(cost=1.04..1.04 rows=1 width=4) (actual time=0.009..0.009 rows=1 loops=1)
Buckets:
1024 Batches: 1 Memory Usage: 9kB
-> Seq
Scan on product u3 (cost=0.00..1.04 rows=1 width=4) (actual
time=0.006..0.007 rows=1 loops=1)

Filter: ((slug)::text = 'product1'::text)
Rows
Removed by Filter: 2
-> Hash (cost=57.68..57.68
rows=376 width=12) (actual time=1.104..1.104 rows=374 loops=1)
Buckets: 1024 Batches:
1 Memory Usage: 25kB
-> Hash Join
(cost=32.00..57.68 rows=376 width=12) (actual time=0.600..0.977 rows=374
loops=1)
Hash Cond: (
kbdata_skill.id = product_kbdata_skills.skill_id)
-> Seq Scan on
kbdata_skill (cost=0.00..19.79 rows=542 width=4) (actual time=0.008..0.198
rows=542 loops=1)
Filter:
(external_uuid <> 'CE9EC8A7-80D8-42A2-B506-0C1B7248CF8C'::uuid)
Rows Removed
by Filter: 1
-> Hash
(cost=27.16..27.16 rows=387 width=8) (actual time=0.585..0.585 rows=374
loops=1)
Buckets:
1024 Batches: 1 Memory Usage: 23kB
-> Hash
Join (cost=11.71..27.16 rows=387 width=8) (actual time=0.193..0.483
rows=374 loops=1)
Hash
Cond: (product_kbdata_skills.skill_id = kbdata_skill_aspects.skill_id)
->
Seq Scan on product_kbdata_skills (cost=0.00..8.28 rows=528 width=4)
(actual time=0.003..0.107 rows=528 loops=1)

Filter: (product_id IS NOT NULL)
->
Hash (cost=6.87..6.87 rows=387 width=4) (actual time=0.183..0.183
rows=387 loops=1)

Buckets: 1024 Batches: 1 Memory Usage: 22kB

-> Seq Scan on kbdata_skill_aspects (cost=0.00..6.87 rows=387 width=4)
(actual time=0.002..0.096 rows=387 loops=1)

Filter: (aspect_id IS NOT NULL)
-> Hash (cost=10488.81..10488.81
rows=489981 width=16) (actual time=250.096..250.096 rows=489981 loops=1)
Buckets: 524288 Batches: 1
Memory Usage: 27064kB
-> Seq Scan on
report_submission (cost=0.00..10488.81 rows=489981 width=16) (actual
time=0.004..128.478 rows=489981 loops=1)
Planning time: 13.808 ms
Execution time: 2776.598 ms
(54 rows)

Limit (cost=220583.50..220583.71 rows=86 width=66) (actual
time=5012.934..5012.934 rows=0 loops=1)
-> Sort (cost=220583.50..221311.03 rows=291012 width=66) (actual
time=5012.933..5012.933 rows=0 loops=1)
Sort Key: report_typereport.id
Sort Method: quicksort Memory: 25kB
-> Hash Join (cost=168290.94..209777.84 rows=291012 width=66)
(actual time=5012.927..5012.927 rows=0 loops=1)
Hash Cond: (report_skilltype.skill_id = kbdata_skill.id)
-> Hash Right Join (cost=168228.56..206606.96 rows=291208
width=66) (actual time=3772.672..4954.449 rows=351045 loops=1)
Hash Cond: (report_typereport.skill_type_id =
report_skilltype.uuid)
-> Seq Scan on report_typereport
(cost=0.00..33918.40 rows=803840 width=20) (actual time=0.008..364.048
rows=805350 loops=1)
-> Hash (cost=164588.46..164588.46 rows=291208
width=62) (actual time=3772.568..3772.568 rows=349748 loops=1)
Buckets: 32768 Batches: 1 Memory Usage: 30074kB
-> Hash Anti Join (cost=85026.50..164588.46
rows=291208 width=62) (actual time=953.490..3608.328 rows=349748 loops=1)
Hash Cond: (report_skilltype.uuid =
report_submission.skill_type_id)
-> Hash Join (cost=68845.95..138820.85
rows=411043 width=62) (actual time=709.782..2667.964 rows=826198 loops=1)
Hash Cond: (report_skilltype.type_id
= report_type.id)
-> Seq Scan on report_skilltype
(cost=13.51..40175.69 rows=809647 width=62) (actual time=0.217..668.918
rows=1619274 loops=1)
Filter: (NOT (hashed SubPlan
1))
Rows Removed by Filter: 33
SubPlan 1
-> Hash Join
(cost=1.05..13.07 rows=176 width=4) (actual time=0.056..0.194 rows=26
loops=1)
Hash Cond:
(u2.product_id = u3.id)
-> Seq Scan on
product_kbdata_skills u2 (cost=0.00..8.28 rows=528 width=8) (actual
time=0.001..0.083 rows=528 loops=1)
-> Hash
(cost=1.04..1.04 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=1)
Buckets: 1024
Batches: 1 Memory Usage: 1kB
-> Seq Scan on
product u3 (cost=0.00..1.04 rows=1 width=4) (actual time=0.004..0.005
rows=1 loops=1)
Filter:
((slug)::text = 'product1'::text)
Rows
Removed by Filter: 2
-> Hash (cost=55200.19..55200.19
rows=830900 width=4) (actual time=709.334..709.334 rows=838440 loops=1)
Buckets: 131072 Batches: 2
Memory Usage: 14741kB
-> Seq Scan on report_type
(cost=0.00..55200.19 rows=830900 width=4) (actual time=0.005..485.451
rows=838440 loops=1)
Filter: ((NOT
is_current) AND (category <> ALL ('{7,8}'::integer[])))
Rows Removed by Filter:
801846
-> Hash (cost=10236.91..10236.91
rows=475491 width=16) (actual time=243.534..243.534 rows=479828 loops=1)
Buckets: 65536 Batches: 1 Memory
Usage: 22492kB
-> Seq Scan on report_submission
(cost=0.00..10236.91 rows=475491 width=16) (actual time=0.003..115.188
rows=479828 loops=1)
-> Hash (cost=57.68..57.68 rows=376 width=12) (actual
time=1.189..1.189 rows=374 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 17kB
-> Hash Join (cost=32.00..57.68 rows=376 width=12)
(actual time=0.627..1.085 rows=374 loops=1)
Hash Cond: (kbdata_skill.id =
product_kbdata_skills.skill_id)
-> Seq Scan on kbdata_skill (cost=0.00..19.79
rows=542 width=4) (actual time=0.007..0.260 rows=542 loops=1)
Filter: (external_uuid <>
'CE9EC8A7-80D8-42A2-B506-0C1B7248CF8C'::uuid)
Rows Removed by Filter: 1
-> Hash (cost=27.16..27.16 rows=387 width=8)
(actual time=0.616..0.616 rows=374 loops=1)
Buckets: 1024 Batches: 1 Memory Usage:
15kB
-> Hash Join (cost=11.71..27.16 rows=387
width=8) (actual time=0.204..0.530 rows=374 loops=1)
Hash Cond:
(product_kbdata_skills.skill_id = kbdata_skill_aspects.skill_id)
-> Seq Scan on
product_kbdata_skills (cost=0.00..8.28 rows=528 width=4) (actual
time=0.002..0.121 rows=528 loops=1)
Filter: (product_id IS NOT
NULL)
-> Hash (cost=6.87..6.87 rows=387
width=4) (actual time=0.195..0.195 rows=387 loops=1)
Buckets: 1024 Batches: 1
Memory Usage: 14kB
-> Seq Scan on
kbdata_skill_aspects (cost=0.00..6.87 rows=387 width=4) (actual
time=0.001..0.085 rows=387 loops=1)
Filter: (aspect_id IS
NOT NULL)
Total runtime: 5013.070 ms
(53 rows)

On Sat, Sep 9, 2017 at 8:26 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Dave Peticolas <dave(at)krondo(dot)com> writes:
> > Hi, I am trying to analyze a performance regression from 9.2.21 to 9.6.3.
> > The query and execution plans are below with 9.6.3 first.
>
> Hm. Neither version is exactly covering itself with glory. I'm not sure
> why 9.6 doesn't pick the same plan as 9.2, but given the planner's
> estimate that the report_submission/report_skilltype join will produce
> only one row, the difference in the estimated costs would be negligible.
> Since that join is actually producing over a million rows, your chances
> of getting a good plan hinge on improving that estimate.
>
> > I realize the query itself is probably not great and would benefit from a
> > different approach, but I'd like to know if there are 9.6 settings I
> should
> > look into to get a better plan without changing the query if possible.
>
> Doubt it. Am I right in guessing that report_submission.id is a
> declared-not-null column, so that the join
>
> FROM "report_skilltype"
> LEFT OUTER JOIN "report_submission"
> ON ("report_skilltype"."uuid" = "report_submission"."skill_type_id")
> WHERE ... "report_submission"."id" IS NULL
>
> should be understood as an anti-join? The planner doesn't get that
> at the moment, for implementation reasons that needn't concern us here.
> But it would get it if you said
>
> WHERE ... "report_submission"."skill_type_id" IS NULL
>
> i.e. constrain the join column to be null. Any chance of whacking your
> ORM upside the head to the point where it would emit that?
>
> regards, tom lane
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Magnus Hagander 2017-09-10 07:39:56 Re: pg_ident mapping Kerberos Usernames
Previous Message Tom Lane 2017-09-10 03:24:43 Re: Analyzing performance regression from 9.2 to 9.6