Re: Huge difference in query performance between 8.3 and 8.4 (possibly)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robin Houston <robin(dot)houston(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Huge difference in query performance between 8.3 and 8.4 (possibly)
Date: 2009-07-09 17:09:11
Message-ID: 2712.1247159351@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Robin Houston <robin(dot)houston(at)gmail(dot)com> writes:
> We have a query that runs very slowly on our 8.3 database. (I can't
> tell you exactly how slowly, because it has never successfully run to
> completion even when we left it running overnight.) On the 8.4
> database on my laptop, it runs in about 90 seconds. Of course there
> are several differences between the two instances, but I wonder
> whether query planning improvements in 8.4 could essentially account
> for it.

Well, it's hard to be sure with only EXPLAIN and not EXPLAIN ANALYZE
output to look at; but I think the significant difference in these plans
is that 8.4 has chosen a hash instead of nestloop join for a couple of
the intermediate join levels. Which is evidently because of a change
in the estimated size of the next join down:

-> Nested Loop (cost=44050.86..77114.32 rows=1 width=50)
Join Filter: ((second.feature_cvterm_id > first.feature_cvterm_id) AND (second.feature_id = first.feature_id) AND (second.pub_id = first.pub_id) AND ((second_withfrom_dbxref.accession)::text = (first_withfrom_dbxref.accession)::text))
-> Nested Loop (cost=30794.26..42915.70 rows=1 width=69)
-> Hash Join (cost=30794.26..42906.88 rows=1 width=65)
Hash Cond: (second_evidence.type_id = evidence_type.cvterm_id)

versus

-> Hash Join (cost=63949.73..77732.49 rows=1 width=59)
Hash Cond: ((second.feature_id = first.feature_id) AND (second.pub_id = first.pub_id) AND ((second_withfrom_dbxref.accession)::text = (first_withfrom_dbxref.accession)::text))
Join Filter: (second.feature_cvterm_id > first.feature_cvterm_id)
-> Hash Join (cost=30236.57..41303.13 rows=4607 width=66)
Hash Cond: (second_evidence.type_id = evidence_type.cvterm_id)

If the 8.4 rowcount estimate is accurate then it's not surprising that
the nestloop plan sucks --- it'd be re-executing the other arm of the
join 4600 or so times.

This could reflect improvements in the join size estimation code, or
maybe it's just a consequence of 8.4 using larger statistics targets
by default. It's hard to be sure with so little information to go on.

regards, tom lane

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Hartman, Matthew 2009-07-09 17:36:00 Re: Huge difference in query performance between 8.3 and 8.4 (possibly)
Previous Message Alvaro Herrera 2009-07-09 17:02:31 Re: Huge difference in query performance between 8.3 and 8.4 (possibly)