Re: SQL statement over 500% slower with 9.2 compared with 9.1

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: SQL statement over 500% slower with 9.2 compared with 9.1
Date: 2013-08-27 21:27:25
Message-ID: 521D19BD.70208@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 27.8.2013 11:19, Rafael Martinez wrote:
> On 08/26/2013 02:33 PM, Rafael Martinez wrote:
> [............]
>> The SQL statement is:
>
>> SELECT firstname || ' ' || lastname AS Name FROM Person R WHERE
>> R.gender like 'F' AND 19 < (SELECT COUNT(DISTINCT filmId) FROM
>> FilmParticipation F WHERE F.partType = 'director' AND F.personId =
>> R.personId ) AND NOT EXISTS ( SELECT * FROM
>> FilmParticipation D WHERE D.partType = 'director' AND D.personId
>> = R.personId AND NOT EXISTS ( SELECT * FROM FilmParticipation
>> C WHERE C.partType = 'cast' AND C.filmId = D.filmId AND
>> C.personId = D.personId ) ) ;
>
>
> [.............]
>
>> We can see that the query plan is very different between versions
>> and that 9.2 is really wrong with the number of rows involved. Why
>> is 9.2 taking so wrong about the number of rows involved in some
>> parts of the plan?
>
>
> Hei
>
> More information:
>
> If we turn off enable_indexscan the runtime gets more similar to the
> one we get with 9.1, we are down to 4200ms.
>
> The query plan with this configuration is here:
> http://explain.depesz.com/s/jVR
>
> The question remains the same, why is 9.2 using such a different and
> bad plan compared to 9.1, when the data and the configuration are the
> same?

Hi,

seems the problem is mostly about the inner-most query, i.e. this:

SELECT *
FROM FilmParticipation C
WHERE C.partType = 'cast'
AND C.filmId = D.filmId
AND C.personId = D.personId
)

In 9.2 it's estimated to return 1 row, but it returns 595612 of them (or
97780 after materialization). I believe this is the culprit that causes
cost estimates that are way off, and that in turn leads to choice of
"cheaper" plan that actually takes much longer to evaluate.

Because the slow plan is estimated to "cost" 122367017.97 while the fast
one 335084834.95 (i.e. 3x more).

I don't immediately see where's the problem - maybe some other hacker on
this list can. Can you prepare a testcase for this? I.e. a structure of
the tables + data so that we can reproduce it?

regards
Tomas

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tomas Vondra 2013-08-27 21:37:38 Re: Poor performance on simple queries compared to sql server express
Previous Message Merlin Moncure 2013-08-27 17:17:55 Re: Cpu usage 100% on slave. s_lock problem.