Help me understand why my subselect is an order of magnitude faster than my nested joins

From: Eli Naeher <enaeher(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Help me understand why my subselect is an order of magnitude faster than my nested joins
Date: 2014-03-03 18:24:58
Message-ID: CAJVWyAy4q3WsfXSjD0DD7tt2CU__W4q0FxdpcxBhwmURnAzuuQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello,

I have two versions of essentially the same query; one using nested joins,
the other using subselects. The version using the subselect is roughly an
order of magnitude faster (~70ms on my box and data vs ~900ms for the
nested joins). Of course the obvious answer here is just to use the faster
version, but I'd like to understand why the other version is so slow. These
queries are automatically generated by our code and I'd like to feel more
informed when deciding what style of query it should be generating (and to
know whether there is a way to write the nested-join queries that will more
closely approach the performance of the subselect).

(The table aliasing is an artifact of the code that is generating this
query--I assume there is no big performance impact there, but perhaps that
assumption is mistaken.)

The join version:

(SELECT DISTINCT resource_type_1.*
FROM resource_type AS resource_type_1
LEFT JOIN group_authorization AS group_authorization_2
INNER JOIN group_member AS group_member_4
ON ( ( group_authorization_2.person_oid =
group_member_4.person_oid )
AND ( group_authorization_2.group_oid =
group_member_4.group_oid ) )
INNER JOIN wco_group AS group_5
ON ( group_authorization_2.group_oid =
group_5.obj_oid )
ON ( resource_type_1.obj_oid =
group_authorization_2.rtype_oid )
WHERE ( ( ( ( ( group_5.end_date IS NULL )
OR ( group_5.end_date >= '2014-03-03T18:08:23.543001Z' ) )
AND ( ( group_member_4.expire IS NULL )
OR ( group_member_4.expire >=
'2014-03-03T18:08:23.543001Z'
) )
AND ( ( group_authorization_2.expire IS NULL )
OR ( group_authorization_2.expire >=
'2014-03-03T18:08:23.543001Z'
)
)
)
AND ( group_authorization_2.person_oid = 1 ) )
OR ( resource_type_1.authorized = false ) ))

(explain (analyze, buffers) output is at http://explain.depesz.com/s/wPZL)

The subselect version:

(SELECT DISTINCT resource_type_1.*
FROM resource_type AS resource_type_1
WHERE ( ( resource_type_1.authorized = false )
OR ( resource_type_1.obj_oid IN (SELECT rtype_oid
FROM group_authorization
INNER JOIN group_member
ON ( (
group_member.group_oid
=
group_authorization.group_oid )
AND ( group_member.person_oid =
group_authorization.person_oid ) )
INNER JOIN wco_group
ON ( group_member.group_oid = wco_group.obj_oid )
WHERE ( ( group_member.person_oid = 1 )
AND ( ( group_authorization.expire >
'2014-03-03T18:11:20.553844Z' )
OR ( group_authorization.expire IS NULL ) )
AND ( ( group_member.expire > '2014-03-03T18:11:20.553844Z'
)
OR ( group_member.expire IS NULL ) )
AND ( ( wco_group.end_date > '2014-03-03T18:11:20.553844Z'
)
OR ( wco_group.end_date IS NULL ) ) )) ) ))

(explain (analyze, buffers) output is at http://explain.depesz.com/s/70dd)

This is using Postgres 9.3.3. The table wco_group has ~5000 rows,
group_member has ~15000 rows, and group_authorization is the big one with
~385000 rows.

I noticed that the nested join version was doing a lot of seq scans and not
using the indexes. I tried setting enable_seqscan to off to force index
use, and it was a bit slower that way, so the query optimizer is definitely
doing the right thing.

Any thoughts would be much appreciated.

Thank you,
-Eli

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Eli Naeher 2014-03-03 18:55:12 Subselect an order of magnitude faster than nested joins
Previous Message Damon Snyder 2014-03-03 17:55:41 Re: Help with optimizing a query over hierarchical data