Re: Joins on many-to-many relations.

From: Wiebe Cazemier <halfgaar(at)gmx(dot)net>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: Joins on many-to-many relations.
Date: 2007-03-14 21:59:05
Message-ID: et9r79$kj5$1@sea.gmane.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On Wednesday 14 March 2007 18:58, Frank Bax wrote:
> A performance question should always include the output of EXPLAIN ANALYZE.
>
> I think the problem is database design. If you added a boolean column into
> accounts table which would indicate owner/co-owner; then all data from
> account_co_owner could be merged into accounts and the query would be much
> simpler to code.
>
> I don't expect this code to be any quicker; but I think it more clearly
> identifies the problem with your design:
>
> SELECT accounts.* from accounts
> inner join
> ( SELECT account.* FROM
> ( select id,owner_id from accounts
> union
> select account_id,co_owner_id from account_co_owners
> ) as account
> INNER JOIN
> ( SELECT id FROM people WHERE name LIKE '%user%' ) AS owner
> on account.owner_id = owner.id
> ) as acct on acct.id=accounts.id;

I can't say I really understand that query, but a union is not going to
work, because account_co_owners is nothing more than a join-table,
whereas accounts contains all the information belonging to an account.
An account has one primary owner, indicated by the owner_id, and one or
more co-owners, described by the account_co_owners table. Owners and
co-owners are all of type people. I don't see anything wrong with this
design.

In the real word, an account is actually a transaction_account. This
is the real query ('%KOE%' is the user supplied search string):

SELECT DISTINCT ON (account.id) account.*
FROM trade.transaction_accounts AS account
INNER JOIN people.people AS owner
ON owner.id = account.owner_id
OR owner.id IN (SELECT co_owner_id
FROM trade.transaction_account_co_owners
WHERE account_id = account.id
AND co_owner_id = owner.id)
WHERE upper(account.description) LIKE '%KOE%'
OR upper(owner.name) LIKE '%KOE%'
OR upper(owner.familiar_name) LIKE '%KOE%'
OR upper(owner.full_name) LIKE '%KOE%'

I discovered that removing the subselect (the entire second condition of
the join actually) is not the only thing that speeds it up. If I remove
the LIKE check on account.description, it's also a lot faster (152 ms
as opposed to 2915 ms), although not as fast as without the subselect.
I don't understand why that makes such a big difference. There is an
index on upper() on the field.

This is the EXPLAIN ANALYZE output:

Unique (cost=0.00..1061826.94 rows=800 width=551) (actual time=430.172..6492.619 rows=4 loops=1)
-> Nested Loop (cost=0.00..1061644.80 rows=72856 width=551) (actual time=430.165..6492.585 rows=5 loops=1)
Join Filter: (((upper(("outer".description)::text) ~~ '%KOE%'::text) OR (upper(("inner".name)::text) ~~ '%KOE%'::text) OR (upper(("inner".familiar_name)::text) ~~ '%KOE%'::text) OR (upper(("inner".full_name)::text) ~~ '%KOE%'::text)) AND (("inner".id = "outer".owner_id) OR (subplan)))
-> Index Scan using transaction_accounts_pkey on transaction_accounts account (cost=0.00..36.80 rows=800 width=551) (actual time=0.014..3.717 rows=800 loops=1)
-> Seq Scan on people "owner" (cost=0.00..54.08 rows=1208 width=1552) (actual time=0.002..2.541 rows=1208 loops=800)
SubPlan
-> Seq Scan on transaction_account_co_owners (cost=0.00..2.04 rows=1 width=4) (actual time=0.029..0.029 rows=0 loops=4796)
Filter: ((account_id = $0) AND (co_owner_id = $1))
Total runtime: 6492.709 ms

But, I can't really be asking you to fully analyze my query, unless you see
something obvious that can be improved. My question was mainly general;
if there is a better way than using subselects to join two tables which
are only connected to eachother through a join-table (containing only
references to the two tables in question). Subselects are usually very
slow, aren't they?

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Stefan Ionita 2007-03-15 06:50:14 unsubscribe
Previous Message Nigel Henry 2007-03-14 21:46:54