From: | "David Johnston" <polobo(at)yahoo(dot)com> |
---|---|
To: | "'Jean-Christophe Boggio'" <postgresql(at)thefreecat(dot)org>, <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Difference between ON and WHERE in JOINs |
Date: | 2012-09-19 17:17:23 |
Message-ID: | 01ea01cd968a$a2a83bc0$e7f8b340$@yahoo.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
> I have this query working :
> select profil,count(og.name)
> from ldap l
> left join uidinoldgroups ug on l.uid=ug.uid left join oldgroups og on
> og.id=ug.idgroupe and og.rne='0410030k' and og.type='g'
> where l.profilgeneral='P'
> and l.rne='0410030k'
> group by l.profil
>
> But if I put :
> and og.rne='0410030k' and og.type='g'
> in the where part (what you suggested and what I did naturally), I get 0
> results which is quite confusing (because it's an outer join).
>
> Also, if I replace these with full outer joins, I still get 0 results.
>
> With my data, if I replace og.type='g' with og.type='m' I get MOST OF my
> resultats back (those where the count() returns more than zero). I know
this
> is specific to my data but I really don't get the behaviour.
>
> Thanks for your help,
>
> JC
>
The query you wrote is equivalent to this:
SELECT profil, count(og.name)
FROM ldap l
LEFT JOIN ... ug ON (l.uid = ug.uid)
LEFT JOIN (SELECT * FROM ... WHERE rne = '...' AND type = '...') og ON
(og.id = ug.idgroupe)
WHERE l.rne = '' AND l.type=''
As soon as you start putting the right-hand tables in the outer-most where
clause you have to deal with the fact that the outer join can cause those
columns to be null. If you compare those columns to a literal value then
you are saying that you don't want that column to be NULL and by extension
you really meant to use an inner join.
If you had put:
WHERE l.rne = '' AND l.type = '' AND (og.type = 'g' OR og.type IS NULL)
That would give you the behavior such that if og.type is a known value AND
it is not 'g' then the item should NOT be counted. If the value is unknown
or 'g' then count it.
As for a rule-of-thumb I would avoid non-relational conditions in the ON
clause all-together and move them into a sub-query as I showed above. The
WHERE clause is reserved for the left side of the join hierarchy by default.
The ON clause is reserved for inter-relation comparisons. When cobbling
together multiple JOINs I would also suggest making use of liberal newlines
as well as parentheses. Lastly knowing the correct answer is not that
common; just starting building up the query piece-by-piece and keep in mind
that you have to deal with the NULLs introduced by the OUTER JOIN. How you
deal with them is query specific and cannot be "rule-of-thumbed".
Yes, this is all more verbose but you've now separated the relational filter
and the joining into the designated areas making interpreting the query
easier. Throw in usage of CTE/WITH and the important portion of the query
can be made succinct by moving the sub-queries and filters to the top and
out of the way. You can refactor, if necessary, when you are done.
David J.
From | Date | Subject | |
---|---|---|---|
Next Message | Mike Roest | 2012-09-19 19:27:01 | Re: initial sync of multiple streaming slaves simultaneously |
Previous Message | Ivan Voras | 2012-09-19 17:01:16 | pg_dump, send/recv |