Re: Re: multi-table join, final table is outer join count ...

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: The Hermit Hacker <scrappy(at)hub(dot)org>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Re: multi-table join, final table is outer join count ...
Date: 2001-05-13 00:41:37
Message-ID: 21433.989714497@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

The Hermit Hacker <scrappy(at)hub(dot)org> writes:
> SELECT distinct s.gid, s.created, count(i.title) AS images
> FROM status s LEFT JOIN images i ON (s.gid = i.gid AND i.active),
> personal_data pd, relationship_wanted rw
> WHERE s.active AND s.status != 0
> AND (s.gid = pd.gid AND pd.gender = 0)
> AND (s.gid = rw.gid AND rw.gender = 0 )
> GROUP BY s.gid,s.created
> ORDER BY images desc;

> The part that had confused me was the whole 'ON' part ... once I clued in
> that that is essentially a WHERE, it actually made sense ...

Right, but there's some fine points here.

When you're dealing with INNER JOINs, ON (or its variant USING) is
exactly equivalent to WHERE. Write whichever you like.

When you're dealing with OUTER JOINs, ON is *not* quite the same as
WHERE, because it determines which rows are considered to "match"
and thus which rows will be extended with NULLs. Let's take a
simplified version of your above example. If you wrote

FROM status s LEFT JOIN images i ON (s.gid = i.gid)
WHERE i.active AND ...other conditions...

then this would produce the regular inner join of status and images
where gid matches, *plus* a row for each unmatched status row (extended
with NULLs for the images columns). This collection of rows would then
pass through your WHERE clauses, and whichever ones pass all the WHERE
conditions get into the result. But, when you write

FROM status s LEFT JOIN images i ON (s.gid = i.gid AND i.active)
WHERE ...other conditions...

then you get the inner join of status and images on gid, minus the rows
where i.active is false, plus a null-extended row for each status row
that does not have a matching *active* image row. So the set of rows
that comes out of the join is different: there could be more
null-extended rows in this case than in the other one. In particular,
you could see rows having i.active=NULL in the final result, which'd
never happen if you had put i.active into the WHERE clause instead of
the ON clause.

Bottom line: what you put in the ON part should just be the clauses that
determine whether you think there's a match between the two tables.
The WHERE part is additional restrictions that limit what you want to
see, but don't affect the semantics of whether there's a match.

In your above example, I'm not sure whether it's right to put i.active
in the ON part or in WHERE. It depends on what you want to happen for
status rows that match only inactive images, and whether you consider
them different from status rows that match no images at all.

regards, tom lane

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Tom Lane 2001-05-13 01:53:56 Re: can't get rid of unnesesary SORT step in explain plan for hash join
Previous Message Stephan Szabo 2001-05-13 00:38:03 Re: Constraints...