Re: JOINs ... how I hate them ...

From: Hannu Krosing <hannu(at)tm(dot)ee>
To: "Marc G(dot) Fournier" <scrappy(at)hub(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: JOINs ... how I hate them ...
Date: 2002-02-07 19:07:52
Message-ID: 1013108872.6992.29.camel@taru.tm.ee
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 2002-02-07 at 16:23, Marc G. Fournier wrote:
>
>
>
> The MergeJoin between poc/pgf will only return 12000 records, and since it
> is a 1:1 relationship between each of those tables, there will *only* be
> 12000 records pulled from profiles ... yet its doing a SeqScan through all
> 485k records for each of those UIDs?
>
> This is after I've performed a VACUUM ANALYZE ...
>
> The final query itself is:
>
> SELECT p.uid, p.profiles_handle
> FROM ( ( profiles_orientation_c poc JOIN profiles_gender_f pgf USING ( uid ) )> JOIN iwantu_profiles p USING (uid ) ) LEFT JOIN iwantu_last_login ll USING ( uid );
>
> Which explains as:
>
> Hash Join (cost=31636.40..78239.34 rows=75664 width=43)
> -> Hash Join (cost=6023.92..47537.10 rows=75664 width=35)
> -> Seq Scan on iwantu_profiles p (cost=0.00..35707.69 rows=485969 width=19)
> -> Hash (cost=5834.76..5834.76 rows=75664 width=16)
> -> Merge Join (cost=0.00..5834.76 rows=75664 width=16)
> -> Index Scan using poc_uid on profiles_orientation_c poc (cost=0.00..2807.82 rows=126477 width=8)
> -> Index Scan using pgf_uid on profiles_gender_f pgf (cost=0.00..1575.79 rows=75664 width=8)
> -> Hash (cost=7955.64..7955.64 rows=485964 width=8)
> -> Seq Scan on iwantu_last_login ll (cost=0.00..7955.64 rows=485964 width=8)
>
> EXPLAIN
>
> So, poc&pgf are MergeJoin's, leaving me with 12000 records again ... then
> there is the SeqScan/HashJoin wiht profiles, which will leave me with
> 12000 records, but with more information ... but, again, for each of
> *those* 12000 records, its doing a SeqScan on last_login's 485k records,
> instead of using the index ... again, like pgf and poc, there is only one
> record for every uid, so we aren't dealing with duplicates ...

I recently sped up a somewhat similar query from 15 sec to < 1 sec by
rewriting it to use a subselect:

SELECT p.uid, p.profiles_handle
FROM profiles_orientation_c poc,
profiles_gender_f pgf
(select uid, profiles_handle
from iwantu_profiles ip
where ip.uid = pgf.uid
) p
WHERE poc.uid = pgf.uid

If you need something from iwantu_last_login it should go into that
subselect as well

That tricked my case to do the small join first.

-----------------
Hannu

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Terrell 2002-02-07 19:15:26 Re: DRDA, network protocol, and documentation
Previous Message Tom Lane 2002-02-07 18:58:29 Re: Summary of new configuration file and data directory locations