Re: JOINs ... how I hate them ...

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Marc G(dot) Fournier" <scrappy(at)hub(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: JOINs ... how I hate them ...
Date: 2002-02-07 23:32:50
Message-ID: 20120.1013124770@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"Marc G. Fournier" <scrappy(at)hub(dot)org> writes:
> explain SELECT p.uid, p.handle
> FROM gender_f pgf JOIN profiles p ON (pgf.uid = p.uid) ;

> Which explains out as:

> Hash Join (cost=1354.80..45297.83 rows=75664 width=27)
> -> Seq Scan on profiles p (cost=0.00..35707.69 rows=485969 width=19)
> -> Hash (cost=1165.64..1165.64 rows=75664 width=8)
> -> Seq Scan on gender_f pgf (cost=0.00..1165.64 rows=75664 width=8)

> Now, profiles has uid as its primary KEY, and there are no
> duplicates in gender_f ... so, as my HashJoin points out, I should have 75664
> results returned ... that is expected ... and the SeqScan on gender_f is
> expected ... but the SeqScan on profiles is what I would hope to get rid
> of ...

Um, why? Looks like a perfectly reasonable plan to me.

> get uid from gender_f, find corresponding entry in profiles ...

I'm not convinced that 75000 indexscan probes would be faster than a
sequential scan across that table. You could probably force the issue
with "set enable_hashjoin to off" (and maybe also "set enable_mergejoin
to off") and then see what the plan is and what the actual timing is.
(EXPLAIN ANALYZE should be real helpful here.)

> explain SELECT p.uid, p.handle
> FROM ( orientation_c poc JOIN gender_f pgf USING ( uid ) ) JOIN profiles p ON (pgf.uid = p.uid) ;

> Hash Join (cost=6023.92..47537.10 rows=75664 width=35)
> -> Seq Scan on profiles p (cost=0.00..35707.69 rows=485969 width=19)
> -> Hash (cost=5834.76..5834.76 rows=75664 width=16)
> -> Merge Join (cost=0.00..5834.76 rows=75664 width=16)
> -> Index Scan using poc_uid on orientation_c poc (cost=0.00..2807.82 rows=126477 width=8)
> -> Index Scan using pgf_uid on gender_f pgf (cost=0.00..1575.79 rows=75664 width=8)

> The MergeJoin between poc/pgf will only return 12000 records, and since it
> is a 1:1 relationship between each of those tables, there will *only* be
> 12000 records pulled from profiles ...

Hmm, it thinks that there will be 75664 not 12000 records out of that
join. Why the discrepancy? Could we see the pg_stats data for these
tables?

> ... but, again, for each of
> *those* 12000 records, its doing a SeqScan on last_login's 485k records,
> instead of using the index

No, certainly *not* "for each record". It's a hash join, so it only
reads each table once.

> Now, if I 'set enable_seqscan=false;' and do the exact same explain, it
> definitely comes more in line with what I'd like to see, as far as index
> usage is concerned:

And what's the actual runtime come out to be?

We definitely should standardize on asking for EXPLAIN ANALYZE results
in bad-plan discussions, now that we have that capability.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message D. Hageman 2002-02-07 23:36:52 Re: Threaded PosgreSQL server
Previous Message D. Hageman 2002-02-07 23:31:03 Re: Threaded PosgreSQL server