Quick Links

Re: JOINs ... how I hate them ...

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Marc G(dot) Fournier" <scrappy(at)hub(dot)org>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: JOINs ... how I hate them ...
Date:	2002-02-07 23:32:50
Message-ID:	20120.1013124770@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

"Marc G. Fournier" <scrappy(at)hub(dot)org> writes:
> explain SELECT p.uid, p.handle
> FROM gender_f pgf JOIN profiles p ON (pgf.uid = p.uid) ;

> Which explains out as:

> Hash Join (cost=1354.80..45297.83 rows=75664 width=27)
> -> Seq Scan on profiles p (cost=0.00..35707.69 rows=485969 width=19)
> -> Hash (cost=1165.64..1165.64 rows=75664 width=8)
> -> Seq Scan on gender_f pgf (cost=0.00..1165.64 rows=75664 width=8)

> Now, profiles has uid as its primary KEY, and there are no
> duplicates in gender_f ... so, as my HashJoin points out, I should have 75664
> results returned ... that is expected ... and the SeqScan on gender_f is
> expected ... but the SeqScan on profiles is what I would hope to get rid
> of ...

Um, why? Looks like a perfectly reasonable plan to me.

> get uid from gender_f, find corresponding entry in profiles ...

I'm not convinced that 75000 indexscan probes would be faster than a
sequential scan across that table. You could probably force the issue
with "set enable_hashjoin to off" (and maybe also "set enable_mergejoin
to off") and then see what the plan is and what the actual timing is.
(EXPLAIN ANALYZE should be real helpful here.)

> explain SELECT p.uid, p.handle
> FROM ( orientation_c poc JOIN gender_f pgf USING ( uid ) ) JOIN profiles p ON (pgf.uid = p.uid) ;

> Hash Join (cost=6023.92..47537.10 rows=75664 width=35)
> -> Seq Scan on profiles p (cost=0.00..35707.69 rows=485969 width=19)
> -> Hash (cost=5834.76..5834.76 rows=75664 width=16)
> -> Merge Join (cost=0.00..5834.76 rows=75664 width=16)
> -> Index Scan using poc_uid on orientation_c poc (cost=0.00..2807.82 rows=126477 width=8)
> -> Index Scan using pgf_uid on gender_f pgf (cost=0.00..1575.79 rows=75664 width=8)

> The MergeJoin between poc/pgf will only return 12000 records, and since it
> is a 1:1 relationship between each of those tables, there will *only* be
> 12000 records pulled from profiles ...

Hmm, it thinks that there will be 75664 not 12000 records out of that
join. Why the discrepancy? Could we see the pg_stats data for these
tables?

> ... but, again, for each of
> *those* 12000 records, its doing a SeqScan on last_login's 485k records,
> instead of using the index

No, certainly *not* "for each record". It's a hash join, so it only
reads each table once.

> Now, if I 'set enable_seqscan=false;' and do the exact same explain, it
> definitely comes more in line with what I'd like to see, as far as index
> usage is concerned:

And what's the actual runtime come out to be?

We definitely should standardize on asking for EXPLAIN ANALYZE results
in bad-plan discussions, now that we have that capability.

regards, tom lane

In response to

JOINs ... how I hate them ... at 2002-02-07 14:23:22 from Marc G. Fournier

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	D. Hageman	2002-02-07 23:36:52	Re: Threaded PosgreSQL server
Previous Message	D. Hageman	2002-02-07 23:31:03	Re: Threaded PosgreSQL server