From: | "Marc G(dot) Fournier" <scrappy(at)hub(dot)org> |
---|---|
To: | pgsql-sql(at)postgresql(dot)org |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | JOIN between three *simple* tables ... |
Date: | 2002-02-06 20:47:45 |
Message-ID: | 20020206162412.F57607-100000@earth.hub.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers pgsql-sql |
Morning all ...
First off, this is using v7.2 release ...
Okay, this is going to drive me up the proverbial wall ... very
simple query:
SELECT p.uid, p.handle
FROM orientation_c poc, profiles p, gender_f pgf
WHERE (p.uid = pgf.uid )
AND (pgf.uid = poc.uid ) ;
profiles contains:
iwantu=# select count(1) from profiles;
count
--------
485969
(1 row)
and is everyone in the system ... no problems there ...
gender_f contains:
iwantu=# select count(1) from gender_f;
count
-------
75664
(1 row)
And is *just* the uid's of those in profiles that are female ...
finally, orientation_c:
iwantu=# select count(1) from orientation_c;
count
--------
126477
(1 row)
Is again *just* the uid's of those in profiles that have a 'c'
orientiation ...
Now, the above wquery has an explain of:
Hash Join (cost=6363.90..47877.08 rows=19692 width=35)
-> Seq Scan on profiles p (cost=0.00..35707.69 rows=485969 width=19)
-> Hash (cost=6174.74..6174.74 rows=75664 width=16)
-> Hash Join (cost=2928.34..6174.74 rows=75664 width=16)
-> Seq Scan on gender_f pgf (cost=0.00..1165.64 rows=75664 width=8)
-> Hash (cost=1948.77..1948.77 rows=126477 width=8)
-> Seq Scan on orientation_c poc (cost=0.00..1948.77 rows=126477 width=8)
Now, a join between poc and pgf alone comes out to:
iwantu=# select count(1) from orientation_c poc, gender_f pgf where poc.uid = pgf.uid;
count
-------
12703
(1 row)
iwantu=# explain select count(1) from orientation_c poc, gender_f pgf where poc.uid = pgf.uid;
NOTICE: QUERY PLAN:
Aggregate (cost=6363.90..6363.90 rows=1 width=16)
-> Hash Join (cost=2928.34..6174.74 rows=75664 width=16)
-> Seq Scan on gender_f pgf (cost=0.00..1165.64 rows=75664 width=8)
-> Hash (cost=1948.77..1948.77 rows=126477 width=8)
-> Seq Scan on orientation_c poc (cost=0.00..1948.77 rows=126477 width=8)
EXPLAIN
Now, what I'd like to have happen is a SEQ SCAN through the smaller table
(gender_f), and grab everything in orientation_c that matches (both tables
have zero duplicates of uid, its purely a one of, so I would think that I
should be able to take 1 uid from pgf, and use the index on poc to
determine if it exists, and do that 75664 times ...
That would live me with 12703 UIDs to match up with apropriate records in
the almost 500+k records in profiles itself, instead of having to scan
through each of thoose 500+k records themselves ...
Then again, let's go one simpler:
iwantu=# \d orientation_c
Table "orientation_c"
Column | Type | Modifiers
--------+--------+-----------
uid | bigint |
Indexes: poc_uid
iwantu=# \d poc_uid
Index "poc_uid"
Column | Type
--------+--------
uid | bigint
btree
iwantu=# explain select count(1) from orientation_c poc where uid = 1;
NOTICE: QUERY PLAN:
Aggregate (cost=2264.97..2264.97 rows=1 width=0)
-> Seq Scan on orientation_c poc (cost=0.00..2264.96 rows=1 width=0)
EXPLAIN
if all varlues in orientation_c are unique, and there are 127k
records ... shouldn't it use the index instead of scanning through all 127k records ? Or am I missing something totally obvious here?
From | Date | Subject | |
---|---|---|---|
Next Message | Haroldo Stenger | 2002-02-06 20:59:32 | Re: Threaded PosgreSQL server |
Previous Message | Doug McNaught | 2002-02-06 20:24:04 | Re: Threaded PosgreSQL server |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2002-02-06 21:14:33 | Re: JOIN between three *simple* tables ... |
Previous Message | Jeff Eckermann | 2002-02-06 19:24:11 | Re: Query with Parameters |