From: | "Michael T(dot) Halligan" <michael(at)echo(dot)com> |
---|---|
To: | Bojan Belovic <bbelovic(at)usa(dot)net>, pgsql-admin(at)postgresql(dot)org |
Subject: | Re: [Tuning questions..] |
Date: | 2001-12-19 23:52:47 |
Message-ID: | 3C21284F.5070407@echo.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-admin |
Thanks, that sped things up a bit, from 7.6 sec. to about 5.5 sec. However
the plan still includes a sequential scan on ssa_candidate:
Aggregate (cost=12161.11..12161.11 rows=1 width=35)
-> Merge Join (cost=11611.57..12111.12 rows=19996 width=35)
-> Sort (cost=11488.27..11488.27 rows=99805 width=24)
-> Seq Scan on ssa_candidate sc (cost=0.00..3201.05
rows=99805 width=24)
-> Sort (cost=123.30..123.30 rows=31 width=11)
-> Index Scan using station_subgenre_pk on station_subgenre
ss (cost=0.00..122.53 rows=31 width=11)
If we run the same query on Oracle (modified slightly for syntax):
SELECT count(DISTINCT song_id) AS X
FROM ssa_candidate SC,
station_subgenre SS
WHERE SC.style_id = SS.style_id
AND SS.station_id =
with the same data, it runs almost instaneously. Oracle's plan uses a hash
join:
SELECT STATEMENT Optimizer=CHOOSE (Cost=63 Card=1 Bytes=15)
SORT (GROUP BY)
HASH JOIN (Cost=63 Card=8943 Bytes=134145)
INDEX (RANGE SCAN) OF STATION_SUBGENRE_PK (UNIQUE) (Cost=3 Card=12
Bytes=84)
TABLE ACCESS (FULL) OF SSA_CANDIDATE (Cost=59 Card=60364
Bytes=482912)
Is there some way to convince PostgreSQL to use a hash join?
Bojan Belovic wrote:
>Not sure about tuning, but it seems to me that this query would be
much more
>effective if it's rewritten like this (especially if style_id columns
on both
>tables are indexed):
>
>SELECT count(DISTINCT song_id) AS X
>FROM ssa_candidate SC JOIN station_subgenre SS ON SC.style_id =
SS.style_id
>WHERE SS.station_id = 48
>
>Please correct me if I'm wrong. Also, Michael, if you give this one a try,
>could you send me the query plan, I'm just curious.
>
>SELECT count(DISTINCT song_id) FROM ssa_candidate, station_subgenre
>WHERE
>
>SELECT count(DISTINCT song_id) FROM ssa_candidate, station_subgenre
>WHERE "Michael T. Halligan" <michael(at)echo(dot)com> wrote:
>
>>Hi.. I seem to be running into a bottle neck on a query, and I'm not
>>sure what the bottleneck is .
>>The machine is a dual-processor p3 750 with 2 gigs of (pc100) memory,
>>and 3 72 gig disks setup
>>in raid 5. Right now i'm just testing our db for speed (we're porting
>>from oracle) .. later on
>>We're looking @ a quad xeon 700 with 16 gigs of ram & 10 drives in
>>hardware raid 5.
>>
>>We've tuned the queries a bit, added some indices, and we got this query
>>down from about 15 minutes
>>to 7.6 seconds.. but it just seems like we should be able to get this
>>query down to under a second on
>>this box.. It's running the latest suse, with 2.4.16 kernel, reiserfs,
>>postgres 7.2b3. I've tried many different combinations
>>of buffers, stat collection space, sort space, etc. none of them really
>>effect performance..
>>
>>When I run this particular query, the only resource that seems to change
>>is one of the processors gets up to
>>about 99% usage.. I've tried setting postgres to use up to 1.6 gigs of
>>memory, but the postmaster never seems
>>to get above about 700megs.. it's not swapping at all, though the
>>contact switching seems to get a bit high (peaking
>>at 150) ..
>>
>>The query sorts through about 80k rows.. here's the query
>>--------------------------------------------------
>>SELECT count(*) FROM (
>> SELECT DISTINCT song_id FROM ssa_candidate WHERE
>>style_id IN (
>> SELECT style_id FROM station_subgenre WHERE
>>station_id = 48
>> )
>> ) AS X;
>>--------------------------------------------------
>>and the query plan :
>>--------------------------------------------------
>>NOTICE: QUERY PLAN:
>>
>>Aggregate (cost=12236300.87..12236300.87 rows=1 width=13)
>> -> Subquery Scan x (cost=12236163.64..12236288.40 rows=4990 width=13)
>> -> Unique (cost=12236163.64..12236288.40 rows=4990
width=13)SELECT count(DISTINCT song_id) FROM ssa_candidate, station_subgenre
>>WHERE
>> -> Sort (cost=12236163.64..12236163.64 rows=49902
width=13)
>> -> Seq Scan on ssa_candidate
>>(cost=0.00..12232269.54 rows=49902 width=13)
>> SubPlan
>> -> Materialize (cost=122.53..122.53
>>rows=31 width=11)
>> -> Index Scan using
>>station_subgenre_pk on station_subgenre (cost=0.00..122.53 rows=31
>>width=11)
>>
>>EXPLAIN
>>--------------------------------------------------
>>
>>
>>If anybody has any ideas, I'd be really appreciative..
>>
>>
>>
>>
>>
>>
>>
>>---------------------------(end of broadcast)---------------------------
>>TIP 2: you can get off all lists at once with the unregister command
>> (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)
>>
>
>
>____________________________________________________________________
>Get free e-mail and a permanent address at http://www.amexmail.com/?A=1
>
>---------------------------(end of broadcast)---------------------------
>TIP 3: if posting/reading through Usenet, please send an appropriate
>subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
>message can get through to the mailing list cleanly
>
From | Date | Subject | |
---|---|---|---|
Next Message | Chris Ruprecht | 2001-12-20 00:55:06 | Re: poor performance of loading data |
Previous Message | Tom Lane | 2001-12-19 23:04:31 | Re: Tuning questions.. |