Re: [Tuning questions..]

From: "Michael T(dot) Halligan" <michael(at)echo(dot)com>
To: Bojan Belovic <bbelovic(at)usa(dot)net>, pgsql-admin(at)postgresql(dot)org
Subject: Re: [Tuning questions..]
Date: 2001-12-19 23:52:47
Message-ID: 3C21284F.5070407@echo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Thanks, that sped things up a bit, from 7.6 sec. to about 5.5 sec. However
the plan still includes a sequential scan on ssa_candidate:

Aggregate (cost=12161.11..12161.11 rows=1 width=35)
-> Merge Join (cost=11611.57..12111.12 rows=19996 width=35)
-> Sort (cost=11488.27..11488.27 rows=99805 width=24)
-> Seq Scan on ssa_candidate sc (cost=0.00..3201.05
rows=99805 width=24)
-> Sort (cost=123.30..123.30 rows=31 width=11)
-> Index Scan using station_subgenre_pk on station_subgenre
ss (cost=0.00..122.53 rows=31 width=11)

If we run the same query on Oracle (modified slightly for syntax):

SELECT count(DISTINCT song_id) AS X
FROM ssa_candidate SC,
station_subgenre SS
WHERE SC.style_id = SS.style_id
AND SS.station_id =

with the same data, it runs almost instaneously. Oracle's plan uses a hash
join:

SELECT STATEMENT Optimizer=CHOOSE (Cost=63 Card=1 Bytes=15)
SORT (GROUP BY)
HASH JOIN (Cost=63 Card=8943 Bytes=134145)
INDEX (RANGE SCAN) OF STATION_SUBGENRE_PK (UNIQUE) (Cost=3 Card=12
Bytes=84)
TABLE ACCESS (FULL) OF SSA_CANDIDATE (Cost=59 Card=60364
Bytes=482912)

Is there some way to convince PostgreSQL to use a hash join?

Bojan Belovic wrote:

>Not sure about tuning, but it seems to me that this query would be
much more
>effective if it's rewritten like this (especially if style_id columns
on both
>tables are indexed):
>
>SELECT count(DISTINCT song_id) AS X
>FROM ssa_candidate SC JOIN station_subgenre SS ON SC.style_id =
SS.style_id
>WHERE SS.station_id = 48
>
>Please correct me if I'm wrong. Also, Michael, if you give this one a try,
>could you send me the query plan, I'm just curious.
>
>SELECT count(DISTINCT song_id) FROM ssa_candidate, station_subgenre
>WHERE
>
>SELECT count(DISTINCT song_id) FROM ssa_candidate, station_subgenre
>WHERE "Michael T. Halligan" <michael(at)echo(dot)com> wrote:
>
>>Hi.. I seem to be running into a bottle neck on a query, and I'm not
>>sure what the bottleneck is .
>>The machine is a dual-processor p3 750 with 2 gigs of (pc100) memory,
>>and 3 72 gig disks setup
>>in raid 5. Right now i'm just testing our db for speed (we're porting
>>from oracle) .. later on
>>We're looking @ a quad xeon 700 with 16 gigs of ram & 10 drives in
>>hardware raid 5.
>>
>>We've tuned the queries a bit, added some indices, and we got this query
>>down from about 15 minutes
>>to 7.6 seconds.. but it just seems like we should be able to get this
>>query down to under a second on
>>this box.. It's running the latest suse, with 2.4.16 kernel, reiserfs,
>>postgres 7.2b3. I've tried many different combinations
>>of buffers, stat collection space, sort space, etc. none of them really
>>effect performance..
>>
>>When I run this particular query, the only resource that seems to change
>>is one of the processors gets up to
>>about 99% usage.. I've tried setting postgres to use up to 1.6 gigs of
>>memory, but the postmaster never seems
>>to get above about 700megs.. it's not swapping at all, though the
>>contact switching seems to get a bit high (peaking
>>at 150) ..
>>
>>The query sorts through about 80k rows.. here's the query
>>--------------------------------------------------
>>SELECT count(*) FROM (
>> SELECT DISTINCT song_id FROM ssa_candidate WHERE
>>style_id IN (
>> SELECT style_id FROM station_subgenre WHERE
>>station_id = 48
>> )
>> ) AS X;
>>--------------------------------------------------
>>and the query plan :
>>--------------------------------------------------
>>NOTICE: QUERY PLAN:
>>
>>Aggregate (cost=12236300.87..12236300.87 rows=1 width=13)
>> -> Subquery Scan x (cost=12236163.64..12236288.40 rows=4990 width=13)
>> -> Unique (cost=12236163.64..12236288.40 rows=4990
width=13)SELECT count(DISTINCT song_id) FROM ssa_candidate, station_subgenre
>>WHERE
>> -> Sort (cost=12236163.64..12236163.64 rows=49902
width=13)
>> -> Seq Scan on ssa_candidate
>>(cost=0.00..12232269.54 rows=49902 width=13)
>> SubPlan
>> -> Materialize (cost=122.53..122.53
>>rows=31 width=11)
>> -> Index Scan using
>>station_subgenre_pk on station_subgenre (cost=0.00..122.53 rows=31
>>width=11)
>>
>>EXPLAIN
>>--------------------------------------------------
>>
>>
>>If anybody has any ideas, I'd be really appreciative..
>>
>>
>>
>>
>>
>>
>>
>>---------------------------(end of broadcast)---------------------------
>>TIP 2: you can get off all lists at once with the unregister command
>> (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)
>>
>
>
>____________________________________________________________________
>Get free e-mail and a permanent address at http://www.amexmail.com/?A=1
>
>---------------------------(end of broadcast)---------------------------
>TIP 3: if posting/reading through Usenet, please send an appropriate
>subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
>message can get through to the mailing list cleanly
>

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Chris Ruprecht 2001-12-20 00:55:06 Re: poor performance of loading data
Previous Message Tom Lane 2001-12-19 23:04:31 Re: Tuning questions..