Re: Query Join Performance

From: "Aaron Bono" <postgresql(at)aranya(dot)com>
To: "Richard Huxton" <dev(at)archonet(dot)com>
Cc: "PostgreSQL SQL List" <pgsql-sql(at)postgresql(dot)org>
Subject: Re: Query Join Performance
Date: 2007-04-25 20:39:55
Message-ID: bf05e51c0704251339u2883211eh4cdedcbec63c2be9@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On 4/25/07, Richard Huxton <dev(at)archonet(dot)com> wrote:
>
> Aaron Bono wrote:
> > Performance tanks with this query - it takes over 120 seconds (that is
> > where
> > I set the timeout).
>
> > BTW, on our Linux box the full query we run (which adds 3 more tables on
> > the
> > whole operation along with more filtering on the zip table) finishes in
> > under 10 seconds. Problem is our development is on Windows and this is
> a
> > real pain for developers to test.
>
> So what's different between the systems. Obvious things to look at:
> 1. EXPLAIN ANALYSE outputs to show the plans (these presumably are
> different, but in what details, and why?)
> 2. Configuration (particularly memory/cost settings).
> 3. Hardware.
> 4. Locale/encoding - these can affect index usage and sorting.
>
> --
> Richard Huxton
> Archonet Ltd
>

Now I am having the same problem on the Linux box so I doubt it is the
platform.

The biggest problem I notice is when I add a join from a child table
(zip_city) to a parent table (zip). I have filtered the child table down to
about 650 records but when I add the join to the parent which has over
800,000 records, performance tanks. I was able to benchmark two queries
last night on my Windows machine:

-- This runs in just over 2 seconds

select
nearby_zip_city.zip_id,
gps_distance(zip_city.longitude::numeric,
zip_city.latitude::numeric, nearby_zip_city.longitude::numeric,
nearby_zip_city.latitude::numeric) AS distance
from zip_city
inner join zip on (
zip.zip_id = zip_city.zip_id
)
inner join zip_city as nearby_zip_city on (
abs(zip_city.longitude - nearby_zip_city.longitude) <= (50.0 / 60.0)
AND abs(zip_city.latitude - nearby_zip_city.latitude) <= (50.0 /
60.0)
)
where zip.zip_cd = '66105'

-- This takes over 48 seconds and I just added a join from the zip_city
child to the zip parent table

select
nearby_zip.zip_cd,
gps_distance(zip_city.longitude::numeric,
zip_city.latitude::numeric, nearby_zip_city.longitude::numeric,
nearby_zip_city.latitude::numeric) AS distance
from zip_city
inner join zip on (
zip.zip_id = zip_city.zip_id
)
inner join zip_city as nearby_zip_city on (
abs(zip_city.longitude - nearby_zip_city.longitude) <= (50.0 / 60.0)
AND abs(zip_city.latitude - nearby_zip_city.latitude) <= (50.0 /
60.0)
)
-->>> The next 3 lines are the main difference <<<--
inner join zip as nearby_zip on (
nearby_zip_city.zip_id = nearby_zip.zip_id
)
-->>> End of difference <<<--
where zip.zip_cd = '66105'

-- Explain plan for faster/first query:
Nested Loop (cost=45779.82..147990502.45 rows=451678770 width=40) (actual
time=5404.943..20151.684 rows=653 loops=1)
Join Filter: ((abs(("inner".longitude - "outer".longitude)) <=
0.833333333333333::double precision) AND (abs(("inner".latitude -
"outer".latitude)) <= 0.833333333333333::double precision))
-> Seq Scan on zip_city nearby_zip_city (cost=0.00..25514.19 rows=901719
width=24) (actual time=0.053..2311.547 rows=901719 loops=1)
-> Materialize (cost=45779.82..45824.90 rows=4508 width=16) (actual
time=0.003..0.009 rows=1 loops=901719)
-> Hash Join (cost=8944.55..45775.31 rows=4508 width=16) (actual
time=444.657..4490.901 rows=1 loops=1)
Hash Cond: ("outer".zip_id = "inner".zip_id)
-> Seq Scan on zip_city (cost=0.00..25514.19 rows=901719
width=24) (actual time=0.004..2334.548 rows=901719 loops=1)
-> Hash (cost=8933.90..8933.90 rows=4261 width=8) (actual
time=0.296..0.296 rows=1 loops=1)
-> Bitmap Heap Scan on zip
(cost=32.91..8933.90rows=4261 width=8) (actual time=
0.272..0.275 rows=1 loops=1)
Recheck Cond: ((zip_cd)::text = '66105'::text)
-> Bitmap Index Scan on zip_zip_cd_key (cost=
0.00..32.91 rows=4261 width=0) (actual time=0.250..0.250 rows=1 loops=1)
Index Cond: ((zip_cd)::text = '66105'::text)

-- Explain plan for shower/second query:
Nested Loop (cost=75372.31..148056286.32 rows=451678770 width=43) (actual
time=62688.188..69916.943 rows=653 loops=1)
Join Filter: ((abs(("inner".longitude - "outer".longitude)) <=
0.833333333333333::double precision) AND (abs(("inner".latitude -
"outer".latitude)) <= 0.833333333333333::double precision))
-> Hash Join (cost=29592.49..91298.06 rows=901719 width=27) (actual
time=17905.224..52279.151 rows=901719 loops=1)
Hash Cond: ("outer".zip_id = "inner".zip_id)
-> Seq Scan on zip_city nearby_zip_city
(cost=0.00..25514.19rows=901719 width=24) (actual time=
0.044..2888.993 rows=901719 loops=1)
-> Hash (cost=21634.79..21634.79 rows=852279 width=19) (actual
time=13925.502..13925.502 rows=852279 loops=1)
-> Seq Scan on zip nearby_zip
(cost=0.00..21634.79rows=852279 width=19) (actual time=
0.042..2535.742 rows=852279 loops=1)
-> Materialize (cost=45779.82..45824.90 rows=4508 width=16) (actual
time=0.002..0.009 rows=1 loops=901719)
-> Hash Join (cost=8944.55..45775.31 rows=4508 width=16) (actual
time=421.374..4453.224 rows=1 loops=1)
Hash Cond: ("outer".zip_id = "inner".zip_id)
-> Seq Scan on zip_city (cost=0.00..25514.19 rows=901719
width=24) (actual time=0.028..2333.941 rows=901719 loops=1)
-> Hash (cost=8933.90..8933.90 rows=4261 width=8) (actual
time=0.604..0.604 rows=1 loops=1)
-> Bitmap Heap Scan on zip
(cost=32.91..8933.90rows=4261 width=8) (actual time=
0.588..0.591 rows=1 loops=1)
Recheck Cond: ((zip_cd)::text = '66105'::text)
-> Bitmap Index Scan on zip_zip_cd_key (cost=
0.00..32.91 rows=4261 width=0) (actual time=0.559..0.559 rows=1 loops=1)
Index Cond: ((zip_cd)::text = '66105'::text)

It is the join that is killing the query but I am at a loss of the best
approach to fix it. I have some work arounds in mind by flattening out the
tables but I would rather not have to do that.

Thanks!
Aaron

--
==================================================================
Aaron Bono
Aranya Software Technologies, Inc.
http://www.aranya.com
http://codeelixir.com
==================================================================

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Tom Lane 2007-04-25 22:23:22 Re: Query Join Performance
Previous Message Richard Albright 2007-04-25 19:36:06 Re: sql wrapped plpgsql set returning function