Re: query speed question

From: Christopher Condit <condit(at)sdsc(dot)edu>
To: Alban Hertroys <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: query speed question
Date: 2009-09-10 17:09:47
Message-ID: BAFDED91EF7D5A43A30F897D6C94B1284D283B49D0@MBX.ad.sdsc.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi Alban-

> > I have two tables that are georeferenced (although in this case I'm
> > not using PostGIS) that I need to join.
> > A ( lat | lon | depth | value)
> > |A| = 1,100,000
> >
> > B ( lat | lon | attributes)
> > |B| = 14,000,000
> >
> > A is a special case because the lat / lon values are all at half
> > degree intervals (ie 33.5, -12.5). The lat/lons in B are arbitrary.
> > I've written a function in B called getSpecialLat(latitude) and
> > getSpecialLon(longitude) to calculate the correct A latitude and
> > built an index on both functions.
>
> My guess is that those functions round lat and lon values to their
> nearest half-degree interval counterpart as in table A?
> I assume you marked that function immutable?
> Is the return type indeed a numeric, as there are some explicit casts
> in the query plan?

Yes - the function is immutable. The return type is numeric, but I fixed the casting on input problem...

> > Here's the query that I'm trying, but it's rather slow:
> > SELECT B.* FROM B,
> > (SELECT lat, lon FROM A WHERE value > 0 AND value < 2 AND depth = 0)
> > AS foo
> > WHERE getSpecialLat(B.lat) = foo.lat AND getSpecialLon(B.lon) =
> > foo.lon
> >
> > "Nested Loop (cost=3569.88..32055.02 rows=1414 width=422)"
> > " -> Index Scan using A_valueidx on A (cost=0.00..555.26 rows=6
> > width=16)"
> > " Index Cond: ((value > 0) AND (value < 2))"
> > " Filter: (depth = 0)"
> > " -> Bitmap Heap Scan on B (cost=3569.88..5029.48 rows=424
> > width=422)"
> > " Recheck Cond: ((getSpecialLon((B.lon)::numeric) = A.lon)
> > AND (getSpecialLat((B.lat)::numeric) = A.lat))"
> > " -> BitmapAnd (cost=3569.88..3569.88 rows=424 width=0)"
> > " -> Bitmap Index Scan on Blonidx (cost=0.00..1760.38
> > rows=84859 width=0)"
> > " Index Cond: (getSpecialLon((B.lon)::numeric) =
> > A.lon)"
> > " -> Bitmap Index Scan on Blatidx (cost=0.00..1766.81
> > rows=84859 width=0)"
> > " Index Cond: (getSpeicalLat
> > ((B.latitude)::numeric) = A.lat)"
> >
> > Am I missing something in terms of speeding up this query?
>
>
> Hard to tell without knowing where most time gets spent. An EXPLAIN
> ANALYSE would tell.

Here are both queries (with explain analyze):

EXPLAIN ANALYZE SELECT B.* FROM B,
(SELECT lat, lon FROM A WHERE value > 0 AND value < 2 AND depth = 0) AS foo WHERE getwoalatitude(B.latitude::numeric) = foo.lat AND getwoalongitude(B.longitude::numeric) = foo.lon

after vacuum analyze:
"Merge Join (cost=17873237.91..26594735.94 rows=1984509 width=420) (actual time=752983.201..941125.197 rows=226941 loops=1)"
" Merge Cond: ((a.lat = (getwoalatitude((b.latitude)::numeric))) AND (a.lon = (getwoalongitude((b.longitude)::numeric))))"
" -> Sort (cost=2019.51..2037.61 rows=7239 width=16) (actual time=30.704..32.171 rows=2111 loops=1)"
" Sort Key: a.lat, a.lon"
" Sort Method: quicksort Memory: 212kB"
" -> Index Scan using depthidx on a (cost=0.00..1555.43 rows=7239 width=16) (actual time=0.533..24.631 rows=2111 loops=1)"
" Index Cond: (depth = 0)"
" Filter: ((value > 0::numeric) AND (value < 2::numeric))"
" -> Materialize (cost=17871190.21..18083338.96 rows=16971900 width=420) (actual time=751324.751..919278.574 rows=16963350 loops=1)"
" -> Sort (cost=17871190.21..17913619.96 rows=16971900 width=420) (actual time=751324.744..820522.604 rows=16963350 loops=1)"
" Sort Key: (getwoalatitude((b.latitude)::numeric)), (getwoalongitude((b.longitude)::numeric))"
" Sort Method: external merge Disk: 4599344kB"
" -> Seq Scan on b (cost=0.00..750696.00 rows=16971900 width=420) (actual time=1.781..229158.949 rows=16971901 loops=1)"
"Total runtime: 942295.914 ms"

EXPLAIN ANALYZE SELECT b.* FROM b JOIN a ON (getwoalatitude(b.latitude::numeric) = a.lat AND getwoalongitude(b.longitude::numeric) = a.lon) WHERE a.value > 0 AND a.value < 2 AND a.depth = 0

"Merge Join (cost=17873237.91..26594735.94 rows=1984509 width=420) (actual time=755255.801..906407.961 rows=226941 loops=1)"
" Merge Cond: ((a.lat = (getwoalatitude((b.latitude)::numeric))) AND (a.lon = (getwoalongitude((b.longitude)::numeric))))"
" -> Sort (cost=2019.51..2037.61 rows=7239 width=16) (actual time=27.658..29.108 rows=2111 loops=1)"
" Sort Key: a.lat, a.lon"
" Sort Method: quicksort Memory: 212kB"
" -> Index Scan using depthidx on a (cost=0.00..1555.43 rows=7239 width=16) (actual time=0.467..21.646 rows=2111 loops=1)"
" Index Cond: (depth = 0)"
" Filter: ((value > 0::numeric) AND (value < 2::numeric))"
" -> Materialize (cost=17871190.21..18083338.96 rows=16971900 width=420) (actual time=753605.414..884549.890 rows=16963350 loops=1)"
" -> Sort (cost=17871190.21..17913619.96 rows=16971900 width=420) (actual time=753605.407..822844.299 rows=16963350 loops=1)"
" Sort Key: (getwoalatitude((b.latitude)::numeric)), (getwoalongitude((b.longitude)::numeric))"
" Sort Method: external merge Disk: 4599344kB"
" -> Seq Scan on b (cost=0.00..750696.00 rows=16971900 width=420) (actual time=0.095..229888.646 rows=16971901 loops=1)"
"Total runtime: 911284.022 ms"

Any thoughts?

Thanks,
-Chris

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Michael Andreen 2009-09-10 17:14:24 Re: array datatype supported by Perl DBI with Postgres DBD ?
Previous Message Tom Lane 2009-09-10 16:54:56 Re: WAL archiving file name collision