Re: query speed question

From: Alban Hertroys <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl>
To: Christopher Condit <condit(at)sdsc(dot)edu>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: query speed question
Date: 2009-09-10 17:40:06
Message-ID: 474FD66B-DB38-4DC3-BC25-7BE6EFF2459C@solfertje.student.utwente.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 10 Sep 2009, at 19:09, Christopher Condit wrote:

> Here are both queries (with explain analyze):
>
> EXPLAIN ANALYZE SELECT B.* FROM B,
> (SELECT lat, lon FROM A WHERE value > 0 AND value < 2 AND depth = 0)
> AS foo WHERE getwoalatitude(B.latitude::numeric) = foo.lat AND
> getwoalongitude(B.longitude::numeric) = foo.lon
>
> after vacuum analyze:
> "Merge Join (cost=17873237.91..26594735.94 rows=1984509 width=420)
> (actual time=752983.201..941125.197 rows=226941 loops=1)"
> " Merge Cond: ((a.lat = (getwoalatitude((b.latitude)::numeric)))
> AND (a.lon = (getwoalongitude((b.longitude)::numeric))))"
> " -> Sort (cost=2019.51..2037.61 rows=7239 width=16) (actual
> time=30.704..32.171 rows=2111 loops=1)"
> " Sort Key: a.lat, a.lon"
> " Sort Method: quicksort Memory: 212kB"
> " -> Index Scan using depthidx on a (cost=0.00..1555.43
> rows=7239 width=16) (actual time=0.533..24.631 rows=2111 loops=1)"
> " Index Cond: (depth = 0)"
> " Filter: ((value > 0::numeric) AND (value <
> 2::numeric))"

Here you can see that the estimated row counts are way off compared to
the actual number of rows. That means the statistics for that table
aren't reliable; either you need to ANALYZE them or you need to
increase the statistics size of the columns involved.

This part of the query is the quicker part, but it may be possible
that the planner wouldn't have picked a merge-join with the remainder
if the estimates were closer to reality.

You can paste those queries in http://explain-analyze.info/, although
you'll need to strip the quotes from the plan (not sure where those
get added, psql doesn't add them).

> " -> Materialize (cost=17871190.21..18083338.96 rows=16971900
> width=420) (actual time=751324.751..919278.574 rows=16963350 loops=1)"
> " -> Sort (cost=17871190.21..17913619.96 rows=16971900
> width=420) (actual time=751324.744..820522.604 rows=16963350 loops=1)"
> " Sort Key: (getwoalatitude((b.latitude)::numeric)),
> (getwoalongitude((b.longitude)::numeric))"
> " Sort Method: external merge Disk: 4599344kB"
> " -> Seq Scan on b (cost=0.00..750696.00
> rows=16971900 width=420) (actual time=1.781..229158.949
> rows=16971901 loops=1)"
> "Total runtime: 942295.914 ms"

Here's where most time is spent. Firstly, an external merge to disk is
going to be slow. I think the setting controlling the amount of memory
available for sorting is work_mem, you can try increasing that in the
client (set work_mem TO <some value>) before executing your query and
see what value is sufficient.

The other issue here is the sequential scan on table b. I think that's
caused by the planner needing to merge to disk though.

> EXPLAIN ANALYZE SELECT b.* FROM b JOIN a ON (getwoalatitude
> (b.latitude::numeric) = a.lat AND getwoalongitude
> (b.longitude::numeric) = a.lon) WHERE a.value > 0 AND a.value < 2
> AND a.depth = 0
>
> "Merge Join (cost=17873237.91..26594735.94 rows=1984509 width=420)
> (actual time=755255.801..906407.961 rows=226941 loops=1)"
> " Merge Cond: ((a.lat = (getwoalatitude((b.latitude)::numeric)))
> AND (a.lon = (getwoalongitude((b.longitude)::numeric))))"
> " -> Sort (cost=2019.51..2037.61 rows=7239 width=16) (actual
> time=27.658..29.108 rows=2111 loops=1)"
> " Sort Key: a.lat, a.lon"
> " Sort Method: quicksort Memory: 212kB"
> " -> Index Scan using depthidx on a (cost=0.00..1555.43
> rows=7239 width=16) (actual time=0.467..21.646 rows=2111 loops=1)"
> " Index Cond: (depth = 0)"
> " Filter: ((value > 0::numeric) AND (value <
> 2::numeric))"
> " -> Materialize (cost=17871190.21..18083338.96 rows=16971900
> width=420) (actual time=753605.414..884549.890 rows=16963350 loops=1)"
> " -> Sort (cost=17871190.21..17913619.96 rows=16971900
> width=420) (actual time=753605.407..822844.299 rows=16963350 loops=1)"
> " Sort Key: (getwoalatitude((b.latitude)::numeric)),
> (getwoalongitude((b.longitude)::numeric))"
> " Sort Method: external merge Disk: 4599344kB"
> " -> Seq Scan on b (cost=0.00..750696.00
> rows=16971900 width=420) (actual time=0.095..229888.646
> rows=16971901 loops=1)"
> "Total runtime: 911284.022 ms"

This plan is actually very similar to the original query's plan, it
has the same problems.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.

!DSPAM:737,4aa939f812071577543529!

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Peter Headland 2009-09-10 17:52:18 Re: COPY command character set
Previous Message Tom Lane 2009-09-10 17:38:22 Re: COPY command character set