From: | Paul Ramsey <pramsey(at)refractions(dot)net> |
---|---|
To: | "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com> |
Cc: | "Pg Performance list" <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: Proximity query with GIST and row estimation |
Date: | 2007-02-14 03:01:34 |
Message-ID: | 0BD9E842-2DA7-4F90-A732-AFD60C1F50C7@refractions.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
You'll find that PostGIS does a pretty good job of selectivity
estimation.
P
On 13-Feb-07, at 9:09 AM, Guillaume Smet wrote:
> Hi all,
>
> Following the work on Mark Stosberg on this list (thanks Mark!), I
> optimized our slow proximity queries by using cube, earthdistance
> (shipped with contrib) and a gist index. The result is globally very
> interesting apart for a specific query and we'd like to be able to fix
> it too to be more consistent (it's currently faster with a basic
> distance calculation based on acos, cos and so on but it's slow
> anyway).
>
> The problem is that we have sometimes very few places near a given
> location (small city) and sometimes a lot of them (in Paris, Bruxelles
> and so on - it's the case we have here). The gist index I created
> doesn't estimate the number of rows in the area very well.
>
> Table: lieu (100k rows) with wgslat and wgslon as numeric
> Table: lieugelieu (200k rows, 1k with codegelieu = 'PKG')
> Index: "idx_lieu_earth" gist (ll_to_earth(wgslat::double precision,
> wgslon::double precision))
>
> The simplified query is:
> SELECT DISTINCT l.numlieu, l.nomlieu, ROUND
> (earth_distance(ll_to_earth(48.85957600, 2.34860800),
> ll_to_earth(l.wgslat, l.wgslon))) as dist
> FROM lieu l, lieugelieu lgl
> WHERE lgl.codegelieu = 'PKG' AND earth_box(ll_to_earth(48.85957600,
> 2.34860800), 1750) @ ll_to_earth(l.wgslat, l.wgslon) AND lgl.numlieu =
> l.numlieu ORDER BY dist ASC LIMIT 2;
> It's used to find the nearest car parks from a given location.
>
> The plan is attached plan_earthdistance_nestedloop.txt. It uses a
> nested loop because the row estimate is pretty bad: (cost=0.00..3.38
> rows=106 width=0) (actual time=30.229..30.229 rows=5864 loops=1).
>
> If I disable the nested loop, the plan is different and faster (see
> plan_earthdistance_hash.txt attached).
>
> Is there any way to improve this estimation? I tried to set the
> statistics of wgslat and wgslon higher but it doesn't change anything
> (I don't know if the operator is designed to use the statistics).
>
> Any other idea to optimize this query is very welcome too.
>
> --
> Guillaume
> <plan_earthdistance_nestedloop.txt>
> <plan_earthdistance_hash.txt>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 6: explain analyze is your friend
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2007-02-14 05:01:27 | Re: [PERFORM] Direct I/O issues |
Previous Message | Heikki Linnakangas | 2007-02-13 22:52:54 | Re: quad or dual core Intel CPUs |