| From: | Paul Ramsey <pramsey(at)refractions(dot)net> |
|---|---|
| To: | "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com> |
| Cc: | "Pg Performance list" <pgsql-performance(at)postgresql(dot)org> |
| Subject: | Re: Proximity query with GIST and row estimation |
| Date: | 2007-02-14 03:01:34 |
| Message-ID: | 0BD9E842-2DA7-4F90-A732-AFD60C1F50C7@refractions.net |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-performance |
You'll find that PostGIS does a pretty good job of selectivity
estimation.
P
On 13-Feb-07, at 9:09 AM, Guillaume Smet wrote:
> Hi all,
>
> Following the work on Mark Stosberg on this list (thanks Mark!), I
> optimized our slow proximity queries by using cube, earthdistance
> (shipped with contrib) and a gist index. The result is globally very
> interesting apart for a specific query and we'd like to be able to fix
> it too to be more consistent (it's currently faster with a basic
> distance calculation based on acos, cos and so on but it's slow
> anyway).
>
> The problem is that we have sometimes very few places near a given
> location (small city) and sometimes a lot of them (in Paris, Bruxelles
> and so on - it's the case we have here). The gist index I created
> doesn't estimate the number of rows in the area very well.
>
> Table: lieu (100k rows) with wgslat and wgslon as numeric
> Table: lieugelieu (200k rows, 1k with codegelieu = 'PKG')
> Index: "idx_lieu_earth" gist (ll_to_earth(wgslat::double precision,
> wgslon::double precision))
>
> The simplified query is:
> SELECT DISTINCT l.numlieu, l.nomlieu, ROUND
> (earth_distance(ll_to_earth(48.85957600, 2.34860800),
> ll_to_earth(l.wgslat, l.wgslon))) as dist
> FROM lieu l, lieugelieu lgl
> WHERE lgl.codegelieu = 'PKG' AND earth_box(ll_to_earth(48.85957600,
> 2.34860800), 1750) @ ll_to_earth(l.wgslat, l.wgslon) AND lgl.numlieu =
> l.numlieu ORDER BY dist ASC LIMIT 2;
> It's used to find the nearest car parks from a given location.
>
> The plan is attached plan_earthdistance_nestedloop.txt. It uses a
> nested loop because the row estimate is pretty bad: (cost=0.00..3.38
> rows=106 width=0) (actual time=30.229..30.229 rows=5864 loops=1).
>
> If I disable the nested loop, the plan is different and faster (see
> plan_earthdistance_hash.txt attached).
>
> Is there any way to improve this estimation? I tried to set the
> statistics of wgslat and wgslon higher but it doesn't change anything
> (I don't know if the operator is designed to use the statistics).
>
> Any other idea to optimize this query is very welcome too.
>
> --
> Guillaume
> <plan_earthdistance_nestedloop.txt>
> <plan_earthdistance_hash.txt>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 6: explain analyze is your friend
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Bruce Momjian | 2007-02-14 05:01:27 | Re: [PERFORM] Direct I/O issues |
| Previous Message | Heikki Linnakangas | 2007-02-13 22:52:54 | Re: quad or dual core Intel CPUs |