From: | Tino Wildenhain <tino(at)wildenhain(dot)de> |
---|---|
To: | "Uwe C(dot) Schroeder" <uwe(at)oss4u(dot)com> |
Cc: | Michael Fuhr <mike(at)fuhr(dot)org>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: How many views... |
Date: | 2004-11-29 08:44:37 |
Message-ID: | 1101717877.8902.62.camel@sabrina.peacock.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Am Sonntag, den 28.11.2004, 23:06 -0800 schrieb Uwe C. Schroeder:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Sunday 28 November 2004 10:43 pm, Michael Fuhr wrote:
> > On Sun, Nov 28, 2004 at 06:41:50PM -0800, Uwe C. Schroeder wrote:
> > > I need to search a lot of locations based on distance (simple zipcode
> > > match based on longitude and latitude). However I need to calculate the
> > > distance between each of the nodes, so if you are in xxx I need to get
> > > the distance to all others in the database. I'm currently doing this with
> > > a stored procedure that gets the originating zipcode and a maximum
> > > distance in miles which then selects all other nodes within that search
> > > radius. This is pretty unhandy, but it works.
> >
> > What's unhandy about this approach? I've written stored procedures
> > that do exactly what you're talking about; they work fine and are
> > easy to use.
> >
> > Are you using a bounding box to limit the number of nodes that
> > you need to check? For example, if the originating zipcode is
> > at 40.0N 90.0W and you want to find all other zipcodes within
> > 50 miles, then you'd only need to check the distance to those
> > zipcodes with a latitude between about 39.27N - 40.73N and a
> > longitude between about 89.05W and 90.95W. No zipcode outside
> > that box could possibly be within 50 miles of the origin, so
> > there's no need to calculate and check the distances to them.
> > If you have indexes on latitude and longitude then the search
> > should be fast.
>
> The "unhandy" part is maybe a personal perception. I like stored procs, but in
> this case the calculation is repeated over and over again (ok, it would be
> the same with views). Basically the part I don't like is that the proc
> currently calculates way more values than needed. Because something like
> .... where sqrt(pow((lat1 - lat2),2) + pow((long1 - long2),2)) >= 50
Did you actually try to use PGs geometric datatypes here? My guess would
be you can optimize indices that way - or at least use maybe overlapping
grid where gridsize is 2*max(distance you ask for):
+-----Grid Cell ----------+
| |
| |
|+--max bb---+ |
|| | |
|| | |
|| *<max>| |
|| | |
|+-----------+ |
| |
+-------------------------+
| 2*max |
where max is max distance. You could build
a functional index based on the grid cells
(maybe give ID to any of them) and therefore
quickly filter out a whole lot of points which
will never be reached by the distance.
Btw. If you dont use the geometric types,
you can speed up if you skip the sqrt.
if sqrt(x) >= y then x >= y²
where 50²=2500 can be calculated
beforehand and saves a lot of calculating
later. (sqrt is expensive!)
> certainly calculates the distance of all the records and then compares the
> result to the 50 mile radius. I'd rather have something that excludes most of
> the records that aren't in question anyways. How do you come to the lat/long
> values for the max difference? Is there a general formula for that? This
> looks like I could omit records too far away from the calculation in the
> first place. I know - maybe I should dig for those old geometry books that
> are somewhere in a box, but if you have the base for that handy I'd
> appreciate if you tell me (I hated math all my life ;-) )
^ ^ ^ ^ ^ ^
why so? Math is just cool :-)
Regards
Tino
From | Date | Subject | |
---|---|---|---|
Next Message | Chris Green | 2004-11-29 08:50:51 | Re: Adding Reply-To: <listname> to Lists configuration ... |
Previous Message | Holger Klawitter | 2004-11-29 08:15:24 | Re: How many views... |