From: | Andy Colson <andy(at)squeakycode(dot)net> |
---|---|
To: | Rémi Cura <remi(dot)cura(at)gmail(dot)com> |
Cc: | PostgreSQL General <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: efficient way to do "fuzzy" join |
Date: | 2014-04-11 16:10:52 |
Message-ID: | 5348140C.4030409@squeakycode.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
> 2014-04-11 17:09 GMT+02:00 Andy Colson <andy(at)squeakycode(dot)net
> <mailto:andy(at)squeakycode(dot)net>>:
>
> On 4/11/2014 7:50 AM, Rémi Cura wrote:
>
> Hey dear List,
>
> I'm looking for some advice about the best way to perform a "fuzzy"
> join, that is joining two table based on approximate matching.
>
> It is about temporal matching
> given a table A with rows containing data and a control_time (for
> instance 1 ; 5; 6; .. sec, not necessarly rounded of evenly-spaced)
>
> given another table B with lines on no precise timing (eg
> control_time =
> 2.3 ; 5.8 ; 6.2 for example)
>
> How to join every row of B to A based on
> min(@(A.control_time-B.__control_time))
> (that is, for every row of B, get the row of A that is temporaly the
> closest),
> in an efficient way?
> (to be explicit, 2.3 would match to 1, 5.8 to 6, 6.2 to 6)
>
> Optionnaly, how to get interpolation efficiently (meaning one
> has to get
> the previous time and next time for 1 st order interpolation, 2
> before
> and 2 after for 2nd order interpolation, and so on)?
> (to be explicit 5.8 would match to 5 and 6, the weight being 0.2
> and 0.8
> respectively)
>
>
> Currently my data is spatial so I use Postgis function to
> interpolate a
> point on a line, but is is far from efficient or general, and I
> don't
> have control on interpolation (only the spatial values are
> interpolated).
>
>
> Cheers,
> Rémi-C
>
>
>
> Have you seen the range type?
>
> http://www.postgresql.org/__docs/9.3/static/rangetypes.__html
> <http://www.postgresql.org/docs/9.3/static/rangetypes.html>
>
> Not fuzzy, but is indexable.
>
> -Andy
>
>
On 4/11/2014 10:57 AM, Rémi Cura wrote:> Hey,
> thanks for your answer.
>
> I think you are right, range type with index could at least provide a
> fast matching,
> thus avoiding the numrow(A) * numrow(B) complexity .
>
> Though I don't see how to use it to interpolate for more than 1st order.
>
> Cheers,
> Rémi-C
>
>
Hum.. Would you like to set an upper bound on the number of seconds the
join would match? Maybe range types could give you an indexed upper
bound ("match within +/- 2 seconds only"), then use another match to
find the actual min. (I do something like this in PostGis, use bounding
box to do quick index lookup, then st_distance to find the nearest)
I can see two row's in A matching the same row in B. Would that be ok?
TableA
------
1
5
6
TableB
------
0.9
1.1
6.6
7.7
How should those two tables join?
-Andy
From | Date | Subject | |
---|---|---|---|
Next Message | Gaurav Jindal | 2014-04-11 16:18:37 | Need query |
Previous Message | Jack.O'Sullivan | 2014-04-11 15:59:35 | Re: CLOB & BLOB limitations in PostgreSQL |