From: | Andy Colson <andy(at)squeakycode(dot)net> |
---|---|
To: | Rémi Cura <remi(dot)cura(at)gmail(dot)com>, PostgreSQL General <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: efficient way to do "fuzzy" join |
Date: | 2014-04-11 17:16:12 |
Message-ID: | 5348235C.2060306@squeakycode.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On 4/11/2014 7:50 AM, Rémi Cura wrote:
> Hey dear List,
>
> I'm looking for some advice about the best way to perform a "fuzzy"
> join, that is joining two table based on approximate matching.
>
> It is about temporal matching
> given a table A with rows containing data and a control_time (for
> instance 1 ; 5; 6; .. sec, not necessarly rounded of evenly-spaced)
>
> given another table B with lines on no precise timing (eg control_time =
> 2.3 ; 5.8 ; 6.2 for example)
>
> How to join every row of B to A based on
> min(@(A.control_time-B.control_time))
> (that is, for every row of B, get the row of A that is temporaly the
> closest),
> in an efficient way?
> (to be explicit, 2.3 would match to 1, 5.8 to 6, 6.2 to 6)
>
> Optionnaly, how to get interpolation efficiently (meaning one has to get
> the previous time and next time for 1 st order interpolation, 2 before
> and 2 after for 2nd order interpolation, and so on)?
> (to be explicit 5.8 would match to 5 and 6, the weight being 0.2 and 0.8
> respectively)
>
>
> Currently my data is spatial so I use Postgis function to interpolate a
> point on a line, but is is far from efficient or general, and I don't
> have control on interpolation (only the spatial values are interpolated).
>
>
> Cheers,
> Rémi-C
Ok, here is a just sql way. No ranges. No idea if its right. A first
pass, so to speak.
create table a(t float, data text);
create table b(t float, data text);
insert into a values (1), (5), (6);
insert into b values (2.3), (5.8), (6.2);
select a.t, b.t
from (
select t, least( least(t, mint), least(t, maxt)) as t2 from (
select t,
(select t from a where a.t >= b.t order by a.t limit 1) as mint,
(select t from a where a.t < b.t order by a.t desc limit 1) as maxt
from b
) as tmp
) as tmp2
inner join a on (tmp2.t2 = a.t)
inner join b on (tmp2.t = b.t)
The middle part is the magic:
select t,
(select t from a where a.t >= b.t order by a.t limit 1) as mint,
(select t from a where a.t < b.t order by a.t desc limit 1) as maxt
from b
The rest is just to make it usable. If t is indexed, it'll probably be
fast too.
-Andy
From | Date | Subject | |
---|---|---|---|
Next Message | David G Johnston | 2014-04-11 17:39:10 | Re: Need query |
Previous Message | sparikh | 2014-04-11 16:48:31 | Trouble installing Slony 2.0 |