From: | Sergey Konoplev <gray(dot)ru(at)gmail(dot)com> |
---|---|
To: | Alban Hertroys <haramrae(at)gmail(dot)com> |
Cc: | alexandros_e <alexandros(dot)ef(at)gmail(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Postgresql GROUP BY "SIMILAR" but not equal values |
Date: | 2014-02-06 16:11:19 |
Message-ID: | CAL_0b1sWa7-kbDgg5-GB5PAZZjJEtEhjxMdd_7mmmAaEozsMyg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Thu, Feb 6, 2014 at 7:41 AM, Alban Hertroys <haramrae(at)gmail(dot)com> wrote:
> On 6 February 2014 16:18, alexandros_e <alexandros(dot)ef(at)gmail(dot)com> wrote:
>> Is there a way in SQL or PostgreSQL in general to group by values than are
>> not exactly the same but are quite similar (like 'ABC' and 'ABCD') based on
>> some distance function (levenshtein for example) if the distance is within
>> some threshold (i.e., 1)
>
> Perhaps there is: You can calculate the levenshtein distance between
> those values using a self-join and then GROUP BY the result of that
> expression and limit the results with HAVING.
>
> For example:
> SELECT foo1.ID, foo1.G1, foo1.T1, levenshtein(foo1.T1, foo2.T1)
> FROM foo foo1
> INNER JOIN foo foo2 ON (foo2.ID = foo1.ID AND foo2.G1 = foo1.G1)
> GROUP BY foo1.ID, foo1.G1, foo1.T1, levenshtein(foo1.T1, foo2.T1)
> HAVING levenshtein(foo1.T1, foo2.T1) > 1
From my understanding of the question, probably, adding another
levenshtein distance to some base value will make more sense:
GROUP BY levenshtein(foo1.T1, foo2.T1), levenshtein(foo1.T1, 'A')
Though, it looks like a clusterization task for me, and therefore I
would recommend OP to look at the PL/R
http://www.joeconway.com/plr/doc/index.html.
--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA
http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray(dot)ru(at)gmail(dot)com
From | Date | Subject | |
---|---|---|---|
Next Message | Gauthier, Dave | 2014-02-06 16:14:56 | Re: Postgresql GROUP BY "SIMILAR" but not equal values |
Previous Message | Alban Hertroys | 2014-02-06 15:41:31 | Re: Postgresql GROUP BY "SIMILAR" but not equal values |