From: | Alban Hertroys <haramrae(at)gmail(dot)com> |
---|---|
To: | alexandros_e <alexandros(dot)ef(at)gmail(dot)com> |
Cc: | "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Postgresql GROUP BY "SIMILAR" but not equal values |
Date: | 2014-02-06 15:41:31 |
Message-ID: | CAF-3MvN9WLFhZA-XDdZNxTKCuJREYLLe1uMQ+6keEdL9Z5gMjw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On 6 February 2014 16:18, alexandros_e <alexandros(dot)ef(at)gmail(dot)com> wrote:
> Let's say I have this table foo
>
> ID|G1|T1|
> 1|2|ABC|
> 1|2|ABCD|
> 1|2|DEF|
> 1|2|DEFG|
>
> SELECT * FROM foo
> GROUP BY ID,G1,T1
> Is there a way in SQL or PostgreSQL in general to group by values than are
> not exactly the same but are quite similar (like 'ABC' and 'ABCD') based on
> some distance function (levenshtein for example) if the distance is within
> some threshold (i.e., 1)
Perhaps there is: You can calculate the levenshtein distance between
those values using a self-join and then GROUP BY the result of that
expression and limit the results with HAVING.
For example:
SELECT foo1.ID, foo1.G1, foo1.T1, levenshtein(foo1.T1, foo2.T1)
FROM foo foo1
INNER JOIN foo foo2 ON (foo2.ID = foo1.ID AND foo2.G1 = foo1.G1)
GROUP BY foo1.ID, foo1.G1, foo1.T1, levenshtein(foo1.T1, foo2.T1)
HAVING levenshtein(foo1.T1, foo2.T1) > 1
Is that what you're looking for?
--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
From | Date | Subject | |
---|---|---|---|
Next Message | Sergey Konoplev | 2014-02-06 16:11:19 | Re: Postgresql GROUP BY "SIMILAR" but not equal values |
Previous Message | Adrian Klaver | 2014-02-06 15:36:46 | Re: Help with connection issue - started today |