Re: Postgresql GROUP BY "SIMILAR" but not equal values

From: Alban Hertroys <haramrae(at)gmail(dot)com>
To: alexandros_e <alexandros(dot)ef(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Postgresql GROUP BY "SIMILAR" but not equal values
Date: 2014-02-06 15:41:31
Message-ID: CAF-3MvN9WLFhZA-XDdZNxTKCuJREYLLe1uMQ+6keEdL9Z5gMjw@mail.gmail.com
Lists: pgsql-general

On 6 February 2014 16:18, alexandros_e <alexandros(dot)ef(at)gmail(dot)com> wrote:
> Let's say I have this table foo
>
> ID|G1|T1|
> 1|2|ABC|
> 1|2|ABCD|
> 1|2|DEF|
> 1|2|DEFG|
>
> SELECT * FROM foo
> GROUP BY ID,G1,T1

> Is there a way in SQL or PostgreSQL in general to group by values that are
> not exactly the same but are quite similar (like 'ABC' and 'ABCD') based on
> some distance function (levenshtein for example) if the distance is within
> some threshold (i.e., 1)?

Perhaps there is: You can calculate the levenshtein distance between
those values using a self-join and then GROUP BY the result of that
expression and limit the results with HAVING.

For example:
-- levenshtein() comes from the fuzzystrmatch extension
SELECT foo1.ID, foo1.G1, foo1.T1, levenshtein(foo1.T1, foo2.T1) AS distance
FROM foo foo1
INNER JOIN foo foo2 ON (foo2.ID = foo1.ID AND foo2.G1 = foo1.G1)
GROUP BY foo1.ID, foo1.G1, foo1.T1, levenshtein(foo1.T1, foo2.T1)
-- keep only pairs within the distance threshold of 1
HAVING levenshtein(foo1.T1, foo2.T1) <= 1
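
A self-join like the one above reports distances between pairs of rows. To collapse similar rows into groups, one possible approach is to derive a group key per row first and aggregate on that key. The following is only a sketch under some assumptions: the column types are guessed from the sample data, group_key and members are hypothetical names, the TEMP table shadows any existing foo for the session, and "smallest T1 within distance 1" is just one simple way to pick a representative.

```sql
-- levenshtein() is provided by the fuzzystrmatch contrib extension.
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;

-- Sample data from the original question (column types are assumed).
CREATE TEMP TABLE foo (ID integer, G1 integer, T1 text);
INSERT INTO foo VALUES
    (1, 2, 'ABC'), (1, 2, 'ABCD'), (1, 2, 'DEF'), (1, 2, 'DEFG');

-- Inner query: for each row, take the smallest T1 among rows with the same
-- ID and G1 that lie within distance 1 (the row always matches itself).
-- Outer query: group on that representative value.
SELECT ID, G1, group_key, array_agg(T1 ORDER BY T1) AS members
FROM (
    SELECT foo1.ID, foo1.G1, foo1.T1, min(foo2.T1) AS group_key
    FROM foo foo1
    INNER JOIN foo foo2
        ON foo2.ID = foo1.ID
       AND foo2.G1 = foo1.G1
       AND levenshtein(foo1.T1, foo2.T1) <= 1
    GROUP BY foo1.ID, foo1.G1, foo1.T1
) AS keyed
GROUP BY ID, G1, group_key
ORDER BY group_key;
```

On the four sample rows this should give two groups, keyed 'ABC' and 'DEF'. Note that "within distance 1" is not transitive, so on chains of values (A close to B, B close to C, but A not close to C) this key will not necessarily merge everything into one group; that case would need a real clustering step.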

Is that what you're looking for?

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
