From: | David G Johnston <david(dot)g(dot)johnston(at)gmail(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Merge rows based on Levenshtein distance |
Date: | 2014-12-02 00:49:41 |
Message-ID: | 1417481381890-5828847.post@n5.nabble.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
mongoose wrote
> I am new to PostgreSQL and I have the following table:
>
> Name, City
> "Alex", "Washington"
> "Aleex1", "Washington"
> "Bob", "NYC"
> "Booob", "NYC"
>
> I want to "merge" similar rows based on levenshtein distance between names
> so that I have the following table:
>
> id, Name, City
> 1,"Alex", "Washington"
> 1,"Aleex1", "Washington"
> 2,"Bob", "NYC"
> 2,"Booob", "NYC"
>
> How could I do that on PostgreSQL? Is there an SQL command for this?
> Thnsls
So you have a table of N names and you want to evaluate (N-1)^2 pairs and
then use the output of the levenshtein calculation to group them together.
SELECT
l_names.name_value,
r_names.name_value, leven[...](l_names.name_value, r_names.name_value) AS
pair_group
FROM table_of_names AS l_names
CROSS JOIN table_of_names AS r_names
WHERE l_names.name_value <> r_names.name_value
;
Feel free to add "group by city" or "WHERE substring(l_names.name_value, 0,
1) = substring(r_names.name_value, 0, 1)" since it seems you need more than
just a name-distance to generate the desired groups. You'd likely want to
add the same "substring" call to the SELECT-list and "GROUP BY" clauses...
David J.
--
View this message in context: http://postgresql.nabble.com/Merge-rows-based-on-Levenshtein-distance-tp5828841p5828847.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
From | Date | Subject | |
---|---|---|---|
Next Message | Bryn Jeffries | 2014-12-02 00:50:58 | Re: Irreversible SET ROLE |
Previous Message | Tom Lane | 2014-12-02 00:39:33 | Re: Irreversible SET ROLE |