Quick Links

Re: Merge rows based on Levenshtein distance

From:	David G Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Merge rows based on Levenshtein distance
Date:	2014-12-03 18:44:43
Message-ID:	CAKFQuwYR0bAcgaMc-UthUSQ9UTX_TH43Dsq-xz1ZGHhhruEPTA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On Wed, Dec 3, 2014 at 9:14 AM, pinker [via PostgreSQL] <
ml-node+s1045698n5829111h65(at)n5(dot)nabble(dot)com> wrote:

> There is nice extension in postgres: fuzzystrmatch
> <http://www.postgresql.org/docs/9.3/static/fuzzystrmatch.html> I have
> used to calculate the distance. From documetation:
>
> SELECT levenshtein_less_equal('extensive', 'exhaustive',2);
>
> You can use it then with your group by query.
>
>
Something like this - replace the substring(...) comparison with
legenshtein_less_equal(...) or whatever comparison you find applicable.

In the case below new groups are started whenever the first letter of the
value changes.

The first group would be NULL so I add a COALESCE() call to make it 0 -
subsequent groups start with 1 and increment properly.

WITH src (val) AS (
VALUES ('A1'::varchar),('A2'),('B1'),('B2'),('B3'),('C1'),('D1')
)
, grp AS (
SELECT val
, CASE WHEN
substring(val,1,1) <> substring(lag(val) OVER (ORDER BY
val),1,1)
THEN 1
ELSE NULL
END AS changed
, ROW_NUMBER() OVER (ORDER BY val) AS val_idx
FROM src
)
SELECT val, COALESCE(sum(changed) OVER (ORDER BY val_idx), 0) AS group_id
FROM grp
;

David J.

--
View this message in context: http://postgresql.nabble.com/Merge-rows-based-on-Levenshtein-distance-tp5828841p5829143.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

In response to

Re: Merge rows based on Levenshtein distance at 2014-12-03 16:14:02 from pinker

Responses

Re: Merge rows based on Levenshtein distance at 2014-12-03 18:55:49 from mongoose

Browse pgsql-general by date

	From	Date	Subject
Next Message	Michael Nolan	2014-12-03 18:51:46	Re: Merge rows based on Levenshtein distance
Previous Message	David G Johnston	2014-12-03 18:12:58	Re: Merge rows based on Levenshtein distance