Re: Postgresql GROUP BY "SIMILAR" but not equal values

From: alexandros_e <alexandros(dot)ef(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql GROUP BY "SIMILAR" but not equal values
Date: 2014-02-06 16:27:04
Message-ID: 1391704024728-5790876.post@n5.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Conceptually, Tom (as always) is right. But Alban's hack help.

DROP TABLE foo;
CREATE TABLE IF NOT EXISTS foo(ID INTEGER,G1 INTEGER, T1 TEXT, ID2 SERIAL
PRIMARY KEY);

INSERT INTO foo(ID,G1,T1) VALUES(1,2,'ABC');
INSERT INTO foo(ID,G1,T1) VALUES(1,2,'ABCD');
INSERT INTO foo(ID,G1,T1) VALUES(1,2,'ABDC');
INSERT INTO foo(ID,G1,T1) VALUES(1,2,'DEF');
INSERT INTO foo(ID,G1,T1) VALUES(1,2,'DEFH');

/* A little editing to remove duplicates a to b and b to a */
SELECT foo1.ID, foo1.G1, foo1.T1, levenshtein(foo1.T1, foo2.T1),foo2.T1
FROM foo foo1
INNER JOIN foo foo2 ON (foo2.ID = foo1.ID AND foo2.G1 = foo1.G1)
WHERE foo1.ID2<foo2.ID2
GROUP BY foo1.ID, foo1.G1, foo1.T1, levenshtein(foo1.T1, foo2.T1),foo2.T1
HAVING levenshtein(foo1.T1, foo2.T1) <2;

RETURNS

ID|G1|foo1.T1|foo2.T1
1;2;"ABC";1;"ABCD"
1;2;"ABC";1;"ABDC"
1;2;"ABCD";2;"ABDC"
1;2;"DEF";1;"DEFH"

Then it requires a second grouping but as Tom suggested it would be hard to
somehow group all similar cases together because then it becomes a
clustering problem. With a second grouping we will have 3 records instead of
4, so it is better than the initial case by 25%.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Postgresql-GROUP-BY-SIMILAR-but-not-equal-values-tp5790860p5790876.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Sergey Konoplev 2014-02-06 16:39:16 Re: Ordering Results by a Supplied Order
Previous Message Bret Stern 2014-02-06 16:25:04 Re: Help with connection issue - started today