Re: Postgresql GROUP BY "SIMILAR" but not equal values

From: "Gauthier, Dave" <dave(dot)gauthier(at)intel(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, alexandros_e <alexandros(dot)ef(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Postgresql GROUP BY "SIMILAR" but not equal values
Date: 2014-02-06 16:14:56
Message-ID: 0AD01C53605506449BA127FB8B99E5E179DB8055@FMSMSX114.amr.corp.intel.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

What about a regexp match ?

-----Original Message-----
From: pgsql-general-owner(at)postgresql(dot)org [mailto:pgsql-general-owner(at)postgresql(dot)org] On Behalf Of Tom Lane
Sent: Thursday, February 06, 2014 10:32 AM
To: alexandros_e
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: [GENERAL] Postgresql GROUP BY "SIMILAR" but not equal values

alexandros_e <alexandros(dot)ef(at)gmail(dot)com> writes:
> Is there a way in SQL or PostgreSQL in general to group by values than
> are not exactly the same but are quite similar (like 'ABC' and 'ABCD')
> based on some distance function (levenshtein for example) if the
> distance is within some threshold (i.e., 1)

Well, you can GROUP BY the result of a function.

You are going to have to think harder than the above in any case.
For example, it's not hard to imagine a "similarity" operator that says that A is similar to B, and B is similar to C, but if you ask it to compare A to C it says they're not similar (enough). Now what?
Are A,B,C all part of the same group? If you take the transitive closure of such an operator you probably end up with everything in one group; but if you don't, it's hard to see a principled result at all.

If you can cast your problem as transformation of the values into some canonical or representative form, then you can do that and then group on simple equality of the canonical values. For instance case-insensitive grouping is customarily done with

GROUP BY lower(x)

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org) To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Bret Stern 2014-02-06 16:25:04 Re: Help with connection issue - started today
Previous Message Sergey Konoplev 2014-02-06 16:11:19 Re: Postgresql GROUP BY "SIMILAR" but not equal values