From: | Christopher Kings-Lynne <chris(dot)kings-lynne(at)calorieking(dot)com> |
---|---|
To: | Mark Woodward <pgsql(at)mohawksoft(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: String Similarity |
Date: | 2006-05-22 03:15:42 |
Message-ID: | 44712CDE.1090608@calorieking.com |
Lists: | pgsql-hackers |
Try contrib/pg_trgm...
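
For a rough feel of what trigram matching does with the strings quoted below, here is a minimal Python sketch. It is an approximation of the idea and not pg_trgm's actual code: the function names `trigrams` and `similarity` are only illustrative, and the padding (two spaces before each word, one after) and the shared-over-union score are assumptions about how the module behaves. Each word is split into 3-character pieces on its own, so word order does not affect the score.

```python
import re


def trigrams(text):
    """Return the set of 3-character substrings of every word in text.

    Assumption: words are runs of ASCII letters/digits, lowercased, and each
    word is padded with two leading spaces and one trailing space.
    """
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Score two strings from 0.0 to 1.0 as shared trigrams over all trigrams."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0  # nothing to compare
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    a = "pink floyd - dark side of the moon - money"
    b = "dark side of the moon - pink floyd - money"
    c = "dark floyd of money moon pink side the"
    print(similarity(a, b))
    print(similarity(a, c))
```

Because the trigram sets are built per word, the reordered titles and the sorted-word puzzle all produce the same set here. Multiply the result by 100 if a 0 to 100 score is wanted.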
Chris
Mark Woodward wrote:
> I have a side project that needs to "intelligently" know if two strings
> are contextually similar. Think about how CDDB information is collected
> and sorted. It isn't perfect, but there should be enough information to be
> usable.
>
> Think about this:
>
> "pink floyd - dark side of the moon - money"
> "dark side of the moon - pink floyd - money"
> "money - dark side of the moon - pink floyd"
> etc.
>
> To a human, these strings are almost identical. Similarly:
>
> "dark floyd of money moon pink side the"
>
> Is a puzzle to be solved by 13-year-old children before the movie starts.
>
> My post has three questions:
>
> (1) Does anyone know of an efficient and numerically quantified method of
> detecting these sorts of things? I currently have a fairly inefficient and
> numerically bogus solution that may be the only non-impossible solution
> for the problem.
>
> (2) Does anyone see a need for this feature in PostgreSQL? If so, what
> kind of interface would be best accepted as a patch? I am currently
> returning a match likelihood between 0 and 100.
>
> (3) Is there also a desire for a Levenshtein distance function for text
> and varchars? I experimented with it, and was forced to write the function
> in item #1.
--
Christopher Kings-Lynne
Technical Manager
CalorieKing
Tel: +618.9389.8777
Fax: +618.9389.8444
chris(dot)kings-lynne(at)calorieking(dot)com
www.calorieking.com