Re: Fuzzy string matching of product names

From: George Silva <georger(dot)silva(at)gmail(dot)com>
To: Brian Modra <brian(at)zwartberg(dot)com>
Cc: Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com>, PGSQL Mailing List <pgsql-general(at)postgresql(dot)org>
Subject: Re: Fuzzy string matching of product names
Date: 2010-04-05 20:00:15
Message-ID: n2i9aa147371004051300rda5f89aax9d8e8aa28fd44c0c@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

The above is true. For geocoding the same idea is used: the metaphone
function is used against street names, and searched to a simples column,
filled with the results of the metaphone function. It works quite well.

George

On Mon, Apr 5, 2010 at 4:23 PM, Brian Modra <brian(at)zwartberg(dot)com> wrote:

> On 05/04/2010, Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com> wrote:
> > Hello,
> >
> > At the moment, users of my application, which runs on 8.4.3, may
> > search for products in a way that is implemented roughly like this:
> >
> > SELECT * FROM products WHERE description ILIKE '%%usr_string%%';
> >
> > This works reasonably well. However, I thought it would be a nice
> > touch to give my users leeway to spell product names incorrectly when
> > searching, or to not have to remember if a product is entered as "coca
> > cola", "CocaCola" or "Coca-cola". At the moment, they don't have to
> > worry about case sensitivity because I use ILIKE - I'd like to
> > preserve that. I'd also like to not have it weigh against them heavily
> > when they don't search for a specific product, but just a common
> > substring. For example, if they search for "coca-cola", there may be a
> > number of different coca-cola products: "CocaCola 330ml can",
> > "Coca-Cola 2 litre bottle", but no actual plain "cocacola". That ought
> > to not matter too much - all cocacola products should be returned.
> >
> > This isn't important enough for me to be willing to add a big
> > dependency to my application. I'd really prefer to limit myself to the
> > contrib modules. pg_trgm and fuzzystrmatch look very promising, but
> > it's not obvious how I can use either to achieve what I want.
> > Postgres's built-in regex support may have a role to play too.
> >
> > I can live with it not being indexable, because typically there are
> > only tens of thousands of products in a production system.
> >
> > Could someone suggest an approach that is reasonably simple and
> > reasonably generic ?
>
> What I do is to create another column that has a simplified version of
> the string in it.
> (I created a function to simplify strings, and when the source column
> is changed or inserted, I also update the "simplified" column.
> Then when searching, I use the same function to "simplify" the search
> string and use "=" to test against the "simplified" column.
>
> E.g.
> if the table has a column called "name" that you want to search, you
> create a name_simplified column, and fill it as so:
> update your_table set name_simplified=yourSimplifyFunction(name);
>
> Then to search:
> select * from your_table where simplified_name =
> yourSimplifyFunction('Coca-Cola');
>
> This is really fast, because the match is using the index rather than
> a sequential scan.
>
> >
> > Thanks,
> > Peter Geoghegan
> >
> > --
> > Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-general
> >
>
>
> --
> Brian Modra Land line: +27 23 5411 462
> Mobile: +27 79 69 77 082
> 5 Jan Louw Str, Prince Albert, 6930
> Postal: P.O. Box 2, Prince Albert 6930
> South Africa
> http://www.zwartberg.com/
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

--
George R. C. Silva

Desenvolvimento em GIS
http://blog.geoprocessamento.net

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Peter Geoghegan 2010-04-05 20:00:41 Re: Fuzzy string matching of product names
Previous Message Garry Saddington 2010-04-05 19:50:52 Re: windows 7 compatiblity?