From: | "Joe Conway" <joseph(dot)conway(at)home(dot)com> |
---|---|
To: | "Josh Berkus" <josh(at)agliodbs(dot)com>, <pgsql-sql(at)postgresql(dot)org> |
Subject: | Re: Fuzzy matching? |
Date: | 2001-07-31 17:07:05 |
Message-ID: | 02ee01c119e3$3a128f50$48d210ac@jecw2k1 |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-patches pgsql-sql |
> And the fuzzy_match should return True if the two phrases are no more
> than that number of characters different. Thus, we should get:
>
> fuzzy_match('Thornton','Tornton',1) = TRUE
> fuzzy_match('Thornton','Torntin',1) = FALSE
> fuzzy_match('Thornton','Torntin',2) = TRUE
>
> Unfortunately, I cannot think of a way to make this happen in a function
> without cycling through all the possible permutations of characters for
> both words or doing some character-by-character comparison with
> elaborate logic for placement. Either of these approaches would be very
> slow, and completely unsuitable for column comparisons on large tables.
>
> Can anyone suggest some shortcuts here? Perhaps using pl/perl or
> something similar?
Sounds like you want something along the lines of soundex or metaphone? I
don't see either function in PostgreSQL, but take a look at the PHP manual
to see examples: http://www.php.net/manual/en/function.soundex.php ,
http://www.php.net/manual/en/function.metaphone.php
I looked at the soundex function in the PHP source, and it looks like it
would be fairly easy to port to a Postgres C function. The algorithm itself
comes from Donald Knuth in "The Art Of Computer Programming, vol. 3: Sorting
And Searching", Addison-Wesley (1973), pp. 391-392.
HTH,
-- Joe
From | Date | Subject | |
---|---|---|---|
Next Message | Robby Slaughter | 2001-07-31 17:07:48 | RE: Fuzzy matching? |
Previous Message | Tom Lane | 2001-07-31 16:16:58 | Re: Patch to add insert of multiple tuples per INSERT statement |
From | Date | Subject | |
---|---|---|---|
Next Message | Robby Slaughter | 2001-07-31 17:07:48 | RE: Fuzzy matching? |
Previous Message | Josh Berkus | 2001-07-31 16:05:28 | Fuzzy matching? |