From: | Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> |
---|---|
To: | Joe Conway <mail(at)joeconway(dot)com> |
Cc: | jim(at)nasby(dot)net, "Patches (PostgreSQL)" <pgsql-patches(at)postgresql(dot)org> |
Subject: | Re: [GENERAL] Bug in metaphone (contrib/fuzzystrmatch) |
Date: | 2003-06-23 03:56:38 |
Message-ID: | 200306230356.h5N3ucV22076@candle.pha.pa.us |
Lists: | pgsql-general pgsql-patches |
Joe Conway wrote:
> (I never saw this make it to the list yesterday, so I'm resending to
> patches)
>
> Jim C. Nasby wrote:
> > The second argument to metaphone is supposed to set the limit on the
> > number of characters to return, but it breaks on some phrases:
> >
> > usps=# select metaphone(a,3),metaphone(a,4),metaphone(a,20) from
> > (select 'Hello world'::varchar AS a) a;
> > HLW | HLWR | HLWRLT
> >
> > usps=# select metaphone(a,3),metaphone(a,4),metaphone(a,20) from
> > (select 'A A COMEAUX MEMORIAL'::varchar AS a) a;
> > AKM | AKMKS | AKMKSMMRL
> >
> > In every case I've found that does this, the 4th and 5th letters are
> > always 'KS'.
>
> Nice catch.
>
> There was a bug in the original metaphone algorithm from CPAN. Patch
> attached (while I was at it I updated my email address, changed the
> copyright to PGDG, and removed an unnecessary palloc). Here's how it
> looks now:
>
> regression=# select metaphone(a,4) from
> (select 'A A COMEAUX MEMORIAL'::varchar AS a) a;
> metaphone
> -----------
> AKMK
> (1 row)
>
> regression=# select metaphone(a,5) from
> (select 'A A COMEAUX MEMORIAL'::varchar AS a) a;
> metaphone
> -----------
> AKMKS
> (1 row)
>
> Please apply.
>
> Thanks,
>
> Joe
>
> Index: contrib/fuzzystrmatch/README.fuzzystrmatch
> ===================================================================
> RCS file: /opt/src/cvs/pgsql-server/contrib/fuzzystrmatch/README.fuzzystrmatch,v
> retrieving revision 1.2
> diff -c -r1.2 README.fuzzystrmatch
> *** contrib/fuzzystrmatch/README.fuzzystrmatch 7 Aug 2001 18:16:01 -0000 1.2
> --- contrib/fuzzystrmatch/README.fuzzystrmatch 6 Jun 2003 16:37:54 -0000
> ***************
> *** 3,9 ****
> *
> * Functions for "fuzzy" comparison of strings
> *
> ! * Copyright (c) Joseph Conway <joseph(dot)conway(at)home(dot)com>, 2001;
> *
> * levenshtein()
> * -------------
> --- 3,12 ----
> *
> * Functions for "fuzzy" comparison of strings
> *
> ! * Joe Conway <mail(at)joeconway(dot)com>
> ! *
> ! * Copyright (c) 2001, 2002, 2003 by PostgreSQL Global Development Group
> ! * ALL RIGHTS RESERVED;
> *
> * levenshtein()
> * -------------
> Index: contrib/fuzzystrmatch/fuzzystrmatch.c
> ===================================================================
> RCS file: /opt/src/cvs/pgsql-server/contrib/fuzzystrmatch/fuzzystrmatch.c,v
> retrieving revision 1.7
> diff -c -r1.7 fuzzystrmatch.c
> *** contrib/fuzzystrmatch/fuzzystrmatch.c 10 Mar 2003 22:28:17 -0000 1.7
> --- contrib/fuzzystrmatch/fuzzystrmatch.c 6 Jun 2003 16:38:06 -0000
> ***************
> *** 3,9 ****
> *
> * Functions for "fuzzy" comparison of strings
> *
> ! * Copyright (c) Joseph Conway <joseph(dot)conway(at)home(dot)com>, 2001;
> *
> * levenshtein()
> * -------------
> --- 3,12 ----
> *
> * Functions for "fuzzy" comparison of strings
> *
> ! * Joe Conway <mail(at)joeconway(dot)com>
> ! *
> ! * Copyright (c) 2001, 2002, 2003 by PostgreSQL Global Development Group
> ! * ALL RIGHTS RESERVED;
> *
> * levenshtein()
> * -------------
> ***************
> *** 221,229 ****
> if (!(reqlen > 0))
> elog(ERROR, "metaphone: Requested Metaphone output length must be > 0");
>
> - metaph = palloc(reqlen);
> - memset(metaph, '\0', reqlen);
> -
> retval = _metaphone(str_i, reqlen, &metaph);
> if (retval == META_SUCCESS)
> {
> --- 224,229 ----
> ***************
> *** 629,635 ****
> /* KS */
> case 'X':
> Phonize('K');
> ! Phonize('S');
> break;
> /* Y if followed by a vowel */
> case 'Y':
> --- 629,636 ----
> /* KS */
> case 'X':
> Phonize('K');
> ! if (max_phonemes == 0 || Phone_Len < max_phonemes)
> ! Phonize('S');
> break;
> /* Y if followed by a vowel */
> case 'Y':
> Index: contrib/fuzzystrmatch/fuzzystrmatch.h
> ===================================================================
> RCS file: /opt/src/cvs/pgsql-server/contrib/fuzzystrmatch/fuzzystrmatch.h,v
> retrieving revision 1.6
> diff -c -r1.6 fuzzystrmatch.h
> *** contrib/fuzzystrmatch/fuzzystrmatch.h 5 Sep 2002 00:43:06 -0000 1.6
> --- contrib/fuzzystrmatch/fuzzystrmatch.h 6 Jun 2003 16:38:13 -0000
> ***************
> *** 3,9 ****
> *
> * Functions for "fuzzy" comparison of strings
> *
> ! * Copyright (c) Joseph Conway <joseph(dot)conway(at)home(dot)com>, 2001;
> *
> * levenshtein()
> * -------------
> --- 3,12 ----
> *
> * Functions for "fuzzy" comparison of strings
> *
> ! * Joe Conway <mail(at)joeconway(dot)com>
> ! *
> ! * Copyright (c) 2001, 2002, 2003 by PostgreSQL Global Development Group
> ! * ALL RIGHTS RESERVED;
> *
> * levenshtein()
> * -------------
>
>
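For anyone following the thread, here is a minimal standalone sketch of why 'X' overran the limit and what the added check does. It is not the code from fuzzystrmatch.c: toy_encode, its guard_second flag, and the "copy every other letter unchanged" rule are all hypothetical, made up for illustration. The only part taken from the patch is the idea that the loop tests the limit once per input letter, so a letter that emits two codes ('X' gives "KS") needs a second test before the second code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy encoder (hypothetical, not the real _metaphone): every letter is
 * copied as-is except 'X', which expands to "KS".
 *
 * max_len == 0 means "no limit", mirroring the max_phonemes == 0 test in
 * the patch. guard_second selects the behaviour: 0 reproduces the overrun,
 * 1 applies the extra length check before emitting 'S'.
 *
 * The caller must supply an out buffer of at least 2 * strlen(in) + 1 bytes.
 */
static void
toy_encode(const char *in, size_t max_len, int guard_second, char *out)
{
    size_t len = 0;

    assert(in != NULL && out != NULL);
    out[0] = '\0';

    /* The limit is tested once per input letter, at the top of the loop. */
    for (; *in != '\0' && (max_len == 0 || len < max_len); in++)
    {
        if (*in == 'X')
        {
            out[len++] = 'K';
            /* Without this test, 'S' is emitted even when 'K' hit the limit. */
            if (!guard_second || max_len == 0 || len < max_len)
                out[len++] = 'S';
        }
        else
            out[len++] = *in;

        out[len] = '\0';
    }
}
```

Called with the input "AKMX" and a limit of 4, the unguarded variant yields "AKMKS" and the guarded one yields "AKMK", which lines up with the before and after output reported above for 'A A COMEAUX MEMORIAL'.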
--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073