From: | Dag Lem <dag(at)nimrod(dot)no> |
---|---|
To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: daitch_mokotoff module |
Date: | 2022-12-23 21:44:26 |
Message-ID: | ygezgbdvlqd.fsf@sid.nimrod.no |
Lists: | pgsql-hackers |
Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> I wonder why do you have it return the multiple alternative codes as a
> space-separated string. Maybe an array would be more appropriate. Even
> on your documented example use, the first thing you do is split it on
> spaces.
In the example, the *input* is split on whitespace; the returned soundex
codes are not. The splitting of the input is done in order to code each
word separately. One of the stated rules of the Daitch-Mokotoff Soundex
Coding is that "When a name consists of more than one word, it is coded
as if one word", and this may not always be desired. See
https://www.avotaynu.com/soundex.htm or
https://www.jewishgen.org/InfoFiles/soundex.html for the rules.
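
To illustrate the difference between the two approaches, here is a minimal Python sketch. Note that `toy_code` is a hypothetical stand-in and NOT the Daitch-Mokotoff algorithm; it only serves to show that coding a name "as if one word" yields a single result, while splitting the input first yields one result per word.

```python
def toy_code(word: str) -> str:
    """Hypothetical stand-in for a soundex coder (NOT Daitch-Mokotoff):
    keep the first letter, then drop all later vowels."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    head, tail = letters[0], letters[1:]
    return head + "".join(c for c in tail if c not in "aeiou")


def code_as_one_word(name: str) -> str:
    """Code a multi-word name as if it were a single word."""
    return toy_code("".join(name.split()))


def code_each_word(name: str) -> list:
    """Split the input on whitespace and code every word separately."""
    return [toy_code(word) for word in name.split()]
```

With per-word coding, a search for one word of a multi-word name can still match, which the single-word rule would not allow.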
The intended use for the Daitch-Mokotoff soundex, as for any other
soundex algorithm, is to index names (or words) on some representation
of sound, so that similar-sounding names with different spellings will
match.
In PostgreSQL, the Daitch-Mokotoff Soundex and Full Text Search make
for a powerful combination for matching similar-sounding names. Full Text
Search (like any other free-text search engine) works with documents, and
thus the Daitch-Mokotoff Soundex implementation produces documents
(words separated by spaces). As stated in the documentation: "Any
alternative soundex codes are separated by space, which makes the
returned text suited for use in Full Text Search".
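
As a rough model of why a space-separated result fits a text search engine, consider the Python sketch below. It is not how PostgreSQL implements Full Text Search; the function names are hypothetical, and the codes in the usage note are placeholder values rather than verified Daitch-Mokotoff output. The idea is simply that each alternative code becomes a term, and two names match when they share at least one term.

```python
def as_terms(codes: str) -> set:
    """Treat a space-separated string of codes as a document of terms."""
    return set(codes.split())


def sounds_alike(codes_a: str, codes_b: str) -> bool:
    """Two names match if they share at least one alternative code."""
    return not as_terms(codes_a).isdisjoint(as_terms(codes_b))
```

For example, with placeholder codes, `sounds_alike("160000 460000", "460000")` is true because one alternative is shared, while `sounds_alike("160000 460000", "595000")` is false.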
Best regards,
Dag Lem