Re: to_ascii, or some other form of magic transliteration

From: Ben <bench(at)silentmedia(dot)com>
To: mrylander(at)gmail(dot)com
Cc: Postgresql-General <pgsql-general(at)postgresql(dot)org>
Subject: Re: to_ascii, or some other form of magic transliteration
Date: 2005-09-10 23:48:43
Message-ID: FC637029-2586-42B7-888D-E4F3F519DB98@silentmedia.com
Lists: pgsql-general

Hrm, I must be missing something: how does this transliterate to
ASCII?

On Sep 10, 2005, at 5:30 AM, Mike Rylander wrote:

> On 9/9/05, Ben <bench(at)silentmedia(dot)com> wrote:
>
>> I'm working on a problem that I imagine others have had, which
>> basically boils down to having nice Unicode display text that users
>> are going to want to search against without typing it correctly,
>> e.g. let a search for "sma" match "små". It seems like the best way
>> to do this is to find a magic Unicode transliteration mapping
>> function, and then save the ASCII transliterations for searching
>> against.
>>
>>
>
> The simplest solution to this that I've found is to maintain a
> separate column for an ASCII-ized version of your text. The conversion
> can be done automatically using a trigger, and I have one in PL/PerlU
> that I use. It basically boils down to:
>
> 1) transform the Unicode text to Normalization Form D (NFD)
> 2) strip the combining marks
>
> In modern Perls that looks like:
>
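> #--------------
> use Unicode::Normalize;
> # Step 1: decompose each character into its base letter plus marks.
> my $txt = NFD(shift());
> # Step 2: delete every combining mark (Unicode category M).
> $txt =~ s/\pM//g;
> return $txt;
> #--------------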
>
> Hope that helps!
>
>
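
For what it's worth, the two steps quoted above can be tried outside
the database. This is only a minimal sketch, assuming a plain Perl
script rather than the PL/PerlU trigger itself. The helper name
strip_marks is made up for illustration and is not from the original
function.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper: the quoted snippet shows only a function body.
sub strip_marks {
    my ($txt) = @_;
    $txt = NFD($txt);      # step 1: U+00E5 becomes "a" + U+030A
    $txt =~ s/\p{M}//g;    # step 2: drop the combining marks
    return $txt;
}

# Non-ASCII input is written with \x{...} escapes so that the
# encoding of this file does not matter.
my $sma = strip_marks("sm\x{E5}");   # a with ring above
my $ol  = strip_marks("\x{F8}l");    # o with stroke

print "$sma\n";
print(($ol =~ /^[\x00-\x7F]*$/) ? "ascii\n" : "not ascii\n");
```

The first string comes out as plain ASCII because the ring is a
separate combining mark after decomposition. Letters that have no
canonical decomposition, such as the o with stroke in the second
string, pass through unchanged, so a real search column would
probably still need an explicit mapping for those.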
