From: | Ben <bench(at)silentmedia(dot)com> |
---|---|
To: | mrylander(at)gmail(dot)com |
Cc: | Postgresql-General <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: to_ascii, or some other form of magic transliteration |
Date: | 2005-09-10 23:48:43 |
Message-ID: | FC637029-2586-42B7-888D-E4F3F519DB98@silentmedia.com |
Lists: | pgsql-general |
Hrm, I must be missing something, because I don't see how this will
transliterate to ASCII.
On Sep 10, 2005, at 5:30 AM, Mike Rylander wrote:
> On 9/9/05, Ben <bench(at)silentmedia(dot)com> wrote:
>
>> I'm working on a problem that I imagine others have had, which basically
>> boils down to having nice Unicode display text that users are going to
>> want to search against without typing it correctly... e.g. let a search
>> for "sma" match "små". It seems like the best way to do this is to find
>> a magic Unicode transliteration mapping function, and then save the
>> ASCII transliterations for searching against.
>>
>
> The simplest solution to this that I've found is to maintain a
> separate column for an ASCII-ized version of your text. The conversion
> can be done automatically using a trigger, and I have one in PL/PerlU
> that I use. It basically boils down to:
>
> 1) transform the Unicode text to Normalization Form D (NFD)
> 2) strip the combining non-spacing marks that the decomposition produces
>
> In modern Perls that looks like:
>
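> #--------------
> use Unicode::Normalize;
> # the function's text argument arrives in @_
> my $txt = NFD(shift());
> # NFD splits e.g. "å" into "a" + U+030A COMBINING RING ABOVE;
> # \pM then deletes every combining mark
> $txt =~ s/\pM//g;
> return $txt;
> #--------------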
>
> Hope that helps!
>
>
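A minimal standalone sketch of the two quoted steps, run outside PostgreSQL and assuming Perl 5.8 or later (where Unicode::Normalize ships as a core module). The sub name `strip_marks` and the sample words are illustrative only, not from the thread. It shows why the question above is fair: letters with a canonical decomposition such as "å" or "ö" fold to plain ASCII, but letters with none, such as "ø" or "ß", pass through unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                         # this source file contains literal non-ASCII strings
use Unicode::Normalize qw(NFD);

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: the two steps from the quoted message.
sub strip_marks {
    my ($txt) = @_;
    $txt = NFD($txt);             # 1) canonical decomposition
    $txt =~ s/\pM//g;             # 2) drop every combining mark
    return $txt;
}

for my $word ('små', 'Ångström', 'smørbrød', 'Straße') {
    my $folded = strip_marks($word);
    # Anything still outside 7-bit ASCII had no decomposition to strip.
    my $ascii = ($folded =~ /[^\x00-\x7F]/) ? 'no' : 'yes';
    printf "%-10s -> %-10s ASCII: %s\n", $word, $folded, $ascii;
}

# Searching against the folded text: "sma" now finds "små".
my $hit = index(strip_marks('små'), 'sma') >= 0;
print "search for 'sma' matches 'små': ", ($hit ? 'yes' : 'no'), "\n";
```

Characters like "ø" and "ß" would still need an explicit mapping table (or a dedicated transliteration step) to reach pure ASCII; NFD plus mark-stripping alone does not cover them.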