Re: to_ascii, or some other form of magic transliteration

From: Mike Rylander <mrylander(at)gmail(dot)com>
To: Ben <bench(at)silentmedia(dot)com>, Postgresql-General <pgsql-general(at)postgresql(dot)org>
Subject: Re: to_ascii, or some other form of magic transliteration
Date: 2005-09-10 12:30:28
Message-ID: b918cf3d050910053029faae73@mail.gmail.com
Lists: pgsql-general

On 9/9/05, Ben <bench(at)silentmedia(dot)com> wrote:
> I'm working on a problem that I imagine others have had, which basically
> boils down to having nice unicode display text that users are going to
> want to search against without typing it correctly.... e.g. let a search
> for "sma" match "små". It seems like the best way to do this is to find
> a magic unicode transliteration mapping function, and then save the
> ASCII transliterations for searching against.
>

The simplest solution I've found is to maintain a separate column
holding an ASCII-ized version of your text. The conversion can be done
automatically by a trigger; I use one written in PL/PerlU. It basically
boils down to:

1) transform the Unicode text to Normalization Form D (NFD)
2) strip the combining (non-spacing) marks

In modern Perls that looks like:

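#--------------
use Unicode::Normalize;    # provides NFD()
my $txt = NFD(shift());    # 1) decompose: "å" becomes "a" + combining ring
$txt =~ s/\pM//g;          # 2) strip every combining mark (\pM = Mark)
return $txt;
#--------------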
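
As a rough, self-contained sketch of the same two steps outside the database (the helper name strip_marks and the sample words are made up for illustration, not taken from my trigger):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                               # the string literals below are UTF-8
use Unicode::Normalize qw(NFD);

# Hypothetical helper: decompose, then drop the combining marks.
sub strip_marks {
    my ($text) = @_;
    my $decomposed = NFD($text);        # step 1: "å" becomes "a" + U+030A
    $decomposed =~ s/\p{M}//g;          # step 2: remove all combining marks
    return $decomposed;
}

binmode(STDOUT, ':encoding(UTF-8)');
for my $word ('små', 'café', 'søk') {
    printf "%s -> %s\n", $word, strip_marks($word);
}
```

Note that this only removes marks that NFD can split off. Letters with no canonical decomposition, such as "ø" or "ß", pass through unchanged, so the result is not guaranteed to be pure ASCII; those would need an explicit mapping of your own.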

Hope that helps!

> I see there's a function to_ascii, which sounds hopeful. However, when I
> try to use it, I get back:
>
> ERROR: encoding conversion from UNICODE to ASCII not supported
>
> What is this function for, if not to convert other encodings to ASCII?
> Is there some other way to do what I'm asking for?

--
Mike Rylander
mrylander(at)gmail(dot)com
GPLS -- PINES Development
Database Developer
http://open-ils.org
