Re: Accent insensitive search?

From: Stuart Bishop <stuart(at)stuartbishop(dot)net>
To: Jasen Betts <jasen(at)xnet(dot)co(dot)nz>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Accent insensitive search?
Date: 2009-03-30 09:02:50
Message-ID: 6bc73d4c0903300202r7495c46eo97697842d2221f1d@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, Mar 24, 2009 at 4:53 PM, Jasen Betts <jasen(at)xnet(dot)co(dot)nz> wrote:
> On 2009-03-18, cifroes <cifroes(at)netcabo(dot)pt> wrote:
>> This is a multi-part message in MIME format.
>>
>> ------_=_NextPart_001_01C9A7E6.B32BBA87
>> Content-Type: text/plain;
>>       charset="iso-8859-1"
>> Content-Transfer-Encoding: quoted-printable
>>
>> Hi,
>>
>> I have a DB in utf-8 and postgres 8.3.x.=20
>>
>> How can I do an accent insensitive search (like ...) ?
>
> use a posix regular expression that matches the string you want:
>
> select * from foo where colname ~ '^[CcĆćĈĉĊċČč][ĀāĂ㥹Aa][Pp][ŌōŎŏŐő]$';
>
> you could write a function to do the translation.

Unicode normalization is an interesting way to strip accents I've found:

>>> import unicodedata
>>>
>>> s = u'Björn'
>>> s = unicodedata.normalize("NFKD", s)
>>> s = ''.join(c for c in s if ord(c) < 127)
>>> print s
Bjorn

You can also use the character names to map many more characters to
the ascii equivalent. A large number of these can me smashed into
ASCII using regular expressions and some manual mappings to map LETTER
THORN -> th, LETTER LATERAL CLICK -> X etc. Just mapping CAPITAL
LETTER XX -> XX and SMALL LETTER XX -> xx seems to get you most of
europe if you special case SHARP S -> ss and THORN -> th.

>>> s = u'ァ'
>>> print unicodedata.name(s)
KATAKANA LETTER SMALL A

--
Stuart Bishop <stuart(at)stuartbishop(dot)net>
http://www.stuartbishop.net/

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Cédric Villemain 2009-03-30 10:08:44 debian package 8.4devel was:[Pkg-postgresql-public] Postgres major version support policy on Debian
Previous Message Dave Page 2009-03-30 08:34:31 Re: New shapshot RPMs (Mar 27, 2009) are ready for testing