From: | Martijn van Oosterhout <kleptog(at)svana(dot)org> |
---|---|
To: | SunWuKung <Balazs(dot)Klein(at)axelero(dot)hu> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: case insensitive match in unicode |
Date: | 2006-03-27 11:40:37 |
Message-ID: | 20060327114037.GD30791@svana.org |
Lists: | pgsql-general |
On Mon, Mar 27, 2006 at 12:45:05PM +0200, SunWuKung wrote:
> This sounds like a very interesting concept.
> It wouldn't be 'case insensitive' just insensitive.
>
> The way I imagine it now is a special case of the ~ function.
> I create matchgroups in a table and check each character if it is in the
> group. If it is I will replace the character with the group in [éÉE],
> [oóOÓ??] and do a regexp with that.
No need to reinvent the wheel. ICU provides a range of services to deal
with this. For example, the following transform in ICU:
    NFD; [:Nonspacing Mark:] Remove; NFC
removes all accents from characters, and it works for all Unicode
characters. With a bit more thought you can handle case variations
as well.
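To make the three steps of that transform concrete, here is a rough sketch
using Python's standard unicodedata module as a stand-in. This is not ICU
itself, and the function name strip_accents is made up for illustration;
it only assumes that ICU's "Nonspacing Mark" set corresponds to the
Unicode general category Mn.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Approximate 'NFD; [:Nonspacing Mark:] Remove; NFC' without ICU."""
    # NFD: decompose each character into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Remove nonspacing marks (general category Mn), i.e. the accents.
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # NFC: recompose whatever is left into canonical composed form.
    return unicodedata.normalize("NFC", stripped)
```

Applied to both the stored text and the search term, something like this
would let an ordinary comparison or regexp ignore accents without
maintaining match groups by hand.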
There is also a locale-independent case-mapping module there, plus
various locale-specific ones.
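As an illustration of what locale-independent case mapping buys you, here
is a small sketch using Python's built-in str.casefold(), which applies
Unicode full case folding. It is a stand-in for the ICU module rather
than ICU's own API, and the helper name same_ignoring_case is
hypothetical.

```python
def same_ignoring_case(a: str, b: str) -> bool:
    """Compare two strings using Unicode case folding (no locale involved)."""
    # casefold() is more aggressive than lower(): for example it maps
    # the German sharp s to "ss", so both spellings fold to the same text.
    return a.casefold() == b.casefold()
```

Note that case folding alone does not ignore accents, and it does not
reconcile composed and decomposed forms of the same letter; it only
covers the case half of the problem.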
http://icu.sourceforge.net/userguide/Transform.html
http://icu.sourceforge.net/userguide/caseMappings.html
http://icu.sourceforge.net/userguide/normalization.html
Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.