From: | Michael Fuhr <mike(at)fuhr(dot)org> |
---|---|
To: | Jiří Němec <konference(at)menea(dot)cz>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: Remove diacritical marks in SQL |
Date: | 2007-01-14 16:27:21 |
Message-ID: | 20070114162721.GA3943@winnie.fuhr.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Fri, Jan 12, 2007 at 10:58:36PM +0100, Martijn van Oosterhout wrote:
> On Fri, Jan 12, 2007 at 10:16:22PM +0100, Jiří Němec wrote:
> > I would like to remove diacritical marks from a string in a SQL query.
> > I tried to convert a UTF8 string to ASCII but it doesn't work for me.
> >
> > SELECT convert('ěščřžýáíé','UTF8','SQL_ASCII')
>
> I don't think postgres has any stuff builtin for that, but other
> languages (like perl) have modules to do this kind of thing. The method
> is to decompose the string to normal form D, strip the diacritics, and
> recompose what's left.
A technique that's been posted before might work:
SELECT to_ascii(convert('ěščřžýáíé', 'LATIN2'), 'LATIN2');
to_ascii
-----------
escrzyaie
(1 row)
to_ascii() supports only LATIN1, LATIN2, LATIN9, and WIN1250 so you
have to convert to one of those encodings first.
As Martijn suggested, you could use Perl. Here's an example with
Text::Unaccent (you'll need to use encoding names that iconv
recognizes):
CREATE FUNCTION unaccent(charset text, string text) RETURNS text AS $$
use Text::Unaccent;
return unac_string($_[0], $_[1]);
$$ LANGUAGE plperlu IMMUTABLE STRICT;
SELECT unaccent('UTF-8', 'ěščřžýáíé');
unaccent
-----------
escrzyaie
(1 row)
Here's an example that uses Unicode::Normalize to strip non-spacing
and enclosing marks:
CREATE FUNCTION unaccent(string text) RETURNS text AS $$
use Unicode::Normalize;
my $nfd_string = NFD($_[0]);
$nfd_string =~ s/[\p{Mn}\p{Me}]//g;
return NFC($nfd_string);
$$ LANGUAGE plperlu IMMUTABLE STRICT;
SELECT unaccent('ěščřžýáíé');
unaccent
-----------
escrzyaie
(1 row)
--
Michael Fuhr
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Fuhr | 2007-01-14 16:55:43 | Re: Export to shape file |
Previous Message | Juan Jose Comellas | 2007-01-14 15:33:34 | Re: XEON familiy 5000, 5100 or 5300? |