Unicode and unaccent()

From:	"Mark Borins" <mark(dot)borins(at)rigadev(dot)com>
To:	<pgsql-general(at)postgresql(dot)org>
Subject:	Unicode and unaccent()
Date:	2005-05-05 19:37:35
Message-ID:	111532185801@smtp-1.vancouver.ipapp.com
Lists:	pgsql-general

I am trying to write an unaccent function because I need to do some queries
comparing data that has accents and data that does not.

The encoding on my DB is Unicode. So far, by looking in the mail archives, I
have found an unaccent() function that looks like the following:

CREATE FUNCTION unaccent(text) RETURNS text AS $$
BEGIN
    RETURN translate($1, '\342\347\350\351\352\364\373', 'aceeeou');
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;

My problem is that values like \342 are for a LATIN1-type encoding. I
have tried and failed to get this working using what I think is the
Unicode escaping method, \u0032 for example.
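
To make the mismatch concrete, here is a minimal sketch in Python (not SQL, so it can be run outside the database). It prints the accented characters from the translate() call as backslash-octal escapes, first as LATIN1 bytes and then as UTF-8 bytes. The helper name octal_escapes is made up for illustration, and the sketch assumes the "Unicode" database encoding stores text as UTF-8.

```python
def octal_escapes(text, encoding):
    """Return the bytes of `text` in `encoding` as backslash-octal escapes."""
    return "".join("\\%03o" % b for b in text.encode(encoding))

# The characters the translate() call maps to 'aceeeou':
# a-circumflex, c-cedilla, e-grave, e-acute, e-circumflex,
# o-circumflex, u-circumflex.
accented = "\u00e2\u00e7\u00e8\u00e9\u00ea\u00f4\u00fb"

# One byte per character in LATIN1; two bytes per character in UTF-8.
latin1_form = octal_escapes(accented, "latin-1")
utf8_form = octal_escapes(accented, "utf-8")

print(latin1_form)
print(utf8_form)
```

The first line printed matches the escape string in the function above, one escape per character. The second has two escapes per character (0x00E2 becomes \303\242), which suggests why the LATIN1 values do not match anything in a Unicode database. The sketch does not establish whether translate() on 7.4 handles the two-byte forms correctly. Note also that code point 0x0032 is the digit 2, not an accented letter.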

Even help with just the Unicode escaping method would be useful. For
example, if I wanted to find the Unicode character 0x00E2 with a select
statement, how would I do it?

Something like

select * from table where field like '%\u00e2%';

doesn't seem to work.

Does anyone have a good method for unaccenting Unicode dbs/characters?

I am using PostgreSQL 7.4 on Fedora Core 2.

Thank you

Responses

Re: Unicode and unaccent() at 2005-05-06 06:12:24 from Peter Eisentraut
Re: Unicode and unaccent() at 2005-05-06 09:15:17 from Daniel Verite
