Quick Links

Re: Yet another problem with ILIKE and UTF-8

From:	Gregory Stark <stark(at)enterprisedb(dot)com>
To:	"Gergely Bor" <borg42(at)gmail(dot)com>
Cc:	<pgsql-bugs(at)postgresql(dot)org>
Subject:	Re: Yet another problem with ILIKE and UTF-8
Date:	2007-10-25 16:33:12
Message-ID:	87hckftarb.fsf@oxford.xeocode.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-bugs

"Gergely Bor" <borg42(at)gmail(dot)com> writes:

> We'll google the initdb stuff and try it ASAP.
>
> What I've tried is LOWER and UPPER, and they seem to return trash for
> Hungarian UTF-8 characters, but they handle ASCII well. (Hmmmm...
> maybe ILIKE requires LOWER and UPPER to work? Would not be
> illogical...)

It does. I think it works by just downcasing both strings. It's possible to do
better but tricky. I think 8.3 has an optimization for that for single-byte
encodings but it had to be disabled for utf-8 in the end.

If it's returning trash for those characters then it's not prepared to handle
UTF-8 data. You have to use an encoding compatible with your locale and
vice-versa.

If you want to store UTF-8 data I suggest you

. add hu_HU.UTF-8 to /etc/locale.gen,
. rerun /usr/sbin/locale-gen
. pg_dump your database
. re-initdb with the locale set to hu_HU.UTF-8
. pg_restore your data.

Unfortunately that'll take quite a while and involve down-time.

You should probably do this in a second directory aside from your existing
database just in case you've created any invalidly encoded utf-8 strings.
You'll have to fix them before restoring. (Actually I don't recall which
version got strict about that.)

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

In response to

Re: Yet another problem with ILIKE and UTF-8 at 2007-10-25 15:16:04 from Gergely Bor

Responses

Re: Yet another problem with ILIKE and UTF-8 at 2007-10-26 14:17:43 from Gergely Bor

Browse pgsql-bugs by date

	From	Date	Subject
Next Message	Tom Lane	2007-10-25 16:35:08	Re: Yet another problem with ILIKE and UTF-8
Previous Message	Tom Lane	2007-10-25 16:29:17	Re: BUG #3696: FK integrity check bypassed using rules.