Re: UTF-8 and LIKE vs =

From:	David Wheeler <david(at)kineticode(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Ian Barwick <barwick(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: UTF-8 and LIKE vs =
Date:	2004-08-23 22:50:07
Message-ID:	C83F7824-F556-11D8-990D-000A95972D84@kineticode.com
Lists:	pgsql-general

On Aug 23, 2004, at 3:44 PM, Tom Lane wrote:

>> Yes, it means that = is doing the wrong thing!!
>
> I have seen this happen in situations where the strings contained
> character sequences that were illegal according to the encoding that
> the locale thought was in force. (It seems that strcoll() will return
> more or less random results in such cases...) In particular, given
> that you have
>
>> LC_COLLATE: en_US.UTF-8
>> LC_CTYPE: en_US.UTF-8
>
> you are at risk if the data is not legal UTF-8 strings.

But is it possible to store non-UTF-8 data in a UNICODE database?
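
One way to test that question against real data is to run the raw bytes of a suspect column through a strict UTF-8 decoder. The following is a minimal sketch, not anything PostgreSQL ships: the function name find_invalid_utf8 and the sample values are made up for illustration, and it assumes the values have already been fetched as raw bytes.

```python
def find_invalid_utf8(values):
    """Return (index, reason) pairs for byte strings that are not legal UTF-8.

    `values` is assumed to be an iterable of raw byte strings, for
    example column values read from a dump file in binary mode.
    """
    bad = []
    for i, raw in enumerate(values):
        try:
            # Strict decoding raises on any illegal byte sequence.
            raw.decode("utf-8", errors="strict")
        except UnicodeDecodeError as exc:
            bad.append((i, exc.reason))
    return bad


# Hypothetical sample data: the second value is Latin-1, not UTF-8.
samples = [
    "Zürich".encode("utf-8"),
    "Zürich".encode("latin-1"),
    b"plain ascii",
]
invalid = find_invalid_utf8(samples)
for index, reason in invalid:
    print(f"value {index} is not legal UTF-8: {reason}")
```

If this reports nothing for the affected rows, illegal byte sequences are probably not the explanation and the locale itself becomes the more likely suspect.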

> The real question therefore is whether you have the database encoding
> set correctly --- ie, is it UNICODE (== UTF8)? If not then it may well
> be that Postgres is presenting strings to strcoll() that the latter
> will choke on.

The database is UNICODE.

I plan to dump it, run initdb with LC_COLLATE and LC_CTYPE both set to
"C", and restore the database and see if that helps.
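
For reference, here is a rough sketch of what I expect the "C" locale to change. In that locale strcoll() is equivalent to strcmp(), so comparison reduces to comparing bytes and no locale rules are involved. The helper names c_locale_equal and c_locale_sort are made up for illustration; this models the byte comparison only and is not how the server is implemented.

```python
def c_locale_equal(a, b):
    """Two strings are equal only if their UTF-8 bytes are identical."""
    return a.encode("utf-8") == b.encode("utf-8")


def c_locale_sort(strings):
    """Sort by UTF-8 byte values, the order a strcmp()-style comparison gives."""
    return sorted(strings, key=lambda s: s.encode("utf-8"))


words = ["zebra", "Apple", "éclair", "apple"]
ordered = c_locale_sort(words)
print(ordered)  # ['Apple', 'apple', 'zebra', 'éclair']

# A precomposed "é" and "e" plus a combining accent look the same on
# screen but are different byte sequences, so they compare unequal.
print(c_locale_equal("\u00e9", "e\u0301"))  # False
```

The trade-off is that ordering becomes plain byte order (uppercase ASCII before lowercase, accented characters after "z") rather than anything en_US users would call alphabetical.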

Thanks,

David

In response to

Re: UTF-8 and LIKE vs = at 2004-08-23 22:44:47 from Tom Lane

Responses

Re: UTF-8 and LIKE vs = at 2004-08-23 22:59:13 from Tom Lane
