
Re: UTF-8 and LIKE vs =

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	David Wheeler <david(at)kineticode(dot)com>
Cc:	Ian Barwick <barwick(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: UTF-8 and LIKE vs =
Date:	2004-08-23 22:44:47
Message-ID:	20743.1093301087@sss.pgh.pa.us
Lists:	pgsql-general

David Wheeler <david(at)kineticode(dot)com> writes:
> On Aug 23, 2004, at 1:58 PM, Ian Barwick wrote:
>> er, the characters in "name" don't seem to match the characters in the
>> query - '=B1=B9=B9=E6=BA=F1' vs. '=BA=CF=C7=D1=C0=C7' - does that have any
>> bearing?

> Yes, it means that = is doing the wrong thing!!

I have seen this happen in situations where the strings contained
character sequences that were illegal according to the encoding that the
locale thought was in force. (It seems that strcoll() will return more
or less random results in such cases...) In particular, given that you
have

> LC_COLLATE: en_US.UTF-8
> LC_CTYPE: en_US.UTF-8

you are at risk if the data is not legal UTF-8 strings.

The real question therefore is whether you have the database encoding
set correctly --- ie, is it UNICODE (== UTF8)? If not then it may well
be that Postgres is presenting strings to strcoll() that the latter will
choke on.
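
One way to check whether stored strings are legal UTF-8 is to try a strict decode of the raw bytes. The following is only a minimal sketch: the helper name is_valid_utf8 is hypothetical, and the sample bytes are simply the quoted-printable sequence from the quoted text above, used for illustration (this message does not establish that those are the bytes actually stored in the table).

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw is a well-formed UTF-8 byte sequence."""
    try:
        # bytes.decode() is strict by default and raises on illegal sequences.
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Bytes B1 B9 B9 E6 BA F1, taken from the quoted-printable text in the thread
# (an illustrative assumption, not confirmed table contents).
sample = bytes.fromhex("B1B9B9E6BAF1")

# A string that is known to be legal UTF-8, for comparison.
known_good = "héllo".encode("utf-8")

print(is_valid_utf8(sample))      # False: 0xB1 cannot start a UTF-8 sequence
print(is_valid_utf8(known_good))  # True
```

If a check like this fails on bytes pulled from the table, the data is probably in some other encoding, and a comparison done under a UTF-8 locale cannot be trusted for those rows.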

regards, tom lane

In response to

Re: UTF-8 and LIKE vs = at 2004-08-23 21:04:05 from David Wheeler

Responses

Re: UTF-8 and LIKE vs = at 2004-08-23 22:50:07 from David Wheeler
