From: | Ian Barwick <barwick(at)gmail(dot)com> |
---|---|
To: | David Wheeler <david(at)kineticode(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: UTF-8 and LIKE vs = |
Date: | 2004-08-23 21:25:05 |
Message-ID: | 1d581afe0408231425623a7261@mail.gmail.com |
Lists: | pgsql-general |
On Mon, 23 Aug 2004 14:04:05 -0700, David Wheeler <david(at)kineticode(dot)com> wrote:
> On Aug 23, 2004, at 1:58 PM, Ian Barwick wrote:
>
> > er, the characters in "name" don't seem to match the characters in the
> > query - '국방비' vs. '북한의' - does that have any bearing?
>
> Yes, it means that = is doing the wrong thing!!
>
> I noticed this because I had a query that was looking in the keyword
> table for an existing record using LIKE. If it didn't find it, it
> inserted it. But the inserts were giving me an error because the name
> column has a UNIQUE index on it. Could it be that the index and the =
> operator are comparing bytes, and that '국방비' and '북한의' have the same
> bytes but different characters??
>
> If so, this is a pretty serious problem. How can I get = and the
> indices to use character semantics rather than byte semantics? I also
> need to be able to store data in different languages in the database
> (and in the same column!), but all in Unicode.
I don't know what the problem is, but you might want to check the
client encoding settings and the encoding your characters are
arriving in (bearing in mind that in Postgres "UNICODE" really
means UTF-8).

If you're using Perl (I'm guessing this is Bricolage-related), the
"_utf8_on"-ness of strings might be worth checking too, as well as the
"pg_enable_utf8" flag in DBD::Pg.

Ian Barwick
barwick(at)gmail(dot)net
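
One quick way to test the "same bytes, different characters" theory, and to see what a wrongly flagged string looks like, is to inspect the raw UTF-8 bytes. The sketch below is in Python rather than Perl and is only an illustration: the variable names are hypothetical, the two strings are copied from the quoted message, and the Latin-1 step is an assumed example of a mislabelled encoding, not a diagnosis of this particular case.

```python
# Illustrative sketch only; names are hypothetical, strings come from the thread.
name_in_table = "국방비"
name_in_query = "북한의"


def utf8_hex(text: str) -> str:
    """Return the UTF-8 encoding of text as space-separated hex bytes."""
    return " ".join(f"{byte:02x}" for byte in text.encode("utf-8"))


for label, text in (("table", name_in_table), ("query", name_in_query)):
    encoded = text.encode("utf-8")
    print(f"{label}: {len(text)} characters, {len(encoded)} bytes: {utf8_hex(text)}")

# If the two strings really shared the same bytes, this would be True.
same_bytes = name_in_table.encode("utf-8") == name_in_query.encode("utf-8")
print("identical UTF-8 bytes:", same_bytes)

# Assumed failure mode: UTF-8 bytes treated as a single-byte encoding
# (Latin-1 here), so each byte is counted as one "character".
mis_decoded = name_in_table.encode("utf-8").decode("latin-1")
print("length when bytes are misread as Latin-1:", len(mis_decoded))
```

If the byte sequences differ, as they do for these two strings, a plain byte comparison would not treat them as equal, which suggests looking at how the strings are encoded on the way in and how the database compares them rather than at the stored bytes alone.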