Re: Unicode database on non-unicode operating system

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-general(at)postgresql(dot)org
Cc: "Morten Barklund" <morten(dot)barklund(at)tbwa(dot)dk>
Subject: Re: Unicode database on non-unicode operating system
Date: 2008-07-15 12:32:55
Message-ID: 200807151432.57828.peter_e@gmx.net
Lists: pgsql-general

On Tuesday, 15 July 2008, Morten Barklund wrote:
> My problem is that the lowercase versions of non-ASCII characters are
> broken. Specifically, I found that when lower() is invoked on a text with
> non-ASCII characters, the operating system's locale is used to convert
> each octet in the string to lowercase, instead of the database's locale
> being used to convert each character in the string to lowercase. This caused
> the Danish lowercase o with slash "ø", which in UTF-8 is represented as
> the latin1-readable octets "Ã¸", to be converted to the latin1-readable
> octets "ã¸", which the server then tried to interpret as a UTF-8
> character. But the octets "ã¸" do not represent a valid character in
> UTF-8. The lowercase version of "ø" is of course just itself.

This means you have mismatching server encodings and locales configured.
Check SHOW lc_collate and SHOW server_encoding, and then pick a combination
that is compatible. This will probably mean you have to run initdb again.
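
To make the mismatch concrete, here is a minimal sketch in Python (not
PostgreSQL code) of what happens when UTF-8 data is lowercased one octet at
a time under a single-byte locale. It assumes the operating system locale is
Latin-1; the helper name bytewise_lower_latin1 is hypothetical and exists
only for this illustration.

```python
# Sketch of the failure mode: UTF-8 bytes lowercased octet by octet,
# as a single-byte locale would do. Latin-1 is an assumption here.

def bytewise_lower_latin1(data: bytes) -> bytes:
    """Lowercase each octet as if it were one Latin-1 character."""
    return data.decode("latin-1").lower().encode("latin-1")

# U+00F8 is the Danish lowercase o with slash.
utf8_bytes = "\u00f8".encode("utf-8")        # the two octets C3 B8
as_latin1 = utf8_bytes.decode("latin-1")     # how those octets read in Latin-1

# In Latin-1, octet C3 is an uppercase letter whose lowercase form is E3.
broken = bytewise_lower_latin1(utf8_bytes)   # the two octets E3 B8

# E3 announces a three-byte UTF-8 sequence, but only one byte follows.
try:
    broken.decode("utf-8")
    still_valid = True
except UnicodeDecodeError:
    still_valid = False

# Lowercasing the character itself, rather than its octets, changes nothing.
correct = "\u00f8".lower()
```

With a locale whose character set matches the server encoding, the
lowercasing works on whole characters, as in the last line, and the string
comes back unchanged.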
