From: | Gregory Stark <stark(at)enterprisedb(dot)com> |
---|---|
To: | <rihad(at)mail(dot)ru> |
Cc: | <pgsql-general(at)postgresql(dot)org>, <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: Question for Postgres 8.3 |
Date: | 2008-02-05 08:04:16 |
Message-ID: | 87hcgn4ynz.fsf@oxford.xeocode.com |
Lists: | pgsql-general |
"rihad" <rihad(at)mail(dot)ru> writes:
>> If you want to support multiple encodings, the only safe locale choice
>> is (and always has been) C.
>
> I should be ashamed for asking this, but would someone care to tell me how
> encoding differs from locale?
One term you missed is character set, which is just a set of possible
characters (not bytes; abstract things called characters).
An encoding is a mapping from a series of binary bytes to a series of
characters from a character set, like UTF-8 or Big5 or just plain ASCII.
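As a rough illustration of that mapping (a Python sketch, nothing specific to Postgres), the same abstract character turns into different bytes under different encodings, and some encodings cannot represent it at all:

```python
# One abstract character (U+4E2D) from the Unicode character set.
ch = "中"

# Different encodings map the same character to different byte sequences.
utf8_bytes = ch.encode("utf-8")
big5_bytes = ch.encode("big5")

print(len(utf8_bytes), utf8_bytes)  # three bytes in UTF-8
print(len(big5_bytes), big5_bytes)  # two bytes in Big5

# Decoding with the matching encoding recovers the character.
assert utf8_bytes.decode("utf-8") == ch
assert big5_bytes.decode("big5") == ch

# Plain ASCII has no byte sequence for this character.
try:
    ch.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print("representable in ASCII:", ascii_ok)
```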
A locale is a set of rules for how to sort (collation) and how to format
dates, numbers, currencies, etc., like es_US or ja_JP.
The problem is that a locale needs to know how the string it's looking at is
encoded to decide how to sort it, so it has to be designed for a particular
encoding.
In Unix that encoding is tacked on the end like en_US.UTF-8.
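To make the naming convention concrete, here is a small Python sketch that pulls such a name apart. The helper name split_locale is made up for this example, and it assumes the usual language_TERRITORY.codeset@modifier layout:

```python
def split_locale(name):
    """Split 'language_TERRITORY.codeset@modifier' into its parts.

    Hypothetical helper for illustration; parts that are absent come back as None.
    """
    rest, _, modifier = name.partition("@")
    rest, _, codeset = rest.partition(".")
    language, _, territory = rest.partition("_")
    return {
        "language": language or None,
        "territory": territory or None,
        "codeset": codeset or None,
        "modifier": modifier or None,
    }


print(split_locale("en_US.UTF-8"))
print(split_locale("C"))
```

For "en_US.UTF-8" the codeset part comes out as "UTF-8"; for plain "C" there is no territory or codeset at all.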
C is a bit of a special case since it sorts based on the binary representation
rather than the characters. That's true for any locale based on a 1-byte
encoding, but C is more predictable when you actually have binary data.
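A quick way to see what sorting on the binary representation means is to sort by the encoded bytes. This is only a Python sketch that imitates the byte-wise comparison, not the C locale itself, and the word list is made up:

```python
words = ["zebra", "apple", "Banana", "école"]

# Compare the raw byte values, knowing nothing about letters.
by_bytes = sorted(words, key=lambda w: w.encode("utf-8"))
print(by_bytes)  # ['Banana', 'apple', 'zebra', 'école']

# Uppercase 'B' (0x42) sorts before lowercase 'a' (0x61), and the
# accented 'é' (first byte 0xC3 in UTF-8) sorts after 'z' (0x7A).
first_bytes = [w.encode("utf-8")[0] for w in by_bytes]
print([hex(b) for b in first_bytes])
```

A language-aware locale would typically put "apple" before "Banana" and "école" next to the other "e" words, which is exactly the knowledge a byte-wise sort lacks.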
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning