Re: Encoding, Unicode, locales, etc.

From: Carlos Moreno <moreno_pg(at)mochima(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Encoding, Unicode, locales, etc.
Date: 2006-11-01 14:50:01
Message-ID: 4548B419.1090205@mochima.com
Lists: pgsql-general


Thanks Tom, for your reply.

Tom Lane wrote:

>Carlos Moreno <moreno_pg(at)mochima(dot)com> writes:
>
>
>>Why is it that the database
>>cluster is restricted to a single locale (or single set of locales) instead
>>of being configurable on a per-database basis?
>>
>>
>
>Because we depend on libc's locale support, which (on many platforms)
>isn't designed to switch between locales cheaply [...]
>
>This stuff is certainly far from ideal, but the amount of work involved
>to fix it is daunting; see many past pg-hackers discussions.
>
>

Fair enough --- and good to know.

>>2) By the same token (more or less), I have a test database, for which
>>I ran initdb without specifying encoding or locale; then, I create a
>>database with UTF8 encoding.
>>
>>
>
>There's no such thing as "you didn't specify a locale". If you didn't
>specify one on the initdb command line, then it was taken from the
>environment. Try "show lc_collate" and "show lc_ctype" to see what
>got used.
>
>

Yes, that's what I meant --- I meant that I did not use the --locale or
-E command-line switches for the initdb command. Both lc_ctype and
lc_collate show en_US.UTF-8.

>>I try lower of a string that
>>contains characters with accents (e.g., Spanish or French characters),
>>and it works as it should according to Spanish or French rules --- it
>>returns a string with the same characters in lowercase, with the same
>>accent. Why did that work? My Linux machine has all en_US.UTF-8
>>locales, and en_US is not even aware of characters with accents,
>>
>>
>
>You sure? I'd sort of expect a UTF8 locale to know this stuff anyway.
>In any case, Postgres doesn't know anything about case conversion
>beyond what toupper/tolower tell it, so your experimental result is
>sufficient proof that that locale includes these conversions.
>
>

Are you sure there's nothing in the way PostgreSQL interacts with the C
conversion functions that could matter here? I ask because, as part of
a "sanity check", I repeated the tests --- now with two machines, one
running PG 8.1.4 and the other 7.4.14 --- and they behave differently.

The one that does the case conversion "correctly" (read: as I expect it
as per Spanish or French rules) is 8.1.4 with en_US locale (LC_CTYPE and
LC_COLLATE both showing en_US.UTF-8). PG 7.4.14, *even with locale
es_ES*, does not do the case conversion (characters with accent or
tilde are left untouched).

I wonder if someone could shed some light on this little mystery.
Perhaps to add more confusion to my experimental/informal tests, PG 8.1.4
is running on a FC4 AMD64 X2 box (the command "locale" at the shell
prompt shows all en_US.utf8), and PG 7.4.14 is running on a laptop with
FC5 on an Intel Celeron M (the command locale shows exactly the same
in that case). Does this perhaps account for the difference?
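
One assumption that would fit these observations (it is not confirmed
anywhere in this thread) is that the two servers lowercase the string at
different granularity: byte by byte versus character by character. The
Python sketch below only illustrates that difference and is not
PostgreSQL code. The function names are made up for the example, and
Python's bytes.lower() and str.lower() merely stand in for a per-byte
tolower() and a per-character towlower().

```python
def lower_bytewise(text: str) -> str:
    """Lowercase the UTF-8 bytes one at a time (hypothetical helper).

    bytes.lower() changes only ASCII A-Z, so the bytes of a multibyte
    character such as 'É' (0xC3 0x89) pass through untouched.
    """
    return text.encode("utf-8").lower().decode("utf-8")


def lower_by_character(text: str) -> str:
    """Lowercase whole characters (hypothetical helper)."""
    return text.lower()


if __name__ == "__main__":
    sample = "ÉCOLE NIÑO"
    print(lower_bytewise(sample))      # École niÑo
    print(lower_by_character(sample))  # école niño
```

In the byte-wise case the accented characters are left untouched, which
matches what I see on 7.4.14; whether that is really what happens inside
the server is a guess on my part.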

Thanks,

Carlos