Re: setlocale() on Windows is broken

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>
Subject: Re: setlocale() on Windows is broken
Date: 2011-09-01 08:36:49
Message-ID: 4E5F4421.5030704@enterprisedb.com

On 31.08.2011 16:05, Heikki Linnakangas wrote:
> While looking through old emails, I bumped into this:
>
> http://archives.postgresql.org/message-id/25219.1303306707@sss.pgh.pa.us
>
> To recap, setlocale() on Windows is broken for locale names that contain
> dots or apostrophes in the country name. That includes "Hong Kong
> S.A.R.", "Macau S.A.R.", "U.A.E." and "People's Republic of China".
>
> In April, I put in a hack to initdb to map those problematic names to
> aliases that don't contain dots:
>
> People's Republic of China -> China
> Hong Kong S.A.R. -> HKG
> U.A.E. -> ARE
> Macau S.A.R. -> ZHM
>
> However, Hiroshi pointed out in the thread linked above that that
> doesn't completely solve the problem. If you set locale to "HKG", for
> example, setlocale(LC_ALL, NULL) still returns the full name, "Hong Kong
> S.A.R.", and if you feed that back to setlocale() it fails. In
> particular, check_locale() uses "saved = setlocale(LC_XXX, NULL)" to get
> the current value, and tries to restore it later with "setlocale(LC_XXX,
> saved)".
>
>
> At first, I thought I should revert my hack in initdb, since it's not
> fully solving the problem anyway. But it doesn't really help - you run
> into the same issue if you set locale to one of those aliases manually.
> And that's exactly what users will have to do if we don't map those
> locales automatically.
>
> Microsoft should fix their bug. I don't have much faith in that
> happening, however. So, I think we should move the mapping from initdb
> to somewhere in src/port, so that the mapping is done every time
> setlocale() is called. That would fix the problem with check_locale():
> even though "setlocale(LC_XXX, NULL)" returns a value that won't work,
> the setlocale() call to restore it would map it to an alias that does
> work again.
>
> In addition to that, I think we should check the return value of
> setlocale() in check_locale(), and throw a warning if restoring the old
> locale fails. The session's locale will still be screwed, but at least
> you'll know if it happens.

I've committed a patch along those lines.
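
For readers of the archives, the mapping idea can be sketched roughly as
follows. This is an illustration only, not the committed code: the names
map_locale_name() and mapped_setlocale(), the table layout and the buffer
size are all hypothetical, and the sketch replaces only the first
problematic name it finds, so a composite LC_ALL string naming several
such countries would need more work.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table: problematic country name -> alias listed above. */
struct locale_alias
{
    const char *name;
    const char *alias;
};

static const struct locale_alias locale_aliases[] = {
    {"People's Republic of China", "China"},
    {"Hong Kong S.A.R.", "HKG"},
    {"U.A.E.", "ARE"},
    {"Macau S.A.R.", "ZHM"},
};

/*
 * Copy 'locale' into 'out', replacing the first problematic country name
 * found with its alias.  Returns 0 on success, -1 if 'out' is too small.
 */
int
map_locale_name(const char *locale, char *out, size_t outlen)
{
    size_t      i;

    assert(locale != NULL && out != NULL && outlen > 0);

    for (i = 0; i < sizeof(locale_aliases) / sizeof(locale_aliases[0]); i++)
    {
        const char *hit = strstr(locale, locale_aliases[i].name);

        if (hit != NULL)
        {
            /* prefix before the match + alias + remainder after the match */
            int         n = snprintf(out, outlen, "%.*s%s%s",
                                     (int) (hit - locale), locale,
                                     locale_aliases[i].alias,
                                     hit + strlen(locale_aliases[i].name));

            return (n >= 0 && (size_t) n < outlen) ? 0 : -1;
        }
    }

    /* Nothing to map: pass the name through unchanged. */
    if (strlen(locale) >= outlen)
        return -1;
    strcpy(out, locale);
    return 0;
}

/* Hypothetical wrapper: map the name, then call the real setlocale(). */
char *
mapped_setlocale(int category, const char *locale)
{
    char        buf[256];

    if (locale == NULL)
        return setlocale(category, NULL);   /* pure query, nothing to map */
    if (map_locale_name(locale, buf, sizeof(buf)) != 0)
        return NULL;
    return setlocale(category, buf);
}
```

With a wrapper like this, a value such as "Chinese_Hong Kong S.A.R..950"
obtained from setlocale(LC_XXX, NULL) would be turned into
"Chinese_HKG.950" before it is handed back to the C library.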

It turned out to be pretty difficult to reproduce user-visible buggy
behavior caused by this bug, so for the sake of the archives, here's a
recipe for it:

1. Set system locale to "Chinese_Hong Kong S.A.R..950"

2. initdb -D data --locale="Arabic_ARE"

3. Launch psql.

CREATE TABLE foo (a text);
INSERT INTO foo VALUES ('a'), ('A');

-- Verify that the order is 'a', 'A'
SELECT * FROM foo ORDER BY a;

-- This fails, as it should
CREATE DATABASE postgres WITH LC_COLLATE='C' TEMPLATE=template0;

-- This also fails, as it should
CREATE DATABASE postgres WITH LC_COLLATE='C' TEMPLATE=template0;

-- The order returned by this is now wrong: 'A', 'a'
SELECT * FROM foo ORDER BY a;

It's a bizarre looking sequence, but that does it.
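
The underlying save-and-restore problem can also be probed outside the
server with a few lines of C. The helper below is a hypothetical sketch
(locale_round_trips() is not a PostgreSQL function, and the buffer size is
an arbitrary choice): it sets a locale, reads the current value back with
setlocale(category, NULL), and checks whether that value is accepted
again, printing a warning if it is not. Going by the description quoted
above, calling it on Windows with "HKG" would be expected to return 0,
but that has not been verified here.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/*
 * Set 'name', read the current value back, and try to set that value again.
 * Returns 1 if the round trip works, 0 if the value reported by
 * setlocale(category, NULL) is rejected, and -1 if 'name' itself is
 * rejected or the reported value does not fit in the local buffer.
 */
int
locale_round_trips(int category, const char *name)
{
    char        saved[512];
    const char *current;

    assert(name != NULL);

    if (setlocale(category, name) == NULL)
        return -1;

    current = setlocale(category, NULL);
    if (current == NULL || strlen(current) >= sizeof(saved))
        return -1;
    strcpy(saved, current);     /* later calls may overwrite the result */

    if (setlocale(category, saved) == NULL)
    {
        fprintf(stderr, "warning: could not restore locale \"%s\"\n", saved);
        return 0;
    }
    return 1;
}
```

The copy into a local buffer matters: the string returned by setlocale()
may be overwritten by a subsequent call, so it has to be saved before the
locale is changed again.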

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
