Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: "Haifang Wang (Centific Technologies Inc)" <v-haiwang(at)microsoft(dot)com>, Sandeep Thakkar <sandeep(dot)thakkar(at)enterprisedb(dot)com>, Rahul Pandey <pandeyrah(at)microsoft(dot)com>, Vishwa Deepak <Vishwa(dot)Deepak(at)microsoft(dot)com>, Shawn Steele <Shawn(dot)Steele(at)microsoft(dot)com>, Amy Wishnousky <amyw(at)microsoft(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Shweta Gulati <gulatishweta(at)microsoft(dot)com>, Ashish Nawal <nawalashish(at)microsoft(dot)com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Date: 2024-08-30 03:22:13
Message-ID: CA+hUKG+28zRyk8U=eFLn8O+17VrbpKykWPVrSzf57Ru32dQo8A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Fri, Aug 30, 2024 at 10:55 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> On Fri, Aug 30, 2024 at 3:16 AM Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> > On 2024-08-28 We 6:16 AM, Thomas Munro wrote:
> > > initdb: error: locale name "liberté" contains non-ASCII characters
> >
> > +1 for doing this.
>
> Thanks for looking! OK, let's go with this approach. That two-line
> demo patch doesn't work properly for the system default, and of course
> it should be in sync with the backend function, and needs comments to
> explain this mess, so I'll work on a serious patch.

Done. To develop this patch, I set up a cursed locale name on my
FreeBSD box like so:

$ sudo cp -r /usr/share/locale/fr_FR.UTF-8 /usr/share/locale/fraternité.UTF-8

Then I could exercise the setlocale("") path, approximating what I
expect a Turkish Windows system to do without explicit --locale:

$ LC_TIME="fraternité.UTF-8" initdb -D pdata
...
initdb: error: locale name "fraternité.UTF-8" contains non-ASCII characters

The explicit path is easier to test, like before:

$ initdb -D pgdata --locale "égalité.UTF-9"
...
initdb: error: locale name "égalité.UTF-9" contains non-ASCII characters

That's what the EDB installer (if not changed to provide ASCII-clean
names somehow or other) will start failing with, if it tries to feed
umlauts to initdb.

Working through that, I understood another problem: even if you
provide an acceptable string like --locale "tr-TR", we still need to
be able to save-and-restore the value that initdb inherited from the
environment. Using my simulation, what I mean is if you do:

$ export LANG="liberté.UTF-8"

... then ...

$ initdb -D pgdata --locale "fr_FR.UTF-8"

... it still needs to be able to do a safe save-and-restore back to
the environment's value without crashing, even though it's not really
using that value. I wondered if initdb really needs a locale other
than "C" for itself, but I'm trying to minimise the changes here so I
don't break something else... That seems to mean that we have to use
_wsetlocale() for that particular save/restore. This is essentially
what Haifang's colleagues were suggesting upthread. Done.

Once a server is up and running, we shouldn't be exposed to the server
environment (right?), but there are several more places that accept
user-supplied strings, and only CREATE COLLATION fails to run
check_locale():

postgres=# create collation foo (locale = 'fraternité.UTF-8');
CREATE COLLATION

The other places I am aware of do, so putting the equivalent code into
check_locale() does the right thing:

postgres=# create database db locale = 'fraternité.UTF-8';
WARNING: locale name "fraternité.UTF-8" contains non-ASCII characters
ERROR: invalid LC_COLLATE locale name: "fraternité.UTF-8"
HINT: If the locale name is specific to ICU, use ICU_LOCALE.

postgres=# set lc_time = 'fraternité.UTF-8';
WARNING: locale name "fraternité.UTF-8" contains non-ASCII characters
ERROR: invalid value for parameter "lc_time": "fraternité.UTF-8"

postgres=# set lc_messages = 'fraternité.UTF-8';
WARNING: locale name "fraternité.UTF-8" contains non-ASCII characters
ERROR: invalid value for parameter "lc_messages": "fraternité.UTF-8"

postgres=# set lc_monetary = 'fraternité.UTF-8';
WARNING: locale name "fraternité.UTF-8" contains non-ASCII characters
ERROR: invalid value for parameter "lc_monetary": "fraternité.UTF-8"

Those last SET lc_* changes aren't strictly necessary, because we
never restore to those, we only switch to them and then restore to the
database default, but since they share code with CREATE DATABASE the
new check kicks in anyway.

I tried adding check_locale() checks into CREATE COLLATION, but that
made some tests fail (huh, I didn't know that the set of strings
acceptable to _newlocale() and setlocale() are different -- the former
seems to accept "POSIX" on that OS, which is exercised by some tests,
or maybe it's some path of ours, I didn't check). But, on reflection,
I don't believe that COLLATION objects actually have this problem
anyway, because we never call setlocale() to them, we just use the
modern thread-safe POSIX-2008-oid APIs. So I took that back out to
minimise the patch.

Please see attached, including a restatement of the problem and
solution in the commit message.

I suppose we could add some tests for this, but I was a bit unsure
about the rules for putting non-ASCII into perl scripts and command
lines, and would need to research that...

(Note: a lot of this old stuff is getting ripped out in master by the
ongoing multi-threading effort...)

Attachment Content-Type Size
v2-0001-Reject-non-ASCII-locale-names.patch text/x-patch 7.3 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2024-08-30 10:13:07 BUG #18596: I can't download PostgreSQL16
Previous Message Thomas Munro 2024-08-29 22:55:29 Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607