RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From: "Haifang Wang (Centific Technologies Inc)" <v-haiwang(at)microsoft(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Sandeep Thakkar <sandeep(dot)thakkar(at)enterprisedb(dot)com>, Rahul Pandey <pandeyrah(at)microsoft(dot)com>, Vishwa Deepak <Vishwa(dot)Deepak(at)microsoft(dot)com>, Shawn Steele <Shawn(dot)Steele(at)microsoft(dot)com>, Amy Wishnousky <amyw(at)microsoft(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Shweta Gulati <gulatishweta(at)microsoft(dot)com>, Ashish Nawal <nawalashish(at)microsoft(dot)com>
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Date: 2024-09-04 21:28:22
Message-ID: PH8PR21MB3902991F916D365183F8AE61E59C2@PH8PR21MB3902.namprd21.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi Thomas,

How is it going after all the analysis below?

Thanks!
Haifang

-----Original Message-----
From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Sent: Thursday, August 29, 2024 8:22 PM
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Haifang Wang (Centific Technologies Inc) <v-haiwang(at)microsoft(dot)com>; Sandeep Thakkar <sandeep(dot)thakkar(at)enterprisedb(dot)com>; Rahul Pandey <pandeyrah(at)microsoft(dot)com>; Vishwa Deepak <Vishwa(dot)Deepak(at)microsoft(dot)com>; Shawn Steele <Shawn(dot)Steele(at)microsoft(dot)com>; Amy Wishnousky <amyw(at)microsoft(dot)com>; Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>; pgsql-bugs(at)lists(dot)postgresql(dot)org; Shweta Gulati <gulatishweta(at)microsoft(dot)com>; Ashish Nawal <nawalashish(at)microsoft(dot)com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

On Fri, Aug 30, 2024 at 10:55 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> On Fri, Aug 30, 2024 at 3:16 AM Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> > On 2024-08-28 We 6:16 AM, Thomas Munro wrote:
> > > initdb: error: locale name "liberté" contains non-ASCII characters
> >
> > +1 for doing this.
>
> Thanks for looking! OK, let's go with this approach. That two-line
> demo patch doesn't work properly for the system default, and of course
> it should be in sync with the backend function, and needs comments to
> explain this mess, so I'll work on a serious patch.

Done. To develop this patch, I set up a cursed locale name on my FreeBSD box like so:

$ sudo cp -r /usr/share/locale/fr_FR.UTF-8 /usr/share/locale/fraternité.UTF-8

Then I could exercise the setlocale("") path, approximating what I expect a Turkish Windows system to do without explicit --locale:

$ LC_TIME="fraternité.UTF-8" initdb -D pdata ...
initdb: error: locale name "fraternité.UTF-8" contains non-ASCII characters

The explicit path is easier to test, like before:

$ initdb -D pgdata --locale "égalité.UTF-9"
...
initdb: error: locale name "égalité.UTF-9" contains non-ASCII characters

That's what the EDB installer (if not changed to provide ASCII-clean names somehow or other) will start failing with, if it tries to feed umlauts to initdb.

Working through that, I understood another problem: even if you provide an acceptable string like --locale "tr-TR", we still need to be able to save-and-restore the value that initdb inherited from the environment. Using my simulation, what I mean is if you do:

$ export LANG="liberté.UTF-8"

... then ...

$ initdb -D pgdata --locale "fr_FR.UTF-8"

... it still needs to be able to do a safe save-and-restore back to the environment's value without crashing, even though it's not really using that value. I wondered if initdb really needs a locale other than "C" for itself, but I'm trying to minimise the changes here so I
don't break something else... That seems to mean that we have to use
_wsetlocale() for that particular save/restore. This is essentially what Haifang's colleagues were suggesting upthread. Done.

Once a server is up and running, we shouldn't be exposed to the server environment (right?), but there are several more places that accept user-supplied strings, and only CREATE COLLATION fails to run
check_locale():

postgres=# create collation foo (locale = 'fraternité.UTF-8'); CREATE COLLATION

The other places I am aware of do, so putting the equivalent code into
check_locale() does the right thing:

postgres=# create database db locale = 'fraternité.UTF-8';
WARNING: locale name "fraternité.UTF-8" contains non-ASCII characters
ERROR: invalid LC_COLLATE locale name: "fraternité.UTF-8"
HINT: If the locale name is specific to ICU, use ICU_LOCALE.

postgres=# set lc_time = 'fraternité.UTF-8';
WARNING: locale name "fraternité.UTF-8" contains non-ASCII characters
ERROR: invalid value for parameter "lc_time": "fraternité.UTF-8"

postgres=# set lc_messages = 'fraternité.UTF-8';
WARNING: locale name "fraternité.UTF-8" contains non-ASCII characters
ERROR: invalid value for parameter "lc_messages": "fraternité.UTF-8"

postgres=# set lc_monetary = 'fraternité.UTF-8';
WARNING: locale name "fraternité.UTF-8" contains non-ASCII characters
ERROR: invalid value for parameter "lc_monetary": "fraternité.UTF-8"

Those last SET lc_* changes aren't strictly necessary, because we never restore to those, we only switch to them and then restore to the database default, but since they share code with CREATE DATABASE the new check kicks in anyway.

I tried adding check_locale() checks into CREATE COLLATION, but that made some tests fail (huh, I didn't know that the set of strings acceptable to _newlocale() and setlocale() are different -- the former seems to accept "POSIX" on that OS, which is exercised by some tests, or maybe it's some path of ours, I didn't check). But, on reflection, I don't believe that COLLATION objects actually have this problem anyway, because we never call setlocale() to them, we just use the modern thread-safe POSIX-2008-oid APIs. So I took that back out to minimise the patch.

Please see attached, including a restatement of the problem and solution in the commit message.

I suppose we could add some tests for this, but I was a bit unsure about the rules for putting non-ASCII into perl scripts and command lines, and would need to research that...

(Note: a lot of this old stuff is getting ripped out in master by the ongoing multi-threading effort...)

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Thomas Munro 2024-09-04 21:41:08 Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Previous Message Thomas Munro 2024-09-04 21:09:16 Re: BUG #18599: server closed the connection unexpectedly