From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Juan José Santamaría Flecha <juanjo(dot)santamaria(at)gmail(dot)com> |
Cc: | Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Windows default locale vs initdb |
Date: | 2022-07-20 11:44:04 |
Message-ID: | CA+hUKGJZskvCh=Qm75UkHrY6c1QZUuC92Po9rponj1BbLmcMEA@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Jul 20, 2022 at 10:27 PM Juan José Santamaría Flecha
<juanjo(dot)santamaria(at)gmail(dot)com> wrote:
> On Tue, Jul 19, 2022 at 4:47 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>> As for whether "accordingly" still applies, by the logic of
>> win32_langinfo()... Windows still considers WIN1252 to be the default
>> ANSI code page for "en-US", though it'd work with UTF-8 too. I'm not
>> sure what to make of that. The goal here was to give Windows users
>> good defaults, but WIN1252 is probably not what most people actually
>> want. Hmph.
>
>
> Still, WIN1252 is not the wrong answer for what we are asking. Even if you enable UTF-8 support [1], the system will use the current default Windows ANSI code page (ACP) for the locale and UTF-8 for the code page.
I'm still confused about what that means. Suppose we decided to
insist on UTF-8 by adding a ".UTF-8" suffix when building the default
locale name, as that page says we can now that we're on Windows 10+
(see experimental 0002 patch, attached). It initially seemed to have
the right effect:
The database cluster will be initialized with locale "en-US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
But then the Turkish i test in contrib/citext/sql/citext_utf8.sql failed[1]:
SELECT 'i'::citext = 'İ'::citext AS t;
t
---
- t
+ f
(1 row)
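To illustrate why this comparison is so sensitive to the case-mapping rules in use, here is a minimal sketch in Python. It is only an illustration: it uses Python's own Unicode tables, not the Windows CRT or PostgreSQL's lower(), and `simple_lower` is a hypothetical helper that hard-codes the one-to-one mapping U+0130 -> U+0069 from UnicodeData.txt. Since citext compares values after converting them to lower case, the expected "t" only comes out if 'İ' lower-cases to exactly 'i'.

```python
# Illustration only: Python's case mapping, not PostgreSQL's or Windows'.
DOTTED_CAPITAL_I = "\u0130"  # 'İ', LATIN CAPITAL LETTER I WITH DOT ABOVE
PLAIN_I = "i"                # U+0069


def utf8_len(s: str) -> int:
    """Number of bytes in the UTF-8 encoding of s."""
    return len(s.encode("utf-8"))


def full_lower(s: str) -> str:
    """Full Unicode lower-casing, as done by Python's str.lower()."""
    return s.lower()


def simple_lower(s: str) -> str:
    """Hypothetical one-to-one lower-casing: maps U+0130 to plain 'i'."""
    return "".join(PLAIN_I if ch == DOTTED_CAPITAL_I else ch.lower() for ch in s)


def equal_after_lower(a: str, b: str, lower) -> bool:
    """Compare two strings after lower-casing both with the given function."""
    return lower(a) == lower(b)


if __name__ == "__main__":
    # The two characters differ in encoded length, so lengths can't be compared.
    print(utf8_len(PLAIN_I), utf8_len(DOTTED_CAPITAL_I))
    # One-to-one mapping: both sides become 'i'.
    print(equal_after_lower(PLAIN_I, DOTTED_CAPITAL_I, simple_lower))
    # Full mapping: 'İ' becomes 'i' plus U+0307 COMBINING DOT ABOVE.
    print(equal_after_lower(PLAIN_I, DOTTED_CAPITAL_I, full_lower))
```

Under the one-to-one mapping the two values compare equal; under the full mapping they do not. Which rule a given platform applies for a given locale name is exactly what the failing test is probing.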
About the pg_upgrade problem, maybe it's OK ... existing old-format
names should continue to work, but we can still remove the weird code
that does locale name tweaking, right? pg_upgraded databases should
contain fixed names (i.e., names that were fixed up by the old initdb
and so should continue to work), and new clusters will get BCP 47 names.
I don't really know, I was just playing with rough ideas by sending
patches to CI here...
Attachment | Content-Type | Size |
---|---|---|
v3-0001-Default-to-BCP-47-locale-in-initdb-on-Windows.patch | text/x-patch | 3.8 KB |
v3-0002-Default-to-UTF-8-in-initdb-on-Windows.patch | text/x-patch | 2.0 KB |
v3-0003-Remove-support-for-old-Windows-locale-names.patch | text/x-patch | 19.6 KB |