From: | "Daniel Verite" <daniel(at)manitou-mail(dot)org> |
---|---|
To: | "Jeff Davis" <pgsql(at)j-davis(dot)com> |
Cc: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Sandro Santilli <strk(at)kbt(dot)io>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Regina Obe <lr(at)pcorp(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Order changes in PG16 since ICU introduction |
Date: | 2023-05-22 20:09:00 |
Message-ID: | cb448574-aa7c-4969-b2dd-c9eb221d7e06@manitou-mail.org |
Lists: | pgsql-hackers |
Jeff Davis wrote:
> If we special case locale=C, but do nothing for locale=fr_FR, then I'm
> not sure we've solved the problem. Andrew Gierth raised the issue here,
> which he called "maximally confusing":
>
> https://postgr.es/m/874jp9f5jo.fsf@news-spur.riddles.org.uk
>
> That's why I feel that we need to make locale apply to whatever the
> provider is, not just when it happens to be C.
While I agree that the LOCALE option in CREATE DATABASE is
counter-intuitive, I find it questionable that blending ICU
and libc locales into it helps that much with the user experience.
I tried the latest v6-* patches applied on top of 722541ead1
(before the pgindent run), and here are a few examples where I
don't think it works well.
The OS is Ubuntu 22.04 (glibc 2.35, ICU 70.1)
initdb:
Using default ICU locale "fr".
Using language tag "fr" for ICU locale "fr".
The database cluster will be initialized with this locale configuration:
provider: icu
ICU locale: fr
LC_COLLATE: fr_FR.UTF-8
LC_CTYPE: fr_FR.UTF-8
LC_MESSAGES: fr_FR.UTF-8
LC_MONETARY: fr_FR.UTF-8
LC_NUMERIC: fr_FR.UTF-8
LC_TIME: fr_FR.UTF-8
The default database encoding has accordingly been set to "UTF8".
#1
postgres=# create database test1 locale='fr_FR.UTF-8';
NOTICE: using standard form "fr-FR" for ICU locale "fr_FR.UTF-8"
ERROR: new ICU locale (fr-FR) is incompatible with the ICU locale of the template database (fr)
HINT: Use the same ICU locale as in the template database, or use template0 as template.
That looks like a fairly generic case that doesn't work seamlessly.
#2
postgres=# create database test2 locale='C.UTF-8' template='template0';
NOTICE: using standard form "en-US-u-va-posix" for ICU locale "C.UTF-8"
CREATE DATABASE
en-US-u-va-posix does not sort like C.UTF-8 in glibc 2.35, so
this interpretation is arguably not what a user would expect.
I would expect the ICU warning or error (icu_validation_level) to kick
in instead of that transliteration.
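To make the sort-order point concrete: glibc 2.35 defines collation for C.UTF-8 as plain code point order. The sketch below (plain Python, not PostgreSQL) shows that order, using the fact that Python compares `str` values by code point and that sorting UTF-8 bytes gives the same result. The word list is made up for illustration. Under code point order, "éclair" lands after every ASCII word. A linguistic collation, such as ICU's root, would typically place it next to "eclair", so a user asking for C.UTF-8 is likely expecting the first behavior.

```python
# Illustrative only: code point ordering, which is what glibc 2.35
# documents for the C.UTF-8 locale's collation. This does not call ICU.
words = ["zebra", "Zebra", "éclair", "eclair", "apple", "Apple"]

# Python compares str values code point by code point.
codepoint_order = sorted(words)

# For UTF-8, byte-wise order equals code point order (a property of
# the encoding), which is also how the "C" collation compares strings.
byte_order = sorted(words, key=lambda s: s.encode("utf-8"))

print(codepoint_order)
# ['Apple', 'Zebra', 'apple', 'eclair', 'zebra', 'éclair']
```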
#3
$ grep french /etc/locale.alias
french fr_FR.ISO-8859-1
postgres=# create database test3 locale='french' template='template0'
encoding='LATIN1';
WARNING: ICU locale "french" has unknown language "french"
HINT: To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
CREATE DATABASE
In practice we're probably getting the "und" ICU locale whereas "fr" would
be appropriate.
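For comparison, here is the kind of mapping a user might expect for "french". This is a hypothetical sketch, not what PostgreSQL or the v6 patches do. It resolves the name through a locale.alias-style table, drops the ".codeset" and "@modifier" parts of the libc name, and turns "_" into "-". The helper names `parse_locale_alias` and `libc_locale_to_language_tag` are invented for illustration, and the alias text is a small inline sample rather than the real /etc/locale.alias.

```python
# Hypothetical sketch: map a libc locale name (possibly an alias) to a
# BCP 47-style language tag. Not PostgreSQL's actual behavior.

def parse_locale_alias(text: str) -> dict:
    """Parse locale.alias-style lines: '<alias> <locale>', '#' comments."""
    aliases = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            aliases[parts[0]] = parts[1]
    return aliases


def libc_locale_to_language_tag(name: str, aliases: dict) -> str:
    """Resolve an alias, strip '@modifier' and '.codeset', use '-'."""
    name = aliases.get(name, name)
    name = name.split("@", 1)[0]  # e.g. sr_RS.UTF-8@latin -> sr_RS.UTF-8
    name = name.split(".", 1)[0]  # e.g. fr_FR.ISO-8859-1 -> fr_FR
    return name.replace("_", "-")


# Inline sample mirroring the grep output above.
sample = "# locale aliases\nfrench\t\tfr_FR.ISO-8859-1\n"
aliases = parse_locale_alias(sample)
print(libc_locale_to_language_tag("french", aliases))  # fr-FR
```

The language subtag here is "fr" rather than "und", which is the result the message argues would be appropriate.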
I assume we would find more cases like these by testing on more
operating systems.
Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite