From: | "Daniel Verite" <daniel(at)manitou-mail(dot)org> |
---|---|
To: | "Jeff Davis" <pgsql(at)j-davis(dot)com> |
Cc: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Sandro Santilli <strk(at)kbt(dot)io>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Regina Obe <lr(at)pcorp(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Order changes in PG16 since ICU introduction |
Date: | 2023-05-19 19:13:47 |
Message-ID: | 360c90b9-7c20-4cec-aade-38e6e3351c05@manitou-mail.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Jeff Davis wrote:
> 2) Automatically change the provider to libc when locale=C.
>
> Almost works, but it's not clear how we handle the case "provider=icu
> lc_collate='fr_FR.utf8' locale=C".
>
> If we change it to "provider=libc lc_collate=C", we've overridden the
> specified lc_collate. If we ignore the locale=C, that would be
> surprising to users. If we throw an error, that would be a backwards
> compatibility issue.
This thread started with a report illustrating that when users mention
the locale "C", they implicitly mean "C" from the libc provider, as
when libc was the default. The problem is that as soon as ICU is the
default, any reference to a libc collation should mention explicitly
that the provider is libc.
It seems what we're set on the idea to create an exception for "C"
(and I assume also "POSIX") to avoid too much confusion, and because
"C" is quite special anyway, and has no equivalent in ICU (the switch
in v16 to ICU as the default provider is based on the premise that the
locales with the same name will behave pretty much the same with ICU
as they did with libc, but it's absolutely not the case with "C").
ISTM that if we want to go that route, we need the make the minimum
changes at the user interface level and not any deeper, so that when
(locale="C" OR locale="POSIX") AND the provider has not been specified,
then the command (initdb and create database) act as if the user had
specified provider=libc.
> (3) Support iculocale=C in the ICU provider using the memcmp() path.
> In other words, if provider=icu and iculocale=C, lc_collate_is_c() and
> lc_ctpye_is_c() would both return true.
ICU does not provide a locale that behaves like that, and it doesn't
feel right to pretend it does. It feels like attacking the problem
at the wrong level.
> (4) Create a new "none" provider (which has no locale and always memcmp
> semantics), and automatically change the provider to "none" if
> provider=icu and iculocale=C.
It still uses libc/C for character classification and case changing,
so "no locale" is technically not true. Personally I don't see
the benefit of adding a "none" provider. C is a libc locale
and libc is not disappearing. I also think that when users explicitly
indicate provider=icu, they should get icu.
Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2023-05-19 19:29:16 | Re: Assert failure of the cross-check for nullingrels |
Previous Message | Aleksander Alekseev | 2023-05-19 18:42:50 | Re: The documentation for READ COMMITTED may be incomplete or wrong |