Re: Order changes in PG16 since ICU introduction

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Daniel Verite <daniel(at)manitou-mail(dot)org>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Order changes in PG16 since ICU introduction
Date: 2023-06-06 19:18:20
Message-ID: f8d09d8f3d53daa9cdb446d021fe33d6ff7f1ee3.camel@j-davis.com
Lists: pgsql-hackers

On Tue, 2023-06-06 at 15:09 +0200, Daniel Verite wrote:
> FWIW I don't quite see how 0001 improve things or what problem it's
> trying to solve.

The word "locale" is generic, so we need to make LOCALE/--locale apply
to whatever provider is being used. If "locale" only applies to libc,
using ICU will always be confusing and never be on the same level as
libc, let alone the preferred provider.

The locale "C" is a special case, documented as a non-locale. So, if
LOCALE/--locale apply to ICU, then either ICU needs to handle locale
"C" in the expected way (v8 patch series); or when we see locale "C" we
need to somehow change the provider into something that can handle it
(v6 patch series changes it to the "none" provider).
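To make the two options concrete, here is a rough sketch in Python. It is not code from either patch series: the function name, the strategy labels and the returned strings are all hypothetical, and it only models what happens to the provider when the locale is C or POSIX.

```python
# Hypothetical sketch; names are illustrative, not taken from the patches.
C_LOCALES = {"C", "POSIX"}


def resolve(provider, locale, strategy):
    """Return (provider, semantics) for a requested provider/locale pair.

    strategy "v8": ICU itself treats C/POSIX as the non-locale.
    strategy "v6": C/POSIX under ICU switches the provider to "none".
    """
    if locale not in C_LOCALES:
        # Any other locale keeps the requested provider's normal behavior.
        return provider, "locale-aware"
    if provider == "icu" and strategy == "v6":
        # v6 approach: hand the non-locale off to a different provider.
        return "none", "no-locale"
    # v8 approach (and libc): same provider, "no locale" behavior.
    return provider, "no-locale"
```

Either way the user ends up with "no locale" semantics; the difference is only which provider is recorded.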

Please let me know if you disagree with the goal or the reasoning here.
If so, please explain where you think we should end up, because the
status quo does not seem great to me.

> 0001 creates exceptions throughout the code so that when an ICU
> collation has a locale name "C" or "POSIX" then it does not behave
> like an ICU collation, even though pg_collation.collprovider='i'
> To me it's neither desirable nor necessary that a collation that
> has collprovider='i' is diverted to non-ICU semantics.

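It's not very principled, but it matches what libc does: with the libc
provider, Postgres also special-cases C and POSIX rather than calling
into libc's locale routines.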

> Also in the current state, this diversion does not apply to initdb.
>
> "initdb --icu-locale=C" with 0001 applied reports this:
>
>    Using language tag "en-US-u-va-posix" for ICU locale "C".

Thank you. I fixed it by skipping the canonicalization for C/POSIX
locales in initdb.
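The shape of that fix is roughly the following. This is a Python sketch rather than the initdb C code, and both the function name and the `canonicalize` parameter (a stand-in for the ICU language-tag conversion) are assumptions made for illustration.

```python
# Hypothetical sketch of the initdb behavior; not the actual C code.
def icu_locale_for_initdb(name, canonicalize):
    """Return the ICU locale string initdb would store for `name`.

    `canonicalize` stands in for the conversion to a language tag.
    """
    if name in ("C", "POSIX"):
        # Skip canonicalization so "C" is not rewritten to a language tag
        # such as "en-US-u-va-posix".
        return name
    return canonicalize(name)
```

With a canonicalizer plugged in, "C" and "POSIX" pass through unchanged while every other name is still converted.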

> Could you elaborate a bit more on what 0001 is meant to achieve, from
> the point of view of the user?

It makes it so the user consistently (regardless of the provider) gets
the "no locale" behavior (as documented and historically expected) when
they specify the C or POSIX locales.

Then that enables us to change LOCALE/--locale to apply to ICU, which
means that a simple command like "initdb --locale=en_US" does a
sensible thing regardless of the default provider.
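As a sketch of the intended option handling (again hypothetical Python, with made-up parameter names, and assuming an explicit ICU locale takes precedence over the generic one):

```python
# Hypothetical sketch; parameter names are illustrative only.
def effective_icu_locale(provider, locale=None, icu_locale=None):
    """Return the locale ICU would use, or None if ICU is not the provider."""
    if provider != "icu":
        return None
    # Assumption: a provider-specific setting wins; otherwise the generic
    # LOCALE/--locale value applies to ICU as well.
    return icu_locale if icu_locale is not None else locale
```

So `--locale=en_US` alone is enough to give ICU a locale when ICU is the default provider.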

I understand you are skeptical of trying to apply an arbitrary locale
name to ICU, but if they don't specify the provider, what do you expect
to happen?

--
Jeff Davis
PostgreSQL Contributor Team - AWS

Attachment Content-Type Size
v9-0001-ICU-support-locale-C-with-the-same-behavior-as-li.patch text/x-patch 12.5 KB
v9-0002-pg_upgrade-check-for-ICU-locale-C-in-versions-15-.patch text/x-patch 4.7 KB
v9-0003-Make-LOCALE-apply-to-ICU_LOCALE-for-CREATE-DATABA.patch text/x-patch 15.2 KB
v9-0004-Use-database-default-collation-s-provider-as-defa.patch text/x-patch 6.1 KB
