From: | Nico Williams <nico(at)cryptonector(dot)com> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Avi Uziel <avi(dot)uziel(at)aidoc(dot)com>, Manika Singhal <manika(dot)singhal(at)enterprisedb(dot)com>, Ben Caspi <benc(at)aidoc(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org, Liran Amrani <lirana(at)aidoc(dot)com>, Shahar Amram <shahara(at)aidoc(dot)com>, Sandeep Thakkar <sandeep(dot)thakkar(at)enterprisedb(dot)com>, tgl(at)sss(dot)pgh(dot)pa(dot)us |
Subject: | Re: PostgreSQL v15.12 fails to perform PG_UPGRADE from v13 and v9 on Windows |
Date: | 2025-04-15 16:10:31 |
Message-ID: | Z/6E91nSacY/8zj1@ubby |
Lists: | pgsql-bugs |
On Thu, Apr 10, 2025 at 11:49:18AM +1200, Thomas Munro wrote:
> On Tue, Apr 8, 2025 at 5:50 PM Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> wrote:
> > I think that we should stick with BCP-47 locale names as much as
> > possible. The problem with the long locale names is not only
> > non-ASCII characters, but that Microsoft keeps changing these names,
> > and PostgreSQL persists them in the catalog, which causes trouble if
> > Windows is upgraded.
>
> +100
With respect, BCP-47 defines language tags, not locale names. A locale
name definitely needs a way to identify a language, so BCP-47 can be
part of a locale identifier, but a locale needs more than that: it also
needs a way to identify the codeset used in that locale.
The original post in this thread had postgres using `English_United
States.1252` as the locale name, no part of which is BCP-47-like, but
also BCP-47 has no way of encoding "codepage 1252" as the codeset
because BCP-47 is specifically and only about languages, not codesets.
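To make the point concrete, here is a minimal sketch (an illustration only, not anything PostgreSQL or Windows actually does) that splits a Windows-style locale name into its parts. The helper `split_windows_locale` and the `WindowsLocale` type are hypothetical names, and the sketch assumes the `Language_Territory.codepage` shape seen in the original post. The part after the dot is the codeset, and that is exactly the part a BCP-47 language tag has no slot for.

```python
from typing import NamedTuple, Optional


class WindowsLocale(NamedTuple):
    """Parts of a Windows-style locale name (hypothetical helper type)."""
    language: str
    territory: Optional[str]
    codepage: Optional[str]


def split_windows_locale(name: str) -> WindowsLocale:
    """Split 'Language_Territory.codepage' into its three parts.

    Assumes the code page, if present, follows the last '.' and the
    territory, if present, follows the first '_'.
    """
    base, dot, codepage = name.rpartition(".")
    if not dot:
        # No '.' at all: the whole string is the language/territory part.
        base, codepage = name, None
    language, underscore, territory = base.partition("_")
    return WindowsLocale(language, territory if underscore else None, codepage)


parts = split_windows_locale("English_United States.1252")
print(parts.language, "|", parts.territory, "|", parts.codepage)
```

Only the first two fields describe anything a language tag could express; the third is codeset information.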
I'm not actually sure what is the best standard to use for identifying
_locales_ as opposed to _languages_, but BCP-47 isn't it. POSIX has a
notion of locales, but no registry of locale names and definitions.
POSIX locale naming using BCP-47 language tags and some codeset
identifier seems like the best way. Unlike for BCP-47 language tags,
there is no IANA registry of locale names, but there is an IANA registry
of codesets:
https://www.iana.org/assignments/character-sets/character-sets.xhtml
(which isn't quite a registry of codeset names, but it will do), so you
can construct a standard-ish locale name out of language tags and
charset/codeset names.
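As a rough sketch of what such a construction could look like (assuming a POSIX-style `language_TERRITORY.codeset` layout, and handling only simple `language` or `language-region` tags rather than full BCP-47 syntax), the hypothetical helper `make_locale_name` below glues a language tag to a charset name taken from the IANA registry. Nothing here is an existing standard or API; it only illustrates the idea.

```python
import re

# Simplified pattern: a 2-3 letter language subtag, optionally followed by a
# 2-letter or 3-digit region subtag. Real BCP-47 tags allow much more
# (script, variant, extension subtags); this sketch deliberately rejects those.
_SIMPLE_TAG = re.compile(r"^([A-Za-z]{2,3})(?:-([A-Za-z]{2}|[0-9]{3}))?$")


def make_locale_name(language_tag: str, codeset: str) -> str:
    """Build a POSIX-style locale name from a simple language tag and a
    charset name (e.g. one listed in the IANA character-sets registry)."""
    m = _SIMPLE_TAG.match(language_tag)
    if m is None:
        raise ValueError(f"unsupported language tag: {language_tag!r}")
    language = m.group(1).lower()
    region = m.group(2)
    name = language if region is None else f"{language}_{region.upper()}"
    return f"{name}.{codeset}"


print(make_locale_name("en-US", "windows-1252"))
print(make_locale_name("de", "UTF-8"))
```

The codeset is kept as an opaque string on purpose: the language tag and the charset name come from two separate registries, and the locale name is just their combination.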
I believe -correct me if I'm wrong- that the IETF is not interested in
publishing RFCs/BCPs/STDs regarding locales, nor in having an IANA
locale name registry, because the IETF wants the world to use Unicode
and standard Unicode transforms like UTF-8, in which case BCP-47 should
be enough (because the transform should be understood from context).
But the real world still has to deal with non-Unicode codesets and,
therefore, with locales. The CLDR uses some sort of locale name, but
still without a codeset name (because CLDR covers everything about
locales except the codeset, which it assumes to be Unicode), and glibc
can be used as a sort of source of standard-ish POSIX locale names that
do include codeset names.
So if you want to identify _locales_ you might have to either construct
your own locale names out of BCP-47 or CLDR and add a codeset name
subtag, or use glibc's POSIX locale names augmented with Windows
codepage names.
Nico