Re: PostgreSQL v15.12 fails to perform PG_UPGRADE from v13 and v9 on Windows

From: Nico Williams <nico(at)cryptonector(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Avi Uziel <avi(dot)uziel(at)aidoc(dot)com>, Manika Singhal <manika(dot)singhal(at)enterprisedb(dot)com>, Ben Caspi <benc(at)aidoc(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org, Liran Amrani <lirana(at)aidoc(dot)com>, Shahar Amram <shahara(at)aidoc(dot)com>, Sandeep Thakkar <sandeep(dot)thakkar(at)enterprisedb(dot)com>, tgl(at)sss(dot)pgh(dot)pa(dot)us
Subject: Re: PostgreSQL v15.12 fails to perform PG_UPGRADE from v13 and v9 on Windows
Date: 2025-04-15 16:10:31
Message-ID: Z/6E91nSacY/8zj1@ubby
Lists: pgsql-bugs

On Thu, Apr 10, 2025 at 11:49:18AM +1200, Thomas Munro wrote:
> On Tue, Apr 8, 2025 at 5:50 PM Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> wrote:
> > I think that we should stick with BCP-47 locale names as much as
> > possible. The problem with the long locale names is not only
> > non-ASCII characters, but that Microsoft keeps changing these names,
> > and PostgreSQL persists them in the catalog, which causes trouble if
> > Windows is upgraded.
>
> +100

With respect, BCP-47 defines language tags, not locale names. A locale
name definitely needs a way to identify a language, so BCP-47 can be
part of a locale identifier, but a locale needs more than that: it also
needs a way to identify the codeset used in that locale.

The original post in this thread had postgres using `English_United
States.1252` as the locale name, no part of which is BCP-47-like, but
also BCP-47 has no way of encoding "codepage 1252" as the codeset
because BCP-47 is specifically and only about languages, not codesets.
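To make that concrete, here is a rough sketch (mine, not anything postgres does) that pulls such a name apart. The helper name `split_windows_locale` is made up for illustration, and it assumes the simple `Language_Territory.Codepage` layout seen in the original post:

```python
def split_windows_locale(name):
    """Split 'Language_Territory.Codepage' into (language, territory, codeset).

    Hypothetical helper: parts that are absent come back as None.
    """
    # Everything after the first '.' is the codeset (here, a codepage number).
    base, dot, codeset = name.partition(".")
    # Everything before the first '_' is the language name.
    language, underscore, territory = base.partition("_")
    return (
        language,
        territory if underscore else None,
        codeset if dot else None,
    )


print(split_windows_locale("English_United States.1252"))
# ('English', 'United States', '1252')
```

The first two parts are spelled-out names rather than subtags like `en` and `US`, and the third part is the piece a BCP-47 language tag has no slot for.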

I'm not actually sure what the best standard is for identifying
_locales_ as opposed to _languages_, but BCP-47 isn't it. POSIX has a
notion of locales, but no registry of locale names and definitions.
POSIX-style locale naming using BCP-47 language tags plus some codeset
identifier seems like the best way. Unlike for BCP-47 language subtags,
there is no IANA registry of locale names, but there is an IANA registry
of codesets:

https://www.iana.org/assignments/character-sets/character-sets.xhtml

(which isn't quite a registry of codeset names but it will do) so you
can construct a standard-ish locale name out of language tags and
charset/codeset names.
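A minimal sketch of what I mean, assuming the POSIX-style `language_TERRITORY.codeset` layout and only the simplest tags (a language subtag plus an optional region). The function name `make_locale_name` is hypothetical, and I'm not claiming any particular system accepts the resulting names:

```python
def make_locale_name(language_tag, charset):
    """Build 'lang[_REGION].charset' from a simple BCP-47 tag and a charset name.

    Assumption: only 'll' or 'll-RR' style tags are handled. Script, variant
    and extension subtags are rejected rather than guessed at.
    """
    subtags = language_tag.split("-")
    language = subtags[0]
    if len(subtags) > 2 or not (language.isascii() and language.isalpha()):
        raise ValueError(f"unsupported language tag: {language_tag!r}")
    if len(subtags) == 1:
        return f"{language.lower()}.{charset}"

    region = subtags[1]
    # A region subtag is two ASCII letters or three ASCII digits.
    is_alpha2 = len(region) == 2 and region.isascii() and region.isalpha()
    is_digit3 = len(region) == 3 and region.isascii() and region.isdigit()
    if not (is_alpha2 or is_digit3):
        raise ValueError(f"unsupported region subtag: {region!r}")
    return f"{language.lower()}_{region.upper()}.{charset}"


print(make_locale_name("en-US", "windows-1252"))  # en_US.windows-1252
print(make_locale_name("de", "UTF-8"))            # de.UTF-8
```

The charset argument is passed through untouched, so it is up to the caller to pick a name from the IANA list (or a codepage name) that the target platform understands.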

I believe (correct me if I'm wrong) that the IETF is not interested in
publishing RFCs/BCPs/STDs regarding locales, nor in having an IANA
locale name registry, because the IETF wants the world to use Unicode
and standard Unicode transforms like UTF-8, in which case BCP-47 should
be enough (because the transform should be understood from context).
But the real world still has to deal with non-Unicode codesets and,
therefore, with locales. The CLDR uses some sort of locale name, but
still without a codeset name (because CLDR covers everything about
locales except the codeset, which it takes to be Unicode), and glibc
can be used as a sort of source of standard-ish POSIX locale names that
do include codeset names.

So if you want to identify _locales_ you might have to either construct
your own locale names out of BCP-47 or CLDR and add a codeset name
subtag, or use glibc's POSIX locale names augmented with Windows
codepage names.

Nico