Re: PostgreSQL v15.12 fails to perform PG_UPGRADE from v13 and v9 on Windows

From: Manika Singhal <manika(dot)singhal(at)enterprisedb(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Avi Uziel <avi(dot)uziel(at)aidoc(dot)com>, Ben Caspi <benc(at)aidoc(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org, Liran Amrani <lirana(at)aidoc(dot)com>, Shahar Amram <shahara(at)aidoc(dot)com>, Sandeep Thakkar <sandeep(dot)thakkar(at)enterprisedb(dot)com>, tgl(at)sss(dot)pgh(dot)pa(dot)us
Subject: Re: PostgreSQL v15.12 fails to perform PG_UPGRADE from v13 and v9 on Windows
Date: 2025-04-10 07:02:58
Message-ID: CAB1XC2+U1sNvM5WiEVr0SLfzRZF14PjDWSNeb=dTsHSKdoZu8g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Thu, Apr 10, 2025 at 5:19 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:

> On Tue, Apr 8, 2025 at 5:50 PM Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
> wrote:
> > On Mon, 2025-04-07 at 17:59 +0300, Avi Uziel wrote:
> > > I anticipate that users could face an upgrade issue if they installed
> a cluster on a Windows
> > > machine using the default locale before this change and then attempt
> to upgrade to a release
> > > that includes this commit.
> >
> > Yes, I think this will cause more trouble.
> >
> > I think that we should stick with BCP-47 locale names as much as
> possible. The problem with
> > the long locale names is not only non-ASCII characters, but that
> Microsoft keeps changing these
> > names, and PostgreSQL persists them in the catalog, which causes trouble
> if Windows is upgraded.
>
> +100
>
> I've been saying for a while[1] that initdb should default to BCP-47
> too. But it seems likely that the entire PostgreSQL-on-Windows
> community uses the popular EDB Installer, not command line tools? It
> doesn't use initdb's default anyway, so changes to initdb were never
> going to start the migration of the world's PostgreSQL-on-Windows
> fleet to BCP-47. It took President Erdoğan and the EDB installer team
> to kick it off. Despite this upgrade wrinkle, which I'm sure we can
> smooth out, I'm really happy to see it finally happening!
>
> Historically in our commit log, tree and mailing lists we've kvetched
> about unstable locale names on that OS, but the
> please-don't-store-the-old-style-names-anywhere-do-this-other-thing-instead
> warning in the documentation presumably started with Vista a couple of
> decades ago. Unfortunately that shipped around the same time as the
> PostgreSQL-on-Windows port, and of course it targeted older releases
> for some time. Better late than never.
>
> (As an aside: I think our kvetching wasn't unwarranted, it just didn't
> tackle the root problem. Perhaps unsurprisingly, because they changed
> that in all the Windows native APIs, but setlocale() continued and
> continues to expose the pre-Vista names, creating an asymmetry and
> incidentally also exposing non-C standard conforming behaviour due to
> char/whcar_t schizophrenia. I could guess that setlocale() and
> friends only exist for minimal/partial conformance anyway, and some
> kind of no-changes-there-unless-we-have-to policy was took priority?
> After all, few are interested in non-MT-safe code on that platform.
> That's all OK once you understand it and code around it, and it's a
> solid clue that French_France probably accesses the same underlying
> locale definition as fr-FR, but I've been hesitant to write any code
> myself that relies on that without a written contract. FWIW I think
> our locale name situation is finally pretty close to where it should
> be now on Windows, partially as a byproduct of our own renewed
> interest in MT-safe code. I also hope we can do native UTF-8 for v19
> (perhaps creating a future upgrade hump for fr-FR -> fr-FR.UTF-8,
> which it would have been nice to fold into this one and maybe we
> should consider that some time soon). And also tackle some related
> encoding problems in other places, namely argv, environ, path names,
> shared database objects and early-stage pq protocol. Some of that
> requires slightly different thinking on Windows due to its wchar_t
> model. I've written long emails about my learnings there eg some
> library calls expect setlocale() encoded C strings, others the ACP,
> some we flipped from one camp to the other by making our own
> wrappers/emulations of libc calls, and a few corners can only be
> sanely approached by resorting to wchar_t APIs. Basically there was
> zero hope of a non-Windows hacker pulling that sort of thing off
> before we had good CI, and it helps a lot to view it as part of larger
> project to nail down the semantics of multi-encoding clusters on all
> OSes, with only a few special concerns for Windows... anyway, many POC
> patches exist, hacking continues...)
>
> > I can see three potential ways to deal with that:
> >
> > 1. Only when creating a cluster for upgrade, use the locale names that
> the old cluster uses.
> >
> > That is difficult, because it requires to connect to that cluster, as
> the information is
> > only in catalog tables. Also, if there are several clusters, which
> one to use?
> >
> > 2. Before upgrading a cluster, update the catalog tables of the old
> cluster with the
> > corresponding BCP-47 name.
> >
> > That would be a good way into a world with only BCP-47 locale names.
> >
> > On the one hand, it would be convenient to have pg_upgrade do this
> automatically, but if
> > the upgrade fails for whatever reason, your original cluster got
> modified, which doesn't
> > feel right.
> >
> > Perhaps the Windows binaries could come with an extra tool for such a
> change, and perhaps
> > the installer could suggest running it before an upgrade.
>
> I was hesitant to suggest this approach in the many threads about this
> mess even though I can see the appeal of a silent upgrade path. If
> the EDB tooling or any potential core PostgreSQL tooling had a
> built-in assumption that the pre-Vista locales are 100% compatible
> with the Vista locales without any official documentation to point to,
> that'd be based on a guess or unsubstantiated claim AFAIK and could
> turn out wrong in some subtle way. Even though you can sort of
> deduce/guess that the old-style names are really just the display
> names (I forget the term) that appear in the structs that GetLocales()
> (or whatever it's called) spits out, I would prefer to leave
> catalogue-clobbering shenanigans up to brave end users, even though
> it'd probably work out OK in practice. Call me a chicken, but if
> existing clusters aren't broken, why fix them? Any human or script
> that does such things should probably also run REINDEX without proof
> positive that it's not needed (amcheck, or documented evidence that
> it's not necessary), or take that risk themselves, and technically it
> is not only indexes that can depend on locale definitions (also CHECK,
> and more things I've forgotten since the failed attempt to model those
> dependencies a few releases back...). It's a nice goal, but my gut
> feeling was that it should not be tangled up with upgrades.
>
> The only users who *can't* use the old style names without an extra
> step are those actually affected by country renames like Türkiye. The
> path of least resistance for them seems to be to follow the advice on
> creating a "Turkish_Turkey" locale using the Locale Builder tool, and
> then just keep using that name across the upgrade. Affected users
> presumably already had to do that to keep their old cluster running
> before they decided to upgrade anyway, so why force them to tackle a
> transition to BCP-47 *during* an upgrade?
>
> > 3. A minimal solution would be to equip the Windows installer with some
> guidance for the user
> > at the point where the locale is chosen. The BCP-47 locale name
> would be suggested, and
> > a text could point out "If you plan to upgrade a cluster that was
> created with long locale
> > names, please select the same locale names for the new cluster".
>
> +1
>
> That's why I suggested[2] that the installer should at least still
> *offer* the old names in the thread where we banned non-ASCII names
> (which was not fun to back-patch, but initdb.exe was apparently
> crashing all over Türkiye, to the extent that the Windows crash report
> investigation team (I think that's who it was?) was compelled to reach
> out to our list (at least that's what I guess happened, IDK...). Does
> it not? It should definitely encourage the use of the modern names by
> default for new installations, but perhaps it should have a checkbox
> to reveal the old ones? It could grey out or filter out the non-ASCII
> ones that initdb would reject. +1 for adding some clues about the
> relevance to upgrades in the GUI too.
>
> [1] https://commitfest.postgresql.org/patch/3772/
> [2]
> https://www.postgresql.org/message-id/CA%2BhUKGLfrK33XpFXsRcc97a1Qa5Vz1YFEn4GC1vie7yse%3DffPA%40mail.gmail.com

Thank you, Avi, Laurenz, and Thomas, for your inputs on the issue!

@Thomas — what you are suggesting makes complete sense.
We will look into what can be done on this going forward.

For now, we can update the locales dropdown in the installer to list all
the BCP names first,
followed by the long (old-style) names — excluding those with non-ASCII
characters.

We hope this approach will encourage new users to use only BCP names during
installation,
while still providing compatibility with the old-style names for users who
want to use pg_upgrade.
Let us know if this sounds reasonable.

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Thomas Munro 2025-04-10 07:14:21 Re: PostgreSQL v15.12 fails to perform PG_UPGRADE from v13 and v9 on Windows
Previous Message Thomas Munro 2025-04-09 23:49:18 Re: PostgreSQL v15.12 fails to perform PG_UPGRADE from v13 and v9 on Windows