From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> |
Cc: | Avi Uziel <avi(dot)uziel(at)aidoc(dot)com>, Manika Singhal <manika(dot)singhal(at)enterprisedb(dot)com>, Ben Caspi <benc(at)aidoc(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org, Liran Amrani <lirana(at)aidoc(dot)com>, Shahar Amram <shahara(at)aidoc(dot)com>, Sandeep Thakkar <sandeep(dot)thakkar(at)enterprisedb(dot)com>, tgl(at)sss(dot)pgh(dot)pa(dot)us |
Subject: | Re: PostgreSQL v15.12 fails to perform PG_UPGRADE from v13 and v9 on Windows |
Date: | 2025-04-09 23:49:18 |
Message-ID: | CA+hUKGKcbtDSvTySZgmpm2GyzBwTKnEQsaYpgbT6zw0-1qLz6Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On Tue, Apr 8, 2025 at 5:50 PM Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> wrote:
> On Mon, 2025-04-07 at 17:59 +0300, Avi Uziel wrote:
> > I anticipate that users could face an upgrade issue if they installed a cluster on a Windows
> > machine using the default locale before this change and then attempt to upgrade to a release
> > that includes this commit.
>
> Yes, I think this will cause more trouble.
>
> I think that we should stick with BCP-47 locale names as much as possible. The problem with
> the long locale names is not only non-ASCII characters, but that Microsoft keeps changing these
> names, and PostgreSQL persists them in the catalog, which causes trouble if Windows is upgraded.
+100
I've been saying for a while[1] that initdb should default to BCP-47
too. But it seems likely that the entire PostgreSQL-on-Windows
community uses the popular EDB Installer, not command line tools? It
doesn't use initdb's default anyway, so changes to initdb were never
going to start the migration of the world's PostgreSQL-on-Windows
fleet to BCP-47. It took President Erdoğan and the EDB installer team
to kick it off. Despite this upgrade wrinkle, which I'm sure we can
smooth out, I'm really happy to see it finally happening!
Historically in our commit log, tree and mailing lists we've kvetched
about unstable locale names on that OS, but the
please-don't-store-the-old-style-names-anywhere-do-this-other-thing-instead
warning in the documentation presumably started with Vista a couple of
decades ago. Unfortunately that shipped around the same time as the
PostgreSQL-on-Windows port, and of course it targeted older releases
for some time. Better late than never.
(As an aside: I think our kvetching wasn't unwarranted, it just didn't
tackle the root problem. Perhaps unsurprisingly, because they changed
that in all the Windows native APIs, but setlocale() continued and
continues to expose the pre-Vista names, creating an asymmetry and
incidentally also exposing non-C standard conforming behaviour due to
char/whcar_t schizophrenia. I could guess that setlocale() and
friends only exist for minimal/partial conformance anyway, and some
kind of no-changes-there-unless-we-have-to policy was took priority?
After all, few are interested in non-MT-safe code on that platform.
That's all OK once you understand it and code around it, and it's a
solid clue that French_France probably accesses the same underlying
locale definition as fr-FR, but I've been hesitant to write any code
myself that relies on that without a written contract. FWIW I think
our locale name situation is finally pretty close to where it should
be now on Windows, partially as a byproduct of our own renewed
interest in MT-safe code. I also hope we can do native UTF-8 for v19
(perhaps creating a future upgrade hump for fr-FR -> fr-FR.UTF-8,
which it would have been nice to fold into this one and maybe we
should consider that some time soon). And also tackle some related
encoding problems in other places, namely argv, environ, path names,
shared database objects and early-stage pq protocol. Some of that
requires slightly different thinking on Windows due to its wchar_t
model. I've written long emails about my learnings there eg some
library calls expect setlocale() encoded C strings, others the ACP,
some we flipped from one camp to the other by making our own
wrappers/emulations of libc calls, and a few corners can only be
sanely approached by resorting to wchar_t APIs. Basically there was
zero hope of a non-Windows hacker pulling that sort of thing off
before we had good CI, and it helps a lot to view it as part of larger
project to nail down the semantics of multi-encoding clusters on all
OSes, with only a few special concerns for Windows... anyway, many POC
patches exist, hacking continues...)
> I can see three potential ways to deal with that:
>
> 1. Only when creating a cluster for upgrade, use the locale names that the old cluster uses.
>
> That is difficult, because it requires to connect to that cluster, as the information is
> only in catalog tables. Also, if there are several clusters, which one to use?
>
> 2. Before upgrading a cluster, update the catalog tables of the old cluster with the
> corresponding BCP-47 name.
>
> That would be a good way into a world with only BCP-47 locale names.
>
> On the one hand, it would be convenient to have pg_upgrade do this automatically, but if
> the upgrade fails for whatever reason, your original cluster got modified, which doesn't
> feel right.
>
> Perhaps the Windows binaries could come with an extra tool for such a change, and perhaps
> the installer could suggest running it before an upgrade.
I was hesitant to suggest this approach in the many threads about this
mess even though I can see the appeal of a silent upgrade path. If
the EDB tooling or any potential core PostgreSQL tooling had a
built-in assumption that the pre-Vista locales are 100% compatible
with the Vista locales without any official documentation to point to,
that'd be based on a guess or unsubstantiated claim AFAIK and could
turn out wrong in some subtle way. Even though you can sort of
deduce/guess that the old-style names are really just the display
names (I forget the term) that appear in the structs that GetLocales()
(or whatever it's called) spits out, I would prefer to leave
catalogue-clobbering shenanigans up to brave end users, even though
it'd probably work out OK in practice. Call me a chicken, but if
existing clusters aren't broken, why fix them? Any human or script
that does such things should probably also run REINDEX without proof
positive that it's not needed (amcheck, or documented evidence that
it's not necessary), or take that risk themselves, and technically it
is not only indexes that can depend on locale definitions (also CHECK,
and more things I've forgotten since the failed attempt to model those
dependencies a few releases back...). It's a nice goal, but my gut
feeling was that it should not be tangled up with upgrades.
The only users who *can't* use the old style names without an extra
step are those actually affected by country renames like Türkiye. The
path of least resistance for them seems to be to follow the advice on
creating a "Turkish_Turkey" locale using the Locale Builder tool, and
then just keep using that name across the upgrade. Affected users
presumably already had to do that to keep their old cluster running
before they decided to upgrade anyway, so why force them to tackle a
transition to BCP-47 *during* an upgrade?
> 3. A minimal solution would be to equip the Windows installer with some guidance for the user
> at the point where the locale is chosen. The BCP-47 locale name would be suggested, and
> a text could point out "If you plan to upgrade a cluster that was created with long locale
> names, please select the same locale names for the new cluster".
+1
That's why I suggested[2] that the installer should at least still
*offer* the old names in the thread where we banned non-ASCII names
(which was not fun to back-patch, but initdb.exe was apparently
crashing all over Türkiye, to the extent that the Windows crash report
investigation team (I think that's who it was?) was compelled to reach
out to our list (at least that's what I guess happened, IDK...). Does
it not? It should definitely encourage the use of the modern names by
default for new installations, but perhaps it should have a checkbox
to reveal the old ones? It could grey out or filter out the non-ASCII
ones that initdb would reject. +1 for adding some clues about the
relevance to upgrades in the GUI too.
[1] https://commitfest.postgresql.org/patch/3772/
[2] https://www.postgresql.org/message-id/CA%2BhUKGLfrK33XpFXsRcc97a1Qa5Vz1YFEn4GC1vie7yse%3DffPA%40mail.gmail.com
From | Date | Subject | |
---|---|---|---|
Next Message | Manika Singhal | 2025-04-10 07:02:58 | Re: PostgreSQL v15.12 fails to perform PG_UPGRADE from v13 and v9 on Windows |
Previous Message | PG Bug reporting form | 2025-04-09 23:04:20 | BUG #18886: identity duplicate key |