RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From: "Haifang Wang (Centific Technologies Inc)" <v-haiwang(at)microsoft(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Date: 2024-05-13 22:27:32
Message-ID: PH8PR21MB39027F1F4EAFEAE33EB4BA39E5E22@PH8PR21MB3902.namprd21.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi Thomas,

Thanks for your reply. But I'm not sure if there is any miscommunication. Let me make it clear again.

I’m a program manager in the Windows App Assure ISV Outreach Team at Microsoft. We work with Microsoft’s test organization to notify developers when issues have been identified in their applications. The issue I reported in this mail is an issue we found in our testing and I believe it also impact a lot of end users. Like you mentioned previously, it is caused by recent operating system upgrade.

The solution I shared in my last email was suggested our engineers. It would be great is you could use _wsetlocale API. this is what the recommended way as par Microsoft documentation as well. (Reference: setlocale wsetlocale: https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-170

If you don’t want to change the API to update their mapping table with Turkish_Tur in case of Turkish_Türkiye as you have done for others locale with same issue as well in your code (reference PostgreSQL Source Code: src/port/win32setlocale.c Source File: https://doxygen.postgresql.org/win32setlocale_8c_source.html, Line number 66-67).

Please let me know if there is any misunderstanding.

Thanks!
Haifang

-----Original Message-----
From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Sent: Monday, May 13, 2024 3:12 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang(at)microsoft(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

On Tue, May 14, 2024 at 6:51 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang(at)microsoft(dot)com> wrote:
> You said the Turkey -> Türkiye issue caused by a recent operating system upgrade, you mean this crash is caused by the changes made on Windows side? If that is the case, you prefer to leave the bug there? Any plan for the future?

First, let me restate the problem:

When you create a database cluster (= a PostgreSQL instance) with "initdb", unless you request a default locale with --locale, initdb uses setlocale("") to query the system/user default locale. It then records that string in postgresql.conf, and also in the pg_database catalog. On POSIX systems, that captures something like "tr-TR.UTF-8"
or similar. On Windows, that captures something like "Turkish_Turkey.1254". Later, PostgreSQL uses newlocale() or
setlocale() functions to access that locale again. In rare cases where a country changes its name, a Windows update *renames* the locale, and then those calls fail, because the old name is not recognised anymore.

I proposed a partial solution that should help avoid the problem in future:

I think that initdb should instead call GetUserDefaultLocaleName() to discover the user account's default locale, because it returns BCP47 names like "tr-TR". It is probably less likely for a country or language to change its ISO code (but not impossible[1]), than for the English-language names of them to change. Even if you reject this idea because technically they can both change, there are other reasons why we should not be storing this "display"-style names anywhere, including that PostgreSQL needs to store them in a place where the encoding must be ASCII, which Türkiye is not. And finally, the Windows manual explicitly warns us about this[2]: "We don't recommend this form for locale strings embedded in code or serialized to
storage: These strings are more likely to be changed by an operating system update than the locale name form."

(Note: the BCP47 support in Windows did not exist when that PostgreSQL code was written, so it did what it had to at the time.)

There are further problems to resolve:

1. We don't know if we should put encodings (AKA codepages?) on the end of those strings or not. There is some confusion about what exactly it does, and how it interacts with the "ACP". I'd be worried that if you don't put the endings on, perhaps it can change under your feet. (I suspect that part of the discussion on that other thread took some wrong turns, based on citext_utf8 results that were actually probably misleading.) 2. If we do decide to put the encoding suffixes on, which encoding should we be suggesting? It appears from anecdotal reports that most PostgreSQL-on-Windows users are stuck in the past, using the old
pre-UTF-8 language-specific encodings. Should we be encouraging UTF-8 use by default, if we can? Maybe that is a separate question.

Then there is the practical question of what to do with an already-broken system. One idea would be to introduce a "locale remapping" file, pgdata/pg_locale.map where you can write things like "Turkish_Turkey.1254"="Turkish_Türkiye.1254".

I think there might be a GUI tool that allows you to duplicate, rename, etc locales in Windows, so you can re-create the old name. I believe that is how some people have fixed their broken databases. I don't know if there is a good reference/blog/article on that, that we should be pointing people towards if they show up with broken systems.

Patches, testing, research are welcome! Even though I put forward that
BCP47 idea, it was based on reading the manual, so the "unresolved"
questions may in fact be very easy to resolve by people who actually use/know Windows. Even if I had been gung-ho about committing that in 16 without feedback from Windows users, it would have been too late to help Turkish users with existing databases.

[1] https://learn.microsoft.com/en-us/globalization/locale/standard-locale-names
[2] https://learn.microsoft.com/en-us/cpp/c-runtime-library/locale-names-languages-and-country-region-strings?view=msvc-160

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Thomas Munro 2024-05-13 22:47:23 Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Previous Message Thomas Munro 2024-05-13 22:11:53 Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607