RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From: "Haifang Wang (Centific Technologies Inc)" <v-haiwang(at)microsoft(dot)com>
To: Rahul Pandey <pandeyrah(at)microsoft(dot)com>, Vishwa Deepak <Vishwa(dot)Deepak(at)microsoft(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Shawn Steele <Shawn(dot)Steele(at)microsoft(dot)com>, Amy Wishnousky <amyw(at)microsoft(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Shweta Gulati <gulatishweta(at)microsoft(dot)com>, Ashish Nawal <nawalashish(at)microsoft(dot)com>
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Date: 2024-05-28 18:21:18
Message-ID: PH8PR21MB3902C9024782A8A3E26370ACE5F12@PH8PR21MB3902.namprd21.prod.outlook.com
Lists: pgsql-bugs

Thanks Vishwa for all the clarification below.

Hi Rahul and everyone,

Is anything else still unclear? Is there a solution for the issue?

Thanks!
Haifang

From: Rahul Pandey <pandeyrah(at)microsoft(dot)com>
Sent: Wednesday, May 22, 2024 4:37 AM
To: Vishwa Deepak <Vishwa(dot)Deepak(at)microsoft(dot)com>; Haifang Wang (Centific Technologies Inc) <v-haiwang(at)microsoft(dot)com>; Thomas Munro <thomas(dot)munro(at)gmail(dot)com>; Shawn Steele <Shawn(dot)Steele(at)microsoft(dot)com>; Amy Wishnousky <amyw(at)microsoft(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>; pgsql-bugs(at)lists(dot)postgresql(dot)org; Shweta Gulati <gulatishweta(at)microsoft(dot)com>; Ashish Nawal <nawalashish(at)microsoft(dot)com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

Thanks, Vishwa, for tagging me.
Adding my two cents,

Hello Thomas,

1. What is the oldest Windows release that can understand the "new" BCP47 locale names, like "tr-TR" or "tr-TR.1452"? (Some PostgreSQL versions, for example PostgreSQL 12, are expected to run on old versions of Windows from long before Windows 10, so we might have to consider this. However, if we go with Tom's idea that we do nothing by default but just allow users to supply their own optional mapping file, then this question becomes unimportant, users can figure out for themselves whether it works, and presumably only 10+ got the update that renamed Turkey to Türkiye. [And in reality, I hope/expect that no one really does run old out-of-support OSes, because that's crazy, but I'm not allowed to assume...])
Vishwa: It's difficult for me to answer this question accurately. I can see BCP47-related code in Windows 8, so my assumption is that Windows 8 and above support it.
Maybe the experts added from the feature team can validate this assumption.

Rahul: Locale names based on BCP 47 were first introduced in the Windows Vista timeframe, so using them should be pretty safe on most modern and older versions (unless it's XP or earlier).

2. If we translate to BCP47 locale names like "tr-TR" automatically, should we put the ".1452" on the end? What does it mean exactly?
What does it mean if you don't put it there? (I could guess that if you don't put it on, the encoding in "char"-based functions is the "ACP". What I really want to know is, can it be different from the "ACP", and if it is, which functions does it affect? For example, if the ACP is 1521 and I call _tolower_l() giving it a locale_t that I opened with "en-US.UTF-8", what happens? I am sure this is a simple question but we are not Windows programmers, you are the first person to show up offering to investigate, and I personally found the docs a bit light on the topic.)

Vishwa: It tries to figure out the best match for the given input locale. The link below explains this in more detail. As for the code page part, maybe the experts can add the details.
https://learn.microsoft.com/en-us/windows/uwp/app-resources/how-rms-matches-lang-tags

Rahul: For the first part, your assumption is correct. The behaviour for "tr-TR" and "tr-TR.ACP" would be the same: it would try to use the default ANSI code page for Turkish (which happens to be 1254, see https://en.wikipedia.org/wiki/Windows-1254). Using any other code page (for example "tr-TR.1252") would use that code page (1252: English). For the second part (the example), I am not sure I understand the question completely, but mixing encodings is almost never a good idea: in the worst case it leads to mojibake, and in the best case (if the strings contain only ASCII characters) it changes nothing.
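
To make the mixed-encoding point concrete, here is a minimal Python sketch. It is an illustration only: Python's built-in cp1254 and cp1252 codecs stand in for the Windows code pages, the helper name misread is hypothetical, and nothing here calls the Windows CRT or _tolower_l(), so it says nothing about what those functions actually do.

```python
# Illustration only: Python's codecs stand in for Windows code pages 1254
# (Turkish) and 1252. The helper name "misread" is hypothetical.

def misread(text: str, written_as: str = "cp1254", read_as: str = "cp1252") -> str:
    """Encode text in one code page, then decode the same bytes as another."""
    return text.encode(written_as).decode(read_as)

# The Turkish capital dotted I is byte 0xDD in cp1254; cp1252 maps 0xDD to Ý.
print(misread("İstanbul"))  # prints "Ýstanbul" (worst case: mojibake)

# ASCII-only text uses the same bytes in both code pages.
print(misread("Ankara"))    # prints "Ankara" (best case: no change)
```

Only the handful of byte values where the two code pages differ are affected, which is why such mismatches can go unnoticed until a string with the Turkish-specific letters shows up.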

3. Do the new BCP47 locale names give *exactly* the same results for
strcoll() and tolower() etc, as the old "Turkish*" style names? (In other words, is it *exactly the same code and driving data*, just using different labels? Or is it a new locale implementation that could differ arbitrarily in behaviour? If the answer is yes, it's just a new naming scheme, then life will be much much simpler for our users, but if not, then indexes might be corrupted if we tell people to switch to the new BCP47 names, and so we'd better know about that, so we can adjust our advice to users.)

Vishwa: Yes, it's exactly the same code and driving data; only the spelling has changed, which included the replacement of u with ü.

Rahul: I agree with Vishwa. The locale is the same; only the English name of the country has changed. All the other data is the same.

Thanks,
Rahul

________________________________
From: Vishwa Deepak <Vishwa(dot)Deepak(at)microsoft(dot)com>
Sent: Tuesday, May 21, 2024 1:57 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang(at)microsoft(dot)com>; Thomas Munro <thomas(dot)munro(at)gmail(dot)com>; Rahul Pandey <pandeyrah(at)microsoft(dot)com>; Shawn Steele <Shawn(dot)Steele(at)microsoft(dot)com>; Amy Wishnousky <amyw(at)microsoft(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>; pgsql-bugs(at)lists(dot)postgresql(dot)org; Shweta Gulati <gulatishweta(at)microsoft(dot)com>; Ashish Nawal <nawalashish(at)microsoft(dot)com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

Hey @Haifang Wang (Centific Technologies Inc),
I am adding experts to this thread to address these queries: @Amy Wishnousky, @Shawn Steele, @Rahul Pandey.
I am adding my inline comments to the best of my knowledge.

Thanks & Regards
Vishwa

________________________________
From: Haifang Wang (Centific Technologies Inc) <v-haiwang(at)microsoft(dot)com>
Sent: Tuesday, May 21, 2024 5:02 AM
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>; Vishwa Deepak <Vishwa(dot)Deepak(at)microsoft(dot)com>; pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

Hi Vishwa,

Could you please help with the questions below?

Thanks!
Haifang

-----Original Message-----
From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Sent: Monday, May 20, 2024 1:57 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang(at)microsoft(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>; Vishwa Deepak <Vishwa(dot)Deepak(at)microsoft(dot)com>; pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

On Tue, May 21, 2024 at 8:17 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang(at)microsoft(dot)com> wrote:
> What questions do you have? Could you please list them clearly so that Vishwa could help to answer?

I already did, twice, but perhaps Vishwa or others can't see the whole thread, so here is this whole thread in our project email archive:

https://www.postgresql.org/message-id/flat/PH8PR21MB3902F334A3174C54058F792CE5182%40PH8PR21MB3902.namprd21.prod.outlook.com

But let me ask the questions again, with some motivation/reason I want to know in parentheses:

1. What is the oldest Windows release that can understand the "new"
BCP47 locale names, like "tr-TR" or "tr-TR.1452"? (Some PostgreSQL versions, for example PostgreSQL 12, are expected to run on old versions of Windows from long before Windows 10, so we might have to consider this. However, if we go with Tom's idea that we do nothing by default but just allow users to supply their own optional mapping file, then this question becomes unimportant, users can figure out for themselves whether it works, and presumably only 10+ got the update that renamed Turkey to Türkiye. [And in reality, I hope/expect that no one really does run old out-of-support OSes, because that's crazy, but I'm not allowed to assume...])

It's difficult for me to answer this question accurately. I can see BCP47-related code in Windows 8, so my assumption is that Windows 8 and above support it.
Maybe the experts added from the feature team can validate this assumption.

2. If we translate to BCP47 locale names like "tr-TR" automatically, should we put the ".1452" on the end? What does it mean exactly?
What does it mean if you don't put it there? (I could guess that if you don't put it on, the encoding in "char"-based functions is the "ACP". What I really want to know is, can it be different from the "ACP", and if it is, which functions does it affect? For example, if the ACP is 1521 and I call _tolower_l() giving it a locale_t that I opened with "en-US.UTF-8", what happens? I am sure this is a simple question but we are not Windows programmers, you are the first person to show up offering to investigate, and I personally found the docs a bit light on the topic.)

It tries to figure out the best match for the given input locale. The link below explains this in more detail. As for the code page part, maybe the experts can add the details.
https://learn.microsoft.com/en-us/windows/uwp/app-resources/how-rms-matches-lang-tags

3. Do the new BCP47 locale names give *exactly* the same results for
strcoll() and tolower() etc, as the old "Turkish*" style names? (In other words, is it *exactly the same code and driving data*, just using different labels? Or is it a new locale implementation that could differ arbitrarily in behaviour? If the answer is yes, it's just a new naming scheme, then life will be much much simpler for our users, but if not, then indexes might be corrupted if we tell people to switch to the new BCP47 names, and so we'd better know about that, so we can adjust our advice to users.)
Yes, it's exactly the same code and driving data; only the spelling has changed, which included the replacement of u with ü.

Please do proper due diligence at your end before proceeding with any kind of mapping.
Regards,
Vishwa
