AW: BUG #18196: Databases Created in Turkish Language Will Not Run on the Latest Version of Windows

From: Wilm Hoyer <W(dot)Hoyer(at)dental-vision(dot)de>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "halilhanbadem(at)gmail(dot)com" <halilhanbadem(at)gmail(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: AW: BUG #18196: Databases Created in Turkish Language Will Not Run on the Latest Version of Windows
Date: 2023-11-21 07:37:46
Message-ID: b623950ee1a940e1a51a07f37ad338dd@dental-vision.de
Lists: pgsql-bugs

>> On Mon, Nov 20, 2023 at 10:54 PM Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> wrote:
>> On Thu, 2023-11-16 at 11:54 +1300, Thomas Munro wrote:
> > https://www.postgresql.org/message-id/flat/CA%2BhUKGJ%3DXThErgAQRoqfCy1bKPxXVuF0%3D2zDbB%2BSxDs59pv7Fw%40mail.gmail.com
>
>> I don't have Windows to test, but wouldn't the problem be avoided if
>> people created their cluster with "--locale=tr-TR"? If yes, EDB's
>> Windows installer should be modified to use the correct locale names.
>> Is anybody from EDB reading this?

> That would have the same effect as that patch. I believe that is the right thing to do, but I am not sure about one detail: the lack of encoding on the end of the name. Does the encoding remain the same as the traditional one for that language, and stable, or when you haven't explicitly named it, might it depend on the registry/control panel "ACP", that could in theory change? If it does, what happens? Should we put it on the end to pin it down? When we tried that it had some effect, but didn't seem to have the expected [by me] effect on ctype (eg case conversion of the famous Turkish i); it appears that having ACP != LC_TYPE gives some Frankenstein behaviour, but I don't understand it, and I think you'd also want to determine in which cases strcoll_l() is behaving sensibly with various combinations. This may all be pre-existing stuff well understood by people who worked on the Windows port? I don't have Windows to test.

> (How nice it would be to use ICU by default!)

My 5 cents on this:
If EDB is willing to make a 'breaking' change in their nice Windows Installer, I suggest changing the default locale to 'C' or its alias 'POSIX'.
At least for LC_COLLATE and LC_CTYPE.
That would be in line with the recommendation of https://www.postgresql.org/docs/current/locale.html. I believe this is the best setting for most use cases.
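
To see what an existing cluster currently uses before deciding, a catalog query along these lines should do (a minimal sketch; it only reads the pg_database system catalog and changes nothing):

```sql
-- List every database with its encoding and its LC_COLLATE / LC_CTYPE settings.
SELECT datname,
       pg_encoding_to_char(encoding) AS encoding,
       datcollate,
       datctype
FROM pg_database
ORDER BY datname;
```

Databases that show "C" or "POSIX" in both columns are not tied to an operating system locale name.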

The breaking change would not be that hard to live with, since current versions of PostgreSQL let you specify a different collation in CREATE DATABASE.
That means minimal effort on the user side when you really need a database with a language-dependent collation.
If someone wants to retain the performance and stability benefits of the C collation, they can now even push the collation requirement down to CREATE INDEX for the few indexes that need a special collation, or even further into the ORDER BY clause of the actual query (the latter would not be as performant, but is probably OK in most circumstances).
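
A rough sketch of those three levels, from broadest to narrowest. The database, table, column, and index names are made up for illustration. The locale name 'tr-TR' is only a placeholder and has to be a name the operating system actually accepts, and the "tr-x-icu" collation assumes a server built with ICU support:

```sql
-- 1. Whole database: the cluster default stays C, one database gets a
--    language-dependent locale. TEMPLATE template0 is required whenever the
--    locale differs from that of the template database.
CREATE DATABASE app_tr
    TEMPLATE template0
    ENCODING 'UTF8'
    LC_COLLATE 'tr-TR'   -- placeholder locale name (assumption)
    LC_CTYPE 'tr-TR';

-- 2. Single index: the database stays on C, only this index sorts
--    linguistically. "customers" and "name" are hypothetical.
CREATE INDEX customers_name_tr_idx
    ON customers (name COLLATE "tr-x-icu");

-- 3. Single query: request the collation only where the output order matters.
SELECT name
FROM customers
ORDER BY name COLLATE "tr-x-icu";
```

The query in step 3 can use the index from step 2 because both name the same collation; without a matching index the sort is done at query time.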

Another suggestion would be to change the default data directory to %ProgramData%\PostgreSQL\<Version>. I haven't rechecked, but in the past the suggested data directory was something like %ProgramFiles%\PostgreSQL\<version>. Microsoft discourages using subdirectories of %ProgramFiles% for data that changes on a daily basis.

None of this "repairs" existing databases the way the solution Thomas Munro suggested upthread does.

Best regards
Wilm.
