Re: Windows UTF8 system locale

From: Noah Misch <noah(at)leadboat(dot)com>
To: Vladlen Popolitov <v(dot)popolitov(at)postgrespro(dot)ru>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Windows UTF8 system locale
Date: 2025-01-02 04:26:34
Message-ID: 20250102042634.b5.nmisch@google.com
Lists: pgsql-hackers

On Wed, Dec 25, 2024 at 06:55:51PM +0300, Vladlen Popolitov wrote:
> This UTF-8 feature leads to annoying test failure
> (010_dump_connstr).

It's not merely an annoying test failure. On Windows configured with a
multibyte system locale, anyone with CREATEDB privilege can name a database
such that pg_dumpall can't restore it.
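To illustrate the underlying problem, here is a minimal Python sketch. It
assumes Python's codecs as a stand-in for the conversion Windows applies to
command-line arguments; it is not the actual Windows code path, and the helper
name is hypothetical. It shows that every nonzero byte round-trips through a
single-byte encoding such as ISO-8859-1, while none of the bytes 0x80-0xFF is
valid UTF-8 on its own.

```python
def survives_round_trip(raw: bytes, encoding: str) -> bool:
    """Return True if raw decodes under encoding and re-encodes unchanged."""
    try:
        return raw.decode(encoding).encode(encoding) == raw
    except UnicodeDecodeError:
        return False


# Every nonzero byte value, as a name in a single-byte-encoded cluster
# could contain.
all_bytes = bytes(range(0x01, 0x100))

latin1_ok = survives_round_trip(all_bytes, "latin-1")
utf8_ok = survives_round_trip(all_bytes, "utf-8")

# Individual byte values that are not valid UTF-8 by themselves.
bad_single = [b for b in range(0x01, 0x100)
              if not survives_round_trip(bytes([b]), "utf-8")]

print("latin-1 round trip:", latin1_ok)
print("utf-8 round trip:", utf8_ok)
print("bytes invalid as standalone UTF-8:", len(bad_single))
```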

> Option 1
> Skip this test for Windows in UTF-8 mode.
>
> Option 2.
> Exclude all 8-bit characters for Windows in UTF-8 mode. Now only " excluded
> for Windows.
>
> Option 3.
> Test with some limited list of correct UTF-8 symbols - just in case, that
> they also works.
> It could be 64 2-bytes UTF-8 characters.

Those are ways to suppress the test failure. But we have that test because
pg_dumpall and pg_upgrade rely on the ability to send all possible rolname and
datname on the command line. In a cluster that uses a single-byte encoding,
that requires the ability to pass every sequence of bytes in [0x01,0xFF]. It's
not much of a win to make the test stop failing if real use of pg_dump and
pg_upgrade would still fail. Message
postgr.es/m/20241215023221.4d.nmisch@google.com (original post of this thread)
gave PGSERVICEFILE as a way to make the real usage work. That works by
removing the requirement to pass arbitrary bytes in command lines. The
command line would contain an ASCII-only service name, and the arbitrary bytes
would appear inside the service file.
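A rough sketch of the service-file idea, in Python for brevity. The helper
names, the service name "restore1", and the sample bytes are all hypothetical.
The sketch assumes the usual pg_service.conf layout ([service] followed by
key=value lines) and refuses names containing newline, carriage return, or
NUL, since a line-oriented file can't carry those as-is. It does not try to
handle other edge cases such as leading or trailing whitespace in the name.

```python
import os
import tempfile


def write_service_file(path: str, service: str, dbname: bytes) -> None:
    """Write a one-entry libpq service file whose dbname holds raw bytes."""
    if not (service.isascii() and service.isalnum()):
        raise ValueError("service name must be ASCII alphanumerics")
    if any(ch in dbname for ch in (b"\n", b"\r", b"\0")):
        raise ValueError("dbname not representable in a line-oriented file")
    with open(path, "wb") as f:
        f.write(b"[" + service.encode("ascii") + b"]\n")
        f.write(b"dbname=" + dbname + b"\n")


def dump_command(service: str) -> list[str]:
    """Build an ASCII-only argv; the real name stays in the service file."""
    return ["pg_dump", f"--dbname=service={service}"]


# Demo: a database name that is not valid UTF-8.
tmpdir = tempfile.mkdtemp()
service_path = os.path.join(tmpdir, "pg_service.conf")
odd_name = bytes([0x01, 0x80, 0xE9, 0xFF])

write_service_file(service_path, "restore1", odd_name)
env = dict(os.environ, PGSERVICEFILE=service_path)  # libpq reads this variable
argv = dump_command("restore1")

print(argv)
```

The argv and env would then go to the child process. Only the service name
crosses the command line; the arbitrary bytes are read from the file.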

Another way might be to create the objects with placeholder ASCII names. As
the last step of the restore, rename the placeholder ASCII names to the source
cluster's names.
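The placeholder idea could look roughly like this. It is a sketch only: the
"restore_tmp_N" naming scheme and both helper names are hypothetical, and it
uses Python strings where a real implementation would deal in bytes in the
server encoding. The rename statements would travel inside the SQL script, not
on a command line, so only the ASCII placeholder would need to appear in argv.

```python
def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def placeholder_plan(real_names: list[str]) -> tuple[list[str], list[str]]:
    """Return (create statements, final rename statements)."""
    taken = set(real_names)
    creates, renames = [], []
    counter = 0
    for real in real_names:
        # Pick an ASCII placeholder that doesn't collide with a real name.
        while True:
            counter += 1
            placeholder = f"restore_tmp_{counter}"
            if placeholder not in taken:
                break
        creates.append(f"CREATE DATABASE {quote_ident(placeholder)};")
        renames.append(
            f"ALTER DATABASE {quote_ident(placeholder)} "
            f"RENAME TO {quote_ident(real)};"
        )
    return creates, renames


creates, renames = placeholder_plan(['we"ird', "caf\u00e9"])
for stmt in creates + renames:
    print(stmt)
```

The renames would have to run from a connection to some other database, since
a database can't be renamed while a session is connected to it.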

Once we can assume Windows 11 or later, another way is
<activeCodePage>en-US</activeCodePage> in a fusion manifest, per
https://learn.microsoft.com/en-us/windows/win32/sbscs/application-manifests#activeCodePage.
Any single-byte encoding choice might suffice. That makes PostgreSQL
independent of the system locale.
