Re: BUG #18735: Specific multibyte character in psql file path command parameter for Windows

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: koichi(dot)dbms(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18735: Specific multibyte character in psql file path command parameter for Windows
Date: 2024-12-06 23:14:12
Message-ID: 20241207.081412.2050532354647835961.ishii@postgresql.org
Lists: pgsql-bugs

> Tatsuo Ishii <ishii(at)postgresql(dot)org> writes:
>> I have looked into canonicalize_path() and found this:
>
>>     if (*p == '\\')
>>         *p = '/';
>
> Right, that's where the trouble is. It'd be easy enough to make
> that loop (and the similar one in cleanup_path) encoding-aware,
> if we knew what encoding applies. Deciding that is the sticky part.
>
> After sleeping on it, I'm coming around to the opinion that
> client_encoding (pset.encoding) is what to use in psql, for
> two reasons:
> * we already do our best to set that correctly, and the user
> is able to change it if it's wrong;
> * as previously noted, psqlscan.l will do the wrong things
> if it's not set correctly, so you're probably already hosed
> if working in a non-server-safe encoding with the wrong
> setting of client_encoding.

I think the encoding we need to supply to canonicalize_path() is not
necessarily the same as client_encoding. For example, we could set
client_encoding to UTF-8 but use a file whose name is encoded in
Shift-JIS. I think what we really need to supply to
canonicalize_path() is the "file system encoding", not
client_encoding.

Among the file system encodings, the only problematic one is
Shift-JIS, because the second byte of a Shift-JIS double-byte
character can be 0x5C, the same value as a backslash. As far as I
know, no OS other than Windows currently uses Shift-JIS as the file
system encoding. So we can probably safely assume that if the OS is
Japanese Windows, the file system encoding is Shift-JIS. If we know
how to determine, inside canonicalize_path(), whether the OS is
Japanese Windows, we don't need to change its API.

A quick Google search turned up this page (sorry, in Japanese):
https://tarenagashi.hatenablog.jp/entry/2023/07/17/160149
and it says:

- In Windows "system locale" represents the language/country used.

- The identifier for the system locale is called "LCID", and it is
  1041 (decimal) for Japanese/Japan.

- There are some APIs to obtain the LCID (GetSystemDefaultLocaleName etc.)

As I am not familiar with Windows, I cannot test these. Can someone
confirm?

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp
