From: | Tatsuo Ishii <ishii(at)postgresql(dot)org> |
---|---|
To: | tgl(at)sss(dot)pgh(dot)pa(dot)us |
Cc: | koichi(dot)dbms(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #18735: Specific multibyte character in psql file path command parameter for Windows |
Date: | 2024-12-06 23:14:12 |
Message-ID: | 20241207.081412.2050532354647835961.ishii@postgresql.org |
Lists: | pgsql-bugs |
> Tatsuo Ishii <ishii(at)postgresql(dot)org> writes:
>> I have looked into canonicalize_path() and found this:
>
>> if (*p == '\\')
>> *p = '/';
>
> Right, that's where the trouble is. It'd be easy enough to make
> that loop (and the similar one in cleanup_path) encoding-aware,
> if we knew what encoding applies. Deciding that is the sticky part.
>
> After sleeping on it, I'm coming around to the opinion that
> client_encoding (pset.encoding) is what to use in psql, for
> two reasons:
> * we already do our best to set that correctly, and the user
> is able to change it if it's wrong;
> * as previously noted, psqlscan.l will do the wrong things
> if it's not set correctly, so you're probably already hosed
> if working in a non-server-safe encoding with the wrong
> setting of client_encoding.
I think the encoding we need to supply to canonicalize_path() is not
necessarily the same as client_encoding. For example, we could set
client_encoding to UTF-8 but use a file whose name is encoded in
Shift-JIS. I think what we really need to supply to
canonicalize_path() is the "file system encoding", not
client_encoding.
Among the file system encodings, the only problematic one is
Shift-JIS. As far as I know, currently there's no OS except Windows
which uses Shift-JIS as the file system encoding. So we can probably
assume that if the OS is Japanese Windows, the file system encoding
is Shift-JIS. If we know how to determine inside canonicalize_path()
that the OS is Japanese Windows, we don't need to change its API.
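For illustration, here is a minimal sketch of what a Shift-JIS-aware version of the backslash loop might look like. It is not the actual canonicalize_path() code. The helper names sjis_is_lead_byte() and sjis_backslash_to_slash() are hypothetical, and the lead-byte ranges are the usual Shift-JIS/CP932 ones (0x81-0x9F and 0xE0-0xFC). The point is that the second byte of a two-byte character can be 0x5C, the same value as '\\', so the loop must skip it rather than rewrite it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: can this byte start a two-byte Shift-JIS character? */
static int
sjis_is_lead_byte(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/*
 * Hypothetical variant of the loop quoted above: convert '\\' to '/' in
 * place, but never touch the trail byte of a two-byte character.
 */
static void
sjis_backslash_to_slash(char *path)
{
    char *p;

    assert(path != NULL);
    for (p = path; *p; p++)
    {
        if (sjis_is_lead_byte((unsigned char) *p) && p[1] != '\0')
        {
            p++;        /* step onto the trail byte (may be 0x5C) and skip it */
            continue;
        }
        if (*p == '\\')
            *p = '/';
    }
}
```

With a path containing the bytes 0x95 0x5C (one two-byte character), only the real separators are converted and the 0x5C trail byte is left alone.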
Quick googling found this page (sorry, in Japanese):
https://tarenagashi.hatenablog.jp/entry/2023/07/17/160149
and it says:
- In Windows "system locale" represents the language/country used.
- The code for system locale is called "LCID" and it's 1041 (decimal)
for Japanese/Japan.
- There are some APIs to obtain LCID (GetSystemDefaultLocaleName etc.)
I am not familiar with Windows and cannot test these. Can someone
confirm?
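As a starting point for whoever can test on Windows, the LCID check might be sketched as below. This is untested on Windows and rests on assumptions: the helper names lcid_is_japanese() and system_locale_is_japanese() are hypothetical, the check uses GetSystemDefaultLCID() from windows.h, and it compares only the primary language ID (the low 10 bits of the LCID, 0x11 for Japanese). Whether the system locale is the right signal, as opposed to for example the ANSI code page returned by GetACP(), is not settled here.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Primary language ID for Japanese (LANG_JAPANESE in the Windows SDK). */
#define SKETCH_LANG_JAPANESE 0x11

/*
 * Hypothetical helper: true if the LCID's primary language is Japanese.
 * The low 10 bits of the language ID hold the primary language, so
 * LCID 1041 (0x0411, Japanese/Japan) matches.
 */
static int
lcid_is_japanese(unsigned long lcid)
{
    return (lcid & 0x3FF) == SKETCH_LANG_JAPANESE;
}

/* Hypothetical check; always returns 0 on non-Windows builds. */
static int
system_locale_is_japanese(void)
{
#ifdef _WIN32
    return lcid_is_japanese((unsigned long) GetSystemDefaultLCID());
#else
    return 0;
#endif
}
```

Keeping the bit test in a separate pure function means it can be checked on any platform, while only the thin wrapper depends on the Windows API.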
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp