From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Tatsuo Ishii <ishii(at)postgresql(dot)org> |
Cc: | koichi(dot)dbms(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #18735: Specific multibyte character in psql file path command parameter for Windows |
Date: | 2024-12-06 18:44:24 |
Message-ID: | 2840430.1733510664@sss.pgh.pa.us |
Lists: | pgsql-bugs |

Tatsuo Ishii <ishii(at)postgresql(dot)org> writes:
> I have looked into canonicalize_path() and found this:
> if (*p == '\\')
> *p = '/';

Right, that's where the trouble is. It'd be easy enough to make
that loop (and the similar one in cleanup_path) encoding-aware,
if we knew what encoding applies. Deciding that is the sticky part.
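
To make the failure concrete, here is a minimal sketch of the byte-wise loop next to an encoding-aware one. It is not the PostgreSQL code: it assumes Shift-JIS as the encoding in effect, and `sjis_char_len`, `backslash_to_slash_naive` and `backslash_to_slash_sjis` are hypothetical names made up for illustration. A real patch would presumably step through the string with an existing helper such as `pg_encoding_mblen()` rather than hard-coding one encoding.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: byte length of the Shift-JIS character starting at s.
 * Lead bytes 0x81-0x9F and 0xE0-0xFC begin a two-byte character; anything
 * else (ASCII, half-width kana) is a single byte.  A lead byte at the very
 * end of the string is treated as one byte so we never step past the NUL.
 */
static int
sjis_char_len(const unsigned char *s)
{
	if (((*s >= 0x81 && *s <= 0x9F) || (*s >= 0xE0 && *s <= 0xFC)) &&
		s[1] != '\0')
		return 2;
	return 1;
}

/* Byte-wise loop, as in the quoted snippet: every 0x5C byte is rewritten. */
static void
backslash_to_slash_naive(char *path)
{
	char	   *p;

	assert(path != NULL);
	for (p = path; *p; p++)
	{
		if (*p == '\\')
			*p = '/';
	}
}

/* Encoding-aware loop: advance by whole characters, touch only 1-byte '\\'. */
static void
backslash_to_slash_sjis(char *path)
{
	unsigned char *p;

	assert(path != NULL);
	p = (unsigned char *) path;
	while (*p)
	{
		int			len = sjis_char_len(p);

		if (len == 1 && *p == '\\')
			*p = '/';
		p += len;
	}
}
```

With a path whose bytes are `C:\`, then 0x95 0x5C (the Shift-JIS encoding of 表), then `\a`, the naive loop rewrites the trail byte 0x5C as well as the two real separators, which corrupts the character. The encoding-aware loop skips the two-byte character and changes only the separators.
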

After sleeping on it, I'm coming around to the opinion that
client_encoding (pset.encoding) is what to use in psql, for
two reasons:

* we already do our best to set that correctly, and the user
  is able to change it if it's wrong;
* as previously noted, psqlscan.l will do the wrong things
  if it's not set correctly, so you're probably already hosed
  if working in a non-server-safe encoding with the wrong
  setting of client_encoding.


However, there are a bunch of callers of canonicalize_path()
that are not in psql, and those arguments don't apply to them;
in fact places like initdb and pg_ctl don't really have a
concept of client encoding at all. So what to do?

After looking through the callers I think we might not be in as bad
shape as this sounds, because all of the other callers are dealing
with Postgres installation paths or data directory-related paths that
are also dealt with by the server. So it's not unreasonable to
require that those paths must be written in server-safe encodings.
If they're not, you're going to have trouble with stuff like
"show data_directory".

I wonder whether we ought to try to enforce that. It'd be feasible
I think for initdb to verify that the selected paths are validly
encoded according to whatever encoding it's about to set the server
up with. If we were feeling draconian we could insist that
the installation path and data directory path be all-ASCII, which
is the only way to be sure that you won't have issues if you later
create a database that uses some other encoding. But I think we'd
likely get pushback from that. (This ties into the nearby
discussion about encoding of shared-catalog names [1], which is
more or less the same problem --- maybe the path encoding checks
could vary depending on how we're setting that up?)
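
As a rough illustration of the stricter ("draconian") option only, an all-ASCII test is a one-loop check. This is a sketch, not proposed initdb code, and `path_is_all_ascii` is a hypothetical name. The gentler option, validating the path against the encoding the server is being set up with, would need a per-encoding verifier and is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical check: true if the path contains no byte with the high bit
 * set.  Such a path reads the same under any ASCII-compatible encoding, so
 * it cannot become invalid if a database with another encoding is created
 * later.
 */
static bool
path_is_all_ascii(const char *path)
{
	assert(path != NULL);
	for (; *path; path++)
	{
		if ((unsigned char) *path & 0x80)
			return false;
	}
	return true;
}
```
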

Anyway, what I'm now thinking is that we can have two variants
of canonicalize_path:

    extern void canonicalize_path(char *path);
    extern void canonicalize_path_enc(char *path, int encoding);


The first one assumes a server-safe encoding, the second doesn't,
and at least to start with only psql would bother with the second.
It looks like we don't need cleanup_path_enc, not yet anyway,
since that's only applied to installation paths.
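
One way the two entry points could relate is for the plain variant to be a thin wrapper that passes a server-safe encoding to the `_enc` variant. The sketch below shows only that shape. Everything prefixed `demo_` or `DEMO_` is hypothetical, the encoding IDs are not PostgreSQL's, and only the backslash-to-slash step is shown; the real canonicalize_path also trims trailing separators and resolves "." and "..".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding IDs for this sketch; not PostgreSQL's pg_enc values. */
enum demo_encoding
{
	DEMO_SERVER_SAFE,			/* 0x5C never appears inside a multibyte char */
	DEMO_SJIS					/* 0x5C may be the trail byte of a character */
};

/* Byte length of the character at s under the given (demo) encoding. */
static int
demo_mblen(const unsigned char *s, int encoding)
{
	if (encoding == DEMO_SJIS &&
		((*s >= 0x81 && *s <= 0x9F) || (*s >= 0xE0 && *s <= 0xFC)) &&
		s[1] != '\0')
		return 2;
	return 1;					/* byte-wise scanning is fine otherwise */
}

/* Encoding-aware variant; only the separator conversion is sketched. */
static void
demo_canonicalize_path_enc(char *path, int encoding)
{
	unsigned char *p;

	assert(path != NULL);
	p = (unsigned char *) path;
	while (*p)
	{
		int			len = demo_mblen(p, encoding);

		if (len == 1 && *p == '\\')
			*p = '/';
		p += len;
	}
}

/* Plain variant: assumes a server-safe encoding and delegates. */
static void
demo_canonicalize_path(char *path)
{
	demo_canonicalize_path_enc(path, DEMO_SERVER_SAFE);
}
```

Under this arrangement existing callers keep calling the one-argument function unchanged, and psql would be the only caller passing pset.encoding to the two-argument form.
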

I am also guessing that we don't need an encoding-aware variant
of make_native_path: since it only changes '/' it can't create
an incorrectly encoded path, assuming the input is OK. However,
this is assuming that it's okay to use '\' as a Windows directory
separator even in shift-JIS, which I'm not too sure about.
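
The first half of that reasoning can be checked against the Shift-JIS byte ranges: '/' is 0x2F, which is below the trail-byte ranges 0x40-0x7E and 0x80-0xFC, so a byte-wise '/' to '\' pass never lands inside a two-byte character. The sketch below assumes Shift-JIS and uses hypothetical names (`sjis_is_trail_byte`, `slash_to_backslash`). It does not settle the open question about '\' as a separator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if b can be the second byte of a two-byte Shift-JIS character. */
static bool
sjis_is_trail_byte(unsigned char b)
{
	return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
}

/* Byte-wise '/' -> '\\' conversion, the only change attributed to
 * make_native_path above.  No encoding knowledge is used. */
static void
slash_to_backslash(char *path)
{
	char	   *p;

	assert(path != NULL);
	for (p = path; *p; p++)
	{
		if (*p == '/')
			*p = '\\';
	}
}
```

Note that after the conversion the string contains 0x5C both as a separator and, possibly, as a trail byte, so any later byte-wise scan for '\' hits the same problem that started this thread.
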

regards, tom lane