Re: BUG #18735: Specific multibyte character in psql file path command parameter for Windows

From: Koichi Suzuki <koichi(dot)dbms(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18735: Specific multibyte character in psql file path command parameter for Windows
Date: 2024-12-06 04:12:50
Message-ID: CABEZHFuqEkpoEf91VkK8A2Gbmcp0ELhYAYz_5J0SRCim=KJwWA@mail.gmail.com
Lists: pgsql-bugs

Hello;

Very short response.

Lexical analysis of backslash commands in psql is handled
by psqlscanslash.l, and this module scans input byte-by-byte, not
character-by-character. I'm afraid that the cause of the bug is in
this part. Is there any way to make these flex rules locale-dependent?

We need to analyze the behavior of this flex module to get a practical
idea for a fix.

Regards;
---
Koichi Suzuki
https://www.linkedin.com/in/koichidbms

On Fri, Dec 6, 2024 at 3:50 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> > Analysis:
> > * The second byte of the character in question has the same value as
> >   '\' (backslash). It looks like this byte is handled as an escape
> >   character. This happens with the SHIFT JIS client encoding.
> > * The issue happens in \i, \ir and \copy but does not happen in the
> >   \cd, \o and \! commands.
>
> I imagine what is happening here is that canonicalize_path() interprets
> the backslash bytes as directory separators.
>
> The only thing I can think of to improve that is to make
> canonicalize_path() encoding-aware and have it skip over multibyte
> characters. Unfortunately, I fear that would introduce as many
> misbehaviors as it would remove, because we don't always know the
> relevant encoding. We might be able to limit the hazard by
> confining the encoding-awareness to the initial Windows-only
> conversion of '\' to '/', but it'd still be pretty squishy.
>
> > * A similar issue may happen if the second byte of a multibyte
> >   character has the same value as '/' (directory delimiter).
>
> I don't believe Shift-JIS uses '/' as part of multibyte characters,
> so it should be sufficient to consider '\'.
>
> BTW, according to wikipedia[1], backslash is not even part of the
> Shift-JIS character set:
>
> The single-byte characters 0x00 to 0x7F match the ASCII encoding,
> except for a yen sign (U+00A5) at 0x5C and an overline (U+203E) at
> 0x7E in place of the ASCII character set's backslash and tilde
> respectively (these deviations from ASCII align with JIS X
> 0201). The single-byte characters from 0xA1 to 0xDF map to the
> half-width katakana characters found in JIS X 0201.
>
> For double-byte characters, the first byte is always in the range
> 0x81 to 0x9F or the range 0xE0 to 0xEF (these ranges are
> unassigned in JIS X 0201). If the first byte is odd, the second
> byte must be in the range 0x40 to 0x9E (but cannot be 0x7F); if
> the first byte is even, the second byte must be in the range 0x9F
> to 0xFC.
>
> This might mean that it'd be okay to just skip the backslash-to-slash
> conversion loops altogether if we think the encoding is Shift-JIS.
>
> There's still the question of how we determine the relevant encoding.
> I don't think client_encoding is what to use (and we won't have that
> at hand anyway, in programs other than psql). What we want to know
> is what fopen and related system calls will do with the path: they
> must have different behavior for Shift-JIS than other encodings,
> else none of your examples could work at all. I assume there's
> a way to find out what they think the relevant encoding is.
>
> make_native_path() adds even more fun: when should we convert '/'
> back to '\'? From the comments, this function is concerned with
> producing something that will be accepted as a command-line
> argument by other programs, so I wonder if we can even know what
> to do with any certainty.
>
> (In case it's not clear, I'm not volunteering to write or test
> any of this.)
>
> regards, tom lane
>
> [1] https://en.wikipedia.org/wiki/Shift_JIS
>
