From: | Michael Robinson <robinson(at)netrinsics(dot)com> |
---|---|
To: | robinson(at)netrinsics(dot)com, t-ishii(at)sra(dot)co(dot)jp |
Cc: | pgsql-hackers(at)hub(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us |
Subject: | Re: Multibyte still broken |
Date: | 2000-05-11 17:56:15 |
Message-ID: | 200005111756.BAA10220@netrinsics.com |
Lists: | pgsql-hackers |
Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> writes:
>I am surprised to hear that you have such poor quality tools that
>produce illegal code sequences of Simplified Chinese. In Japan, as far
>as I know, we never have such low quality tools which generate
>illegal Japanese characters just because they are not accepted in the
>market, even in the case of email attachments, or cut-and-paste or
>whatever.
The problem is not that the tools produce "illegal characters". The problem
is that, as an EUC code, GB permits the coexistence of standard ASCII
characters with double-byte hanzi characters. Furthermore, most Chinese
software is an operating-system "hack" on top of English-language software
based on a Latin-1 character set (the Chinese software market is underserved
compared to Japan, so we have to cope as best we can).
The result is that it is possible to, for example, insert a carriage return
or ASCII comma into the middle of a hanzi, which breaks the alignment for all
the hanzi on the rest of the line. It's also possible, in non-native Chinese
applications, to select one byte of a hanzi character in a cut or copy
operation.
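
To make the alignment problem concrete, here is a minimal sketch in Python. The splitter below is a hypothetical, deliberately naive routine (it is not taken from any particular tool or from the backend): it treats any byte with the high bit set as the first byte of a two-byte character. The sample string and the position of the stray comma are likewise just assumptions for illustration.

```python
def split_euc(data: bytes) -> list[bytes]:
    """Naively split an EUC-style byte string into characters.

    Assumption for illustration: a byte with the high bit set starts a
    two-byte character; any other byte is a single ASCII character.
    """
    chars = []
    i = 0
    while i < len(data):
        if data[i] & 0x80 and i + 1 < len(data):
            chars.append(data[i:i + 2])  # lead byte plus whatever follows
            i += 2
        else:
            chars.append(data[i:i + 1])  # ASCII, or a dangling lead byte
            i += 1
    return chars


# Four hanzi, two bytes each in GB2312 (EUC-CN).
text = "中文数据".encode("gb2312")

# A Latin-1-minded tool drops an ASCII comma between the two bytes
# of the first hanzi.
broken = text[:1] + b"," + text[1:]

intact_chars = split_euc(text)
broken_chars = split_euc(broken)

print([c.hex() for c in intact_chars])
print([c.hex() for c in broken_chars])
```

In the second list the comma is swallowed as the trailing byte of the first "character", every later pair straddles two of the original hanzi, and a single orphaned byte is left at the end of the line.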
So the problem is that the tools do not uniformly respect the integrity of
a double-byte hanzi character, but rather treat it as two individual Latin-1
characters.
The important point, though, is that all tools, whether native Chinese or
"hacked" English, accept the resulting invalid code sequences consistently,
robustly, and without complaint.
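
As a rough illustration of the difference between rejecting and tolerating such a sequence, the sketch below uses Python's standard gb2312 codec. This is only an analogy for the two behaviours; it says nothing about how any specific Chinese tool or the backend is implemented, and the damaged byte string is made up for the example.

```python
# "中文" in GB2312 with an ASCII comma inserted inside the first hanzi.
damaged = "中文".encode("gb2312")
damaged = damaged[:1] + b"," + damaged[1:]

# Strict handling: the invalid sequence is rejected outright.
try:
    damaged.decode("gb2312")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Tolerant handling: the data is accepted, with the undecodable bytes
# replaced by U+FFFD instead of raising an error.
tolerant = damaged.decode("gb2312", errors="replace")

print(strict_ok)
print(repr(tolerant))
```

The strict path fails on the whole string, while the tolerant path returns something usable, which is closer to what users of these tools have come to expect.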
-Michael