From: | Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> |
---|---|
To: | tgl(at)sss(dot)pgh(dot)pa(dot)us |
Cc: | parker(dot)han(at)outlook(dot)com, pgsql-general(at)postgresql(dot)org |
Subject: | Re: 回复: May "PostgreSQL server side GB18030 character set support" reconsidered? |
Date: | 2020-10-06 03:11:42 |
Message-ID: | 20201006.121142.2002518154310370203.t-ishii@sraoss.co.jp |
Lists: | pgsql-general |
> But as he already admitted, GB18030 is actually an encoding of up to
> 4 bytes, not just 2 bytes. So maybe we could find a way to map the
> original GB18030 to an ASCII-safe GB18030 using 4 bytes.
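For context, the reason plain GB18030 is not ASCII-safe is that the second byte of a multibyte character can fall in the ASCII range (0x40-0x7e for 2-byte characters, 0x30-0x39 for 4-byte ones). A small Python check with the standard gb18030 codec illustrates this; the byte pair 0x81 0x5c is just an arbitrary valid sequence picked for the example:

```python
# An arbitrary valid 2-byte GB18030 sequence whose second byte is 0x5c,
# the same value as an ASCII backslash.
raw = bytes([0x81, 0x5C])

ch = raw.decode("gb18030")                    # one character, two bytes
has_ascii_byte = any(b < 0x80 for b in raw)   # True: 0x5c looks like '\'

print(len(ch), has_ascii_byte)
```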
Here is an idea (in-byteN is the Nth byte of the GB18030 input,
out-byteN is the Nth byte of the internal server encoding):
if (in-byte1 is 0x00-0x7f)	/* ASCII */
    out-byte1 = in-byte1
else if (in-byte1 is 0x81-0xfe && in-byte2 is 0x40-0x7e)	/* 2 bytes GB18030 */
    out-byte1 = in-byte1
    out-byte2 = 0x80
    out-byte3 = in-byte2 + 0x80 (should be 0xc0-0xfe)
    out-byte4 = 0x80
else if (in-byte1 is 0x81-0xfe && in-byte2 is 0x80-0xfe)	/* 2 bytes GB18030 */
    out-byte1 = in-byte1
    out-byte2 = 0x80
    out-byte3 = 0x80
    out-byte4 = in-byte2 (should be 0x80-0xfe)
else if (in-byte1 is 0x81-0xfe && in-byte2 is 0x30-0x39)	/* 4 bytes GB18030 */
    out-byte1 = in-byte1
    out-byte2 = in-byte2 + 0x80 (should be 0xb0-0xb9)
    out-byte3 = in-byte3 (already 0x81-0xfe)
    out-byte4 = in-byte4 + 0x80 (should be 0xb0-0xb9)
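The mapping above can be tried out with a small Python sketch. This is only an illustration of the idea, not PostgreSQL code: the function names gb18030_to_internal and internal_to_gb18030 are made up for this example, and it assumes the standard GB18030 byte ranges (ASCII 0x00-0x7f, 2-byte trail bytes 0x40-0x7e or 0x80-0xfe, 4-byte form with 0x30-0x39 in bytes 2 and 4).

```python
def gb18030_to_internal(src: bytes) -> bytes:
    """Map ONE GB18030 character (1, 2 or 4 bytes) to the proposed form."""
    b1 = src[0]
    if b1 <= 0x7F:                                  # ASCII stays 1 byte
        return bytes([b1])
    if not (0x81 <= b1 <= 0xFE) or len(src) < 2:
        raise ValueError("invalid GB18030 sequence")
    b2 = src[1]
    if 0x40 <= b2 <= 0x7E:                          # 2-byte, ASCII-range trail
        return bytes([b1, 0x80, b2 + 0x80, 0x80])
    if 0x80 <= b2 <= 0xFE:                          # 2-byte, high trail
        return bytes([b1, 0x80, 0x80, b2])
    if 0x30 <= b2 <= 0x39 and len(src) >= 4:        # 4-byte
        b3, b4 = src[2], src[3]
        if 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39:
            return bytes([b1, b2 + 0x80, b3, b4 + 0x80])
    raise ValueError("invalid GB18030 sequence")


def internal_to_gb18030(enc: bytes) -> bytes:
    """Inverse of gb18030_to_internal for ONE mapped character."""
    b1 = enc[0]
    if b1 <= 0x7F:                                  # ASCII
        return bytes([b1])
    o2, o3, o4 = enc[1], enc[2], enc[3]
    if o2 != 0x80:                                  # was a 4-byte character
        return bytes([b1, o2 - 0x80, o3, o4 - 0x80])
    if o3 != 0x80:                                  # 2-byte, trail 0x40-0x7e
        return bytes([b1, o3 - 0x80])
    return bytes([b1, o4])                          # 2-byte, trail 0x80-0xfe


samples = (
    b"A",                                # ASCII
    bytes([0x81, 0x5C]),                 # 2-byte, trail byte in ASCII range
    bytes([0xD6, 0xD0]),                 # 2-byte, trail byte >= 0x80
    bytes([0x81, 0x30, 0x81, 0x30]),     # 4-byte
)
for raw in samples:
    mapped = gb18030_to_internal(raw)
    assert internal_to_gb18030(mapped) == raw       # mapping is reversible
    print(raw.hex(), "->", mapped.hex())
```

In the output form every byte of a non-ASCII character is 0x80 or higher, and the three multibyte cases stay distinguishable by out-byte2 and out-byte3, so the mapping can be reversed. The cost is that every 2-byte GB18030 character takes 4 bytes internally.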
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp