Re: 回复: May "PostgreSQL server side GB18030 character set support" reconsidered?

From: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: parker(dot)han(at)outlook(dot)com, pgsql-general(at)postgresql(dot)org
Subject: Re: 回复: May "PostgreSQL server side GB18030 character set support" reconsidered?
Date: 2020-10-06 00:04:10
Message-ID: 20201006.090410.1264557007598736613.t-ishii@sraoss.co.jp
Lists: pgsql-general

> TBH, even if you came up with a complete patch, we'd probably
> reject it as unmaintainable and a security hazard. The problem
> is that code may scan a string looking for certain ASCII characters
> such as backslash (\), which up to now it's always been able to do
> byte-by-byte without fear that non-ASCII characters could confuse it.
> To support GB18030 (or other encodings with the same issue, such as
> SJIS), every such loop would have to be modified to advance character
> by character, thus roughly "p += pg_mblen(p)" instead of "p++".
> Anyplace that neglected to do that would have a bug --- one that
> could only be exposed by careful testing using GB18030 encoding.
> What's more, such bugs could easily be security problems.
> Mis-detecting a backslash, for example, could lead to wrong decisions
> about where string literals end, allowing SQL-injection exploits.
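
To make the quoted concern concrete, here is a minimal sketch in C. It is
not backend code: gb18030_mblen() is a hypothetical, simplified stand-in
for what pg_mblen() would do for GB18030, and the two counting functions
are invented for illustration. It relies on the fact that the second byte
of a two-byte GB18030 character may fall in the range 0x40-0x7E, which
includes 0x5C, the ASCII backslash.

```c
#include <stddef.h>

/*
 * Hypothetical, simplified stand-in for pg_mblen() limited to GB18030.
 * Assumes well-formed input; it does not validate byte ranges.
 */
static int
gb18030_mblen(const unsigned char *s)
{
    if (s[0] < 0x80)
        return 1;               /* plain ASCII */
    if (s[1] >= 0x30 && s[1] <= 0x39)
        return 4;               /* four-byte form: second byte is a digit */
    return 2;                   /* two-byte form */
}

/* The "p++" style loop: looks at every byte in isolation. */
static int
count_backslashes_bytewise(const char *str)
{
    int         n = 0;

    for (const char *p = str; *p; p++)
    {
        if (*p == '\\')
            n++;
    }
    return n;
}

/* The "p += mblen(p)" style loop: only single-byte characters can match. */
static int
count_backslashes_charwise(const char *str)
{
    const unsigned char *p = (const unsigned char *) str;
    int         n = 0;

    while (*p)
    {
        int         len = gb18030_mblen(p);

        if (len == 1 && *p == '\\')
            n++;
        /* Advance by one character, but never step past the terminator. */
        for (int i = 0; i < len && *p; i++)
            p++;
    }
    return n;
}
```

For the byte sequence 0x81 0x5C followed by "a", the bytewise loop reports
one backslash because it treats the trailing byte of the two-byte character
as ASCII, while the character-wise loop reports none.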

One idea for avoiding this concern would be to "shift" GB18030 code
points into an "ASCII-safe" code range by some calculation, so that the
backend can handle them without worrying about the problem above. This
way, we could avoid the table lookup overhead that is necessary for
conversion between GB18030 and UTF8, and so on.

However, I have not come up with such a mathematical conversion method
so far.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
