From: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
---|---|
To: | amitlangote09(at)gmail(dot)com |
Cc: | ashu(dot)coek88(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: MINUS SIGN (U+2212) in EUC-JP encoding is mapped to FULLWIDTH HYPHEN-MINUS (U+FF0D) in UTF-8 |
Date: | 2020-10-30 03:28:51 |
Message-ID: | 20201030.122851.538415294986124838.horikyota.ntt@gmail.com |
Lists: | pgsql-hackers |
At Fri, 30 Oct 2020 12:08:51 +0900, Amit Langote <amitlangote09(at)gmail(dot)com> wrote in
> I noticed that the commit a8bd7e1c6e02 from ages ago removed
> conversions from and to utf-8's e28892, in favor of efbc8d, and that
> change has stuck. (Note though that these maps looked pretty
> different back then.)
>
> --- a/src/backend/utils/mb/Unicode/euc_jp_to_utf8.map
> +++ b/src/backend/utils/mb/Unicode/euc_jp_to_utf8.map
> - {0xa1dd, 0xe28892},
> + {0xa1dd, 0xefbc8d},
>
> --- a/src/backend/utils/mb/Unicode/utf8_to_euc_jp.map
> +++ b/src/backend/utils/mb/Unicode/utf8_to_euc_jp.map
> - {0xe28892, 0xa1dd},
> + {0xefbc8d, 0xa1dd},
>
> Can't tell what reason there was to do that, but there must have been
> some. Maybe the Japanese character sets prefer full-width hyphen
> minus (unicode U+FF0D) over mathematical minus sign (U+2212)?
It's a decision made by Microsoft. Several other characters have
similar issues. I remember many people complained, but in the end it
wasn't "fixed", and that led to the well-known mess of Japanese
character conversion involving Unicode in Java.
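
For reference, here is a small Python sketch of what the two byte sequences in the quoted maps decode to. The helper name `describe` is made up for illustration. The last two lines assume that Python's `cp932` codec reflects the Microsoft mapping mentioned above and that its `shift_jis` codec follows the plain JIS X 0208 mapping; neither codec is named in this thread. The bytes 0x817C are the Shift_JIS form of the same JIS X 0208 cell as EUC-JP 0xA1DD.

```python
import unicodedata


def describe(utf8_hex):
    """Decode a hex string of UTF-8 bytes to (code point, Unicode name)."""
    ch = bytes.fromhex(utf8_hex).decode("utf-8")
    return f"U+{ord(ch):04X}", unicodedata.name(ch)


# The two UTF-8 values that appear for 0xa1dd in the quoted map diff.
print(describe("e28892"))  # value removed by the old commit
print(describe("efbc8d"))  # value the map uses now

# Assumption: cp932 is the Microsoft variant, shift_jis the JIS X 0208 one.
# Both decode the same two bytes, but to different code points.
jis_char = b"\x81\x7c".decode("shift_jis")
ms_char = b"\x81\x7c".decode("cp932")
print(f"shift_jis: U+{ord(jis_char):04X}  cp932: U+{ord(ms_char):04X}")
```

If those assumptions hold, the current map entry agrees with the cp932 side of that comparison rather than the JIS X 0208 side.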
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center