From: | Tatsuo Ishii <ishii(at)postgresql(dot)org> |
---|---|
To: | arul(at)fast(dot)au(dot)fujitsu(dot)com |
Cc: | robertmhaas(at)gmail(dot)com, pavel(dot)stehule(at)gmail(dot)com, peter_e(at)gmx(dot)net, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Proposal - Support for National Characters functionality |
Date: | 2013-07-15 07:58:49 |
Message-ID: | 20130715.165849.1163237908590959703.t-ishii@sraoss.co.jp |
Lists: | pgsql-hackers |
>> On Fri, Jul 5, 2013 at 2:35 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>> > Yes, what I know almost all use utf8 without problems. Long time I
>> > didn't see any request for multi encoding support.
>>
>> Well, not *everything* can be represented as UTF-8; I think this is
>> particularly an issue with Asian languages.
>>
>> If we chose to do it, I think that per-column encoding support would end up
>> looking a lot like per-column collation support: it would be yet another
>> per-column property along with typoid, typmod, and typcollation. I'm not
>> entirely sure it's worth it, although FWIW I do believe Oracle has something
>> like this.
>
> Yes, the idea is that users will be able to declare columns of type
> NCHAR or NVARCHAR which will use the pre-determined encoding type. If we
> say that NCHAR is UTF-8 then the NCHAR column will be of UTF-8 encoding
> irrespective of the database encoding. It will be up to us to restrict
> what Unicode encodings we want to support for NCHAR/NVARCHAR columns.
> This is based on my interpretation of the SQL standard. As you allude to
> above, Oracle has a similar behaviour (they support UTF-16 as well).
>
> Support for UTF-16 will be difficult without linking with some external
> libraries such as ICU.
Can you please elaborate on this? Why exactly do you need ICU?
Also, I don't understand why you need UTF-16 support as a database
encoding: UTF-8 and UTF-16 are logically equivalent, just
different representations (encodings) of Unicode. That means that if we
already support UTF-8 (I'm sure we already do), there's no particular
reason to add UTF-16 support.
Maybe you just want to support UTF-16 as a client encoding?
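
To illustrate the equivalence point, here is a minimal sketch in Python (not PostgreSQL code; the helper name code_points and the sample string are made up for this example). It shows that the same text has different bytes in UTF-8 and UTF-16 but decodes to the same Unicode code points, so converting between the two loses nothing:

```python
def code_points(data: bytes, encoding: str) -> list:
    """Decode bytes with the given encoding and return the Unicode code points."""
    return [ord(ch) for ch in data.decode(encoding)]


# Sample text (an assumption for illustration): CJK characters, ASCII,
# and one character outside the Basic Multilingual Plane.
text = "日本語 PostgreSQL \U0001F418"

utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")

# The byte sequences differ ...
assert utf8_bytes != utf16_bytes

# ... but both decode to the same sequence of code points.
assert code_points(utf8_bytes, "utf-8") == code_points(utf16_bytes, "utf-16-le")

# Converting UTF-16 -> UTF-8 reproduces the original UTF-8 bytes exactly.
assert utf16_bytes.decode("utf-16-le").encode("utf-8") == utf8_bytes
```

This only demonstrates that the two encodings carry the same information; it says nothing about storage size or processing cost, which may differ between them.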
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp