From: | "John Hansen" <john(at)geeknet(dot)com(dot)au> |
---|---|
To: | "Tatsuo Ishii" <t-ishii(at)sra(dot)co(dot)jp> |
Cc: | <db(at)zigo(dot)dhs(dot)org>, <pgsql-hackers(at)postgresql(dot)org>, <pgsql-patches(at)postgresql(dot)org> |
Subject: | Re: [PATCHES] UNICODE characters above 0x10000 |
Date: | 2004-08-07 11:09:22 |
Message-ID: | 5066E5A966339E42AA04BA10BA706AE5608D@rodrick.geeknet.com.au |
Lists: | pgsql-hackers pgsql-patches |
Well, maybe we'd be better off compiling a list of (in)valid ranges
from the full Unicode database
(http://www.unicode.org/Public/UNIDATA/UnicodeData.txt and
http://www.unicode.org/Public/UNIDATA/Unihan.txt)
and, with every release of pg, updating the detection logic so that only
valid characters are allowed?
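As a rough illustration of what compiling such a list could involve, the sketch below parses lines in the UnicodeData.txt format (semicolon-separated fields, hexadecimal code point first, with `<..., First>` / `<..., Last>` pairs marking large blocks) and merges them into contiguous assigned ranges. The names `parse_assigned_ranges` and `is_assigned` are hypothetical, the sample lines are a tiny hand-picked excerpt rather than the real file, and none of this reflects how PostgreSQL actually validates input.

```python
from bisect import bisect_right

def parse_assigned_ranges(lines):
    """Return a sorted list of merged (first, last) code point ranges.

    Assumes the input is ordered by code point, as UnicodeData.txt is.
    """
    ranges = []
    range_start = None  # set while inside a "<..., First>" / "<..., Last>" pair
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        cp = int(fields[0], 16)
        name = fields[1]
        if name.endswith(", First>"):
            range_start = cp
            continue
        if name.endswith(", Last>") and range_start is not None:
            first, range_start = range_start, None
        else:
            first = cp
        # Extend the previous range when this entry is adjacent to it.
        if ranges and first <= ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], cp))
        else:
            ranges.append((first, cp))
    return ranges

def is_assigned(ranges, cp):
    """Binary-search the merged ranges for a code point."""
    starts = [first for first, _ in ranges]
    i = bisect_right(starts, cp) - 1
    return i >= 0 and ranges[i][0] <= cp <= ranges[i][1]

# Illustrative excerpt only; a real run would read the whole file.
SAMPLE = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "0042;LATIN CAPITAL LETTER B;Lu;0;L;;;;;N;;;;0062;",
    "4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;",
    "9FA5;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;",
    "10000;LINEAR B SYLLABLE B008 A;Lo;0;L;;;;;N;;;;;",
]

ranges = parse_assigned_ranges(SAMPLE)
```

A table built this way would have to be regenerated for each Unicode release, which is the maintenance cost the suggestion implies.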
Regards,
John Hansen
-----Original Message-----
From: Tatsuo Ishii [mailto:t-ishii(at)sra(dot)co(dot)jp]
Sent: Saturday, August 07, 2004 8:46 PM
To: John Hansen
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us; db(at)zigo(dot)dhs(dot)org; pgsql-hackers(at)postgresql(dot)org;
pgsql-patches(at)postgresql(dot)org
Subject: Re: [PATCHES] [HACKERS] UNICODE characters above 0x10000
> Yes, but the specification allows for 6-byte sequences, or 32-bit
> characters.
UTF-8 is just an encoding specification, not a character set
specification. Unicode only has 17 planes of 256x256 code points in its
specification.
> As Dennis pointed out, just because they're not used doesn't mean we
> should not allow them to be stored, since there might be someone using
> the high ranges for a private character set, which could very well be
> included in the specification some day.
Then we should expand it to 64-bit, since the specification might be
changed some day :-)
More seriously, Unicode is filled with tons of confusion and
inconsistency IMO. Remember that Unicode advocates once said that the
merit of Unicode was that it only requires 16-bit width. Now they say
they need surrogate pairs and 32-bit width chars...
Anyway, my point is: if the current specification of Unicode only allows
a 24-bit range, why do we need to allow usage against the specification?
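The gap between what the encoding can carry and what the character set defines can be checked with a little arithmetic. The sketch below is illustrative only: `utf8_length` and `in_unicode_range` are hypothetical names, and the byte-length thresholds follow the original 1-to-6 byte UTF-8 layout (values up to 31 bits). Seventeen planes of 65,536 code points end at U+10FFFF, which needs 21 bits and at most 4 bytes in UTF-8. Surrogate code points are not treated specially here.

```python
# 17 planes of 256 x 256 code points each.
MAX_CODE_POINT = 17 * 0x10000 - 1  # 0x10FFFF

# Exclusive upper bounds for 1..6 byte sequences in the original UTF-8 layout.
_UTF8_LIMITS = (0x80, 0x800, 0x10000, 0x200000, 0x4000000, 0x80000000)

def utf8_length(cp):
    """Bytes needed to encode cp under the original 1-to-6 byte scheme."""
    if cp < 0:
        raise ValueError("code point must be non-negative")
    for length, limit in enumerate(_UTF8_LIMITS, start=1):
        if cp < limit:
            return length
    raise ValueError("value does not fit in 31 bits")

def in_unicode_range(cp):
    """True if cp lies within the 17 planes defined by Unicode."""
    return 0 <= cp <= MAX_CODE_POINT

print(hex(MAX_CODE_POINT), MAX_CODE_POINT.bit_length(), utf8_length(MAX_CODE_POINT))
print(utf8_length(0x7FFFFFFF), in_unicode_range(0x110000))
```

Under these assumptions, a validator that follows the character set would reject any sequence longer than 4 bytes, while one that follows only the encoding layout would accept up to 6.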
--
Tatsuo Ishii
From | Date | Subject | |
---|---|---|---|
Next Message | John Hansen | 2004-08-07 11:10:53 | Re: [PATCHES] UNICODE characters above 0x10000 |
Previous Message | Dennis Bjorklund | 2004-08-07 11:05:44 | Re: [PATCHES] UNICODE characters above 0x10000 |