From: | "John Hansen" <john(at)geeknet(dot)com(dot)au> |
---|---|
To: | "Tatsuo Ishii" <t-ishii(at)sra(dot)co(dot)jp>, <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | <db(at)zigo(dot)dhs(dot)org>, <pgsql-hackers(at)postgresql(dot)org>, <pgsql-patches(at)postgresql(dot)org> |
Subject: | Re: [PATCHES] UNICODE characters above 0x10000 |
Date: | 2004-08-07 10:11:27 |
Message-ID: | 5066E5A966339E42AA04BA10BA706AE5608A@rodrick.geeknet.com.au |
Lists: | pgsql-hackers pgsql-patches |
Yes, but the specification allows for 6-byte sequences, or 32-bit
characters.
As Dennis pointed out, just because they're not used doesn't mean we
should not allow them to be stored, since there might be someone using
the high ranges for a private character set, which could very well be
included in the specification some day.
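A minimal sketch of the lead-byte rule under discussion, assuming the original 31-bit UTF-8 layout (RFC 2279), which permits sequences of up to 6 bytes. The function names are hypothetical and are not taken from the PostgreSQL source; the sketch classifies only the first byte and does not reject overlong encodings.

```c
#include <assert.h>

/*
 * Hypothetical helper: return the total length in bytes of a UTF-8
 * sequence given its first byte, using the original 31-bit layout,
 * or 0 if the byte cannot start a sequence.
 */
static int
utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;  /* 0xxxxxxx: ASCII */
    if (lead < 0xC0) return 0;  /* 10xxxxxx: continuation byte, not a start */
    if (lead < 0xE0) return 2;  /* 110xxxxx */
    if (lead < 0xF0) return 3;  /* 1110xxxx */
    if (lead < 0xF8) return 4;  /* 11110xxx */
    if (lead < 0xFC) return 5;  /* 111110xx */
    if (lead < 0xFE) return 6;  /* 1111110x */
    return 0;                   /* 0xFE, 0xFF: seven or eight leading ones */
}

/*
 * Hypothetical helper: number of payload bits carried by a sequence of
 * the given length (1..6) in that layout.
 */
static int
utf8_payload_bits(int len)
{
    assert(len >= 1 && len <= 6);
    if (len == 1)
        return 7;
    /* the lead byte contributes (7 - len) bits, each continuation byte 6 */
    return (7 - len) + 6 * (len - 1);
}
```

Under these assumptions a 4-byte sequence carries 21 bits and a 6-byte sequence carries 31 bits, so a 32-bit internal character type is wide enough for every sequence the layout can express.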
Regards,
John Hansen
-----Original Message-----
From: Tatsuo Ishii [mailto:t-ishii(at)sra(dot)co(dot)jp]
Sent: Saturday, August 07, 2004 8:09 PM
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: db(at)zigo(dot)dhs(dot)org; John Hansen; pgsql-hackers(at)postgresql(dot)org;
pgsql-patches(at)postgresql(dot)org
Subject: Re: [PATCHES] [HACKERS] UNICODE characters above 0x10000
> Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org> writes:
> > ... This also means that the start byte can never start with 7 or 8
> > ones, that is illegal and should be tested for and rejected. So the
> > longest utf-8 sequence is 6 bytes (and the longest character needs 4
> > bytes (or 31 bits)).
>
> Tatsuo would know more about this than me, but it looks from here like
> our coding was originally designed to support only 16-bit-wide
> internal characters (ie, 16-bit pg_wchar datatype width). I believe
> that the regex library limitation here is gone, and that as far as
> that library is concerned we could assume a 32-bit internal character
> width. The question at hand is whether we can support 32-bit
> characters or not --- and if not, what's the next bug to fix?
pg_wchar is already a 32-bit datatype. However, I doubt there's
actually a need for 32-bit-wide character sets. Even Unicode only uses
up to 0x0010FFFF, so 24 bits should be enough...
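The width arithmetic can be checked with a short sketch. The helper names below are hypothetical (not PostgreSQL functions), and the byte-length thresholds assume the original 31-bit UTF-8 layout:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of bits needed to represent a code point. */
static int
bits_needed(uint32_t cp)
{
    int bits = 0;

    while (cp != 0)
    {
        bits++;
        cp >>= 1;
    }
    return bits;
}

/*
 * Hypothetical helper: bytes needed to encode a code point in UTF-8,
 * or 0 if the value exceeds 31 bits and cannot be encoded at all.
 */
static int
utf8_encoded_len(uint32_t cp)
{
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    if (cp < 0x200000) return 4;
    if (cp < 0x4000000) return 5;
    if (cp < 0x80000000u) return 6;
    return 0;
}

/* Usage sketch: the highest Unicode code point fits in 21 bits and 4 bytes. */
static void
check_unicode_ceiling(void)
{
    assert(bits_needed(0x10FFFF) == 21);
    assert(utf8_encoded_len(0x10FFFF) == 4);
}
```

So 0x0010FFFF needs 21 bits, which fits comfortably within 24, and encodes to 4 bytes; only values above 0x1FFFFF would require the 5- and 6-byte forms.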
--
Tatsuo Ishii