
Re: [PATCHES] UNICODE characters above 0x10000

From:	Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To:	tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc:	db(at)zigo(dot)dhs(dot)org, john(at)geeknet(dot)com(dot)au, pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject:	Re: [PATCHES] UNICODE characters above 0x10000
Date:	2004-08-07 10:09:13
Message-ID:	20040807.190913.26271342.t-ishii@sra.co.jp
Lists:	pgsql-hackers pgsql-patches

> Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org> writes:
> > ... This also means that the start byte can never start with 7 or 8
> > ones, that is illegal and should be tested for and rejected. So the
> > longest utf-8 sequence is 6 bytes (and the longest character needs 4
> > bytes (or 31 bits)).
>
> Tatsuo would know more about this than me, but it looks from here like
> our coding was originally designed to support only 16-bit-wide internal
> characters (ie, 16-bit pg_wchar datatype width). I believe that the
> regex library limitation here is gone, and that as far as that library
> is concerned we could assume a 32-bit internal character width. The
> question at hand is whether we can support 32-bit characters or not ---
> and if not, what's the next bug to fix?

pg_wchar has already been a 32-bit datatype. However, I doubt there's
actually a need for 32-bit-wide character sets. Even Unicode only
uses code points up to 0x0010FFFF, so 24 bits should be enough...
--
Tatsuo Ishii

In response to

Re: UNICODE characters above 0x10000 at 2004-08-07 06:49:06 from Tom Lane
