Re: [PATCHES] UNICODE characters above 0x10000

From:	Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org>
To:	John Hansen <john(at)geeknet(dot)com(dot)au>
Cc:	Takehiko Abe <keke(at)mac(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [PATCHES] UNICODE characters above 0x10000
Date:	2004-08-07 13:22:34
Message-ID:	Pine.LNX.4.44.0408071517040.9559-100000@zigo.dhs.org
Lists:	pgsql-hackers

On Sat, 7 Aug 2004, John Hansen wrote:

> Now, is it really 24 bits tho?
> Afaict, it's really 21 (0 - 10FFFF or 0 - xxx10000 11111111 11111111)

Yes, up to 0x10ffff should be enough.

The 24 is not really important; this is all about which UTF-8 strings to
accept as input. The strings are stored as UTF-8, and when they are
processed inside pg it uses wchar_t, which is 32 bits (on some systems at
least). By restricting the UTF-8 input to the Unicode range (up to
0x10ffff) we can in the future store each character as 3 bytes if we want.
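
To make the point concrete, here is a minimal sketch in C of the kind of input check being discussed. It is an illustration only, not the patch under review and not PostgreSQL's actual code: the names utf8_decode_checked, pack3, unpack3 and MAX_CODE_POINT are hypothetical, and the extra rejection of overlong forms and surrogates is an assumption about what a strict check would do. It decodes one UTF-8 sequence, rejects anything above 0x10ffff, and shows that an accepted code point (21 bits) fits in 3 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CODE_POINT 0x10FFFFu    /* highest Unicode code point */

/*
 * Decode one UTF-8 sequence from s (len bytes available).
 * Returns the number of bytes consumed (1..4) and stores the code point
 * in *cp, or returns 0 if the sequence is truncated, malformed, overlong,
 * a surrogate, or above U+10FFFF.
 */
static int
utf8_decode_checked(const unsigned char *s, size_t len, uint32_t *cp)
{
    /* smallest code point that legitimately needs n bytes */
    static const uint32_t min_for_len[5] = {0, 0, 0x80, 0x800, 0x10000};
    uint32_t c;
    int n, i;

    if (len == 0)
        return 0;

    if (s[0] < 0x80)                { c = s[0];        n = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { c = s[0] & 0x1F; n = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { c = s[0] & 0x0F; n = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { c = s[0] & 0x07; n = 4; }
    else
        return 0;       /* stray continuation byte or 5/6-byte lead byte */

    if (len < (size_t) n)
        return 0;       /* truncated sequence */

    for (i = 1; i < n; i++)
    {
        if ((s[i] & 0xC0) != 0x80)
            return 0;   /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }

    if (c < min_for_len[n])
        return 0;       /* overlong encoding */
    if (c >= 0xD800 && c <= 0xDFFF)
        return 0;       /* UTF-16 surrogate, not a character */
    if (c > MAX_CODE_POINT)
        return 0;       /* outside the Unicode range */

    *cp = c;
    return n;
}

/* Store a validated code point (at most 21 bits) in 3 bytes, big-endian. */
static void
pack3(uint32_t cp, unsigned char out[3])
{
    assert(cp <= MAX_CODE_POINT);
    out[0] = (unsigned char) ((cp >> 16) & 0xFF);
    out[1] = (unsigned char) ((cp >> 8) & 0xFF);
    out[2] = (unsigned char) (cp & 0xFF);
}

/* Inverse of pack3. */
static uint32_t
unpack3(const unsigned char in[3])
{
    return ((uint32_t) in[0] << 16) | ((uint32_t) in[1] << 8) | in[2];
}
```

With a check like this, the bytes F4 8F BF BF (U+10FFFF) are accepted, while F4 90 80 80 (which would decode to 0x110000) is rejected, so every accepted character survives a round trip through pack3 and unpack3.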

--
/Dennis Björklund

In response to

Re: [PATCHES] UNICODE characters above 0x10000 at 2004-08-07 13:14:24 from John Hansen
