Re: [PATCHES] UNICODE characters above 0x10000

From: "John Hansen" <john(at)geeknet(dot)com(dot)au>
To: "Dennis Bjorklund" <db(at)zigo(dot)dhs(dot)org>, "Takehiko Abe" <keke(at)mac(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] UNICODE characters above 0x10000
Date: 2004-08-07 13:14:24
Message-ID: 5066E5A966339E42AA04BA10BA706AE56172@rodrick.geeknet.com.au
Lists: pgsql-hackers

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of
> Dennis Bjorklund
> Sent: Saturday, August 07, 2004 10:48 PM
> To: Takehiko Abe
> Cc: pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [PATCHES] [HACKERS] UNICODE characters above 0x10000
>
> On Sat, 7 Aug 2004, Takehiko Abe wrote:
>
> It looked like you sent the last mail only to me and not the
> list. I assume it was a mistake, so I am sending the reply to both.
>
> > > Is there a specific reason you want to restrict it to 24 bits?
> >
> > ISO 10646 is said to have removed its private use codepoints outside
> > of the Unicode 0 - 10FFFF range to ensure the compatibility with Unicode.
> >
> > see Section C.2 and C.3 of Unicode 4.0 Appendix C "Relationship to ISO
> > 10646": <http://www.unicode.org/versions/Unicode4.0.0/appC.pdf>.
>
> The one and only reason for allowing 31 bits is that it's
> defined by ISO 10646. In practice there is probably no one
> who uses the upper part of 10646, so not supporting it will
> most likely not hurt anyone.
>
>
> I'm happy either way, so I will cast my vote for letting PG use
> Unicode (not ISO 10646) and restrict it to 24 bits. By the
> time someone wants ISO 10646 (if ever), we will probably have
> support for different charsets and can easily handle both at
> the same time.
>

Point taken, since we're supporting UTF8 and not ISO 10646.

Now, is it really 24 bits, though?
AFAICT, it's really 21 bits (0 - 10FFFF, or 0 - xxx10000 11111111 11111111 in binary).
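
A quick way to check that arithmetic, as a rough sketch in Python rather than anything from the patch (the names MAX_UNICODE, bits_needed and grouped are made up for illustration):

```python
# Illustrative check only; these names are hypothetical, not from the patch.
MAX_UNICODE = 0x10FFFF  # highest code point in the Unicode range

# Number of bits needed to represent the highest code point.
bits_needed = MAX_UNICODE.bit_length()

# The same value padded to 24 bits and split into three octets,
# to show that the top three bits of the first octet are unused.
padded = format(MAX_UNICODE, "024b")
grouped = " ".join(padded[i:i + 8] for i in range(0, 24, 8))

print(bits_needed)  # 21
print(grouped)      # 00010000 11111111 11111111
```

So the value fits in three octets, but only 21 of those 24 bits are ever significant.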

This would require that we support 4-byte sequences
(11110100 10001111 10111111 10111111 = 10FFFF).
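
To show where those four octets come from, here is a minimal sketch of the standard UTF-8 bit layout in Python. It is not the code from the patch; encode_utf8 and as_bits are hypothetical helper names, and the surrogate check is my assumption about what a strict encoder would reject.

```python
def encode_utf8(cp):
    """Encode one code point as UTF-8 bytes (hypothetical helper, not PG code)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable Unicode code point: %#x" % cp)
    if cp < 0x80:       # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:      # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:    # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: 3 + 6 + 6 + 6 = 21 payload bits
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def as_bits(data):
    """Render bytes as space-separated binary octets (hypothetical helper)."""
    return " ".join(format(b, "08b") for b in data)


print(as_bits(encode_utf8(0x10FFFF)))  # 11110100 10001111 10111111 10111111
```

The 4-byte form carries exactly 21 payload bits, which is why it is enough to reach 10FFFF and why anything above 0xFFFF needs it.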

> --
> /Dennis Björklund

Regards,

John Hansen
