From: | "John Hansen" <john(at)geeknet(dot)com(dot)au> |
---|---|
To: | "Hackers" <pgsql-hackers(at)postgresql(dot)org> |
Cc: | "Patches" <pgsql-patches(at)postgresql(dot)org> |
Subject: | Re: UNICODE characters above 0x10000 |
Date: | 2004-08-07 10:56:03 |
Message-ID: | 5066E5A966339E42AA04BA10BA706AE5608C@rodrick.geeknet.com.au |
Lists: | pgsql-hackers pgsql-patches |
4, actually.
10FFFF needs four bytes:
11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
10FFFF = 00010000 11111111 11111111
Fill in the blanks, starting from the bottom, and you get:
11110100 10001111 10111111 10111111
Regards,
John Hansen
-----Original Message-----
From: Christopher Kings-Lynne [mailto:chriskl(at)familyhealth(dot)com(dot)au]
Sent: Saturday, August 07, 2004 8:47 PM
To: Tom Lane
Cc: John Hansen; Hackers; Patches
Subject: Re: [HACKERS] UNICODE characters above 0x10000
> Now it's entirely possible that the underlying support is a few bricks
> shy of a load --- for instance I see that pg_utf_mblen thinks there
> are no UTF8 codes longer than 3 bytes whereas your code goes to 4.
> I'm not an expert on this stuff, so I don't know what the UTF8 spec
> actually says. But I do think you are fixing the code at the wrong
> level.
Surely there are UTF-8 codes that are at least 3 bytes. I have a
_vague_ recollection that you have to keep escaping and escaping to get
up to like 4 bytes for some Asian code points?
Chris
From | Date | Subject | |
---|---|---|---|
Next Message | Dennis Bjorklund | 2004-08-07 11:05:44 | Re: [PATCHES] UNICODE characters above 0x10000 |
Previous Message | Christopher Kings-Lynne | 2004-08-07 10:47:07 | Re: UNICODE characters above 0x10000 |