From: | Andrew - Supernews <andrew+nonews(at)supernews(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Unicode problems on IRC |
Date: | 2005-04-11 03:41:53 |
Message-ID: | slrnd5jsg1.2ilg.andrew+nonews@trinity.supernews.net |
Lists: | pgsql-hackers |
On 2005-04-10, "John Hansen" <john(at)geeknet(dot)com(dot)au> wrote:
> That's right, dono how I missed that one, but looks correct to me, and
> is in line with the code in ConvertUTF.c from unicode.org, on which I
> based the patch, extended to support 6 byte utf8 characters.
Frankly, you should probably de-extend it back down to 4 bytes. That's
enough to encode the Unicode range of 0x000000 - 0x10FFFF, and enough
other stuff would break if anyone allocated a character outside that
range that I don't think it is worth worrying about. (Even the ISO
people have agreed to conform to that limitation.) Even if insanity
struck simultaneously at both standards bodies, 4 bytes is enough to
go to 0x1FFFFF so there is still substantial slack. (A number of other
specifications based on utf-8 have removed the 5 and 6 byte sequences
too, so there is substantial precedent for this.)
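
To make the arithmetic concrete, here is a minimal sketch in C of a decoder capped at 4 bytes. It is not the patch under discussion and not the code from ConvertUTF.c; the names utf8_seq_len, utf8_max_for_len and utf8_decode are hypothetical, chosen only for illustration. It shows that a 4-byte sequence carries 21 payload bits (up to 0x1FFFFF), while anything above 0x10FFFF is still rejected. The surrogate and overlong checks are standard UTF-8 validation rules, not something stated above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sequence length implied by the lead byte.
 * Returns 0 for continuation bytes and for the 5/6-byte lead bytes
 * (0xF8..0xFD), which this sketch deliberately does not accept. */
static int utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Largest value that fits in a sequence of the given length:
 * 7, 11, 16 and 21 payload bits respectively. */
static uint32_t utf8_max_for_len(int len)
{
    static const uint32_t max[] = {0, 0x7F, 0x7FF, 0xFFFF, 0x1FFFFF};
    return (len >= 1 && len <= 4) ? max[len] : 0;
}

/* Decode one sequence from s (n bytes available) into *out.
 * Returns the number of bytes consumed, or 0 if the input is invalid. */
static int utf8_decode(const unsigned char *s, size_t n, uint32_t *out)
{
    static const uint32_t min[] = {0, 0, 0x80, 0x800, 0x10000};
    uint32_t cp;
    int len, i;

    if (n == 0) return 0;
    len = utf8_seq_len(s[0]);
    if (len == 0 || (size_t) len > n) return 0;
    if (len == 1) { *out = s[0]; return 1; }

    cp = s[0] & (0xFF >> (len + 1));        /* payload bits of the lead byte */
    for (i = 1; i < len; i++)
    {
        if ((s[i] & 0xC0) != 0x80) return 0; /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    if (cp < min[len]) return 0;                  /* overlong encoding */
    if (cp > 0x10FFFF) return 0;                  /* outside the Unicode range */
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* surrogate code points */

    *out = cp;
    return len;
}
```

With this sketch, F4 8F BF BF decodes to 0x10FFFF, while F4 90 80 80 (0x110000) and F7 BF BF BF (0x1FFFFF) are rejected even though they fit in 4 bytes.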
--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services