From: | Greg Stark <stark(at)enterprisedb(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Andrew Dunstan <andrew(at)dunslane(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, - - <crossroads0000(at)googlemail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Unicode support |
Date: | 2009-04-13 20:26:20 |
Message-ID: | 4136ffa0904131326u5ede7272yadd838cf7426b75a@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, Apr 13, 2009 at 9:15 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> This isn't about the number of bytes, but about whether or not we should
>> count characters encoded as two or more combined code points as a single
>> char or not.
>
> It's really about whether we should support non-canonical encodings.
> AFAIK that's a hack to cope with implementations that are restricted
> to UTF-16, and we should Just Say No. Clients that are sending these
> things converted to UTF-8 are in violation of the standard.
Is it really true that canonical encodings never contain any composed
characters in them? I thought there were some glyphs which could only
be represented by composed characters.
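To make the question concrete, here is a minimal sketch using Python's standard unicodedata module (Python is only a stand-in here, nothing in this thread uses it, and the helper name nfc_code_points is made up for illustration). It applies NFC normalization to two sequences: "e" plus a combining acute accent, which has a precomposed form, and "q" plus a combining tilde, which as far as I know does not.

```python
import unicodedata


def nfc_code_points(s):
    """Return the code points of s after NFC normalization (hypothetical helper)."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", s)]


# "e" + COMBINING ACUTE ACCENT: a precomposed character (U+00E9) exists.
e_acute = "e\u0301"

# "q" + COMBINING TILDE: assumed to have no precomposed equivalent.
q_tilde = "q\u0303"

print(nfc_code_points(e_acute))  # ['U+00E9']
print(nfc_code_points(q_tilde))  # ['U+0071', 'U+0303']
```

If that assumption holds, even fully normalized text can contain a user-visible character made of more than one code point, so counting code points and counting characters would still disagree.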
Also, users can construct strings of Unicode code points themselves in
SQL using || or other text operators.
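The same effect can be sketched outside the database. The snippet below uses Python string concatenation as a rough analogue of SQL's || operator (an assumption for illustration only; is_nfc is a hypothetical helper, not a PostgreSQL or ICU function). Each operand is in NFC on its own, but the concatenated result is not.

```python
import unicodedata


def is_nfc(s):
    """True if s is unchanged by NFC normalization (hypothetical helper)."""
    return unicodedata.normalize("NFC", s) == s


base = "e"        # plain ASCII letter, already normalized
mark = "\u0301"   # lone COMBINING ACUTE ACCENT, also normalized by itself

# Analogue of base || mark in SQL: the result is a decomposed sequence.
joined = base + mark

print(is_nfc(base), is_nfc(mark), is_nfc(joined))  # True True False
```

So checking or normalizing input at the client boundary would not be enough by itself; text operators can produce combining sequences from inputs that each look fine.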
That said, my impression is that composed character support is pretty
thin on the ground elsewhere as well, but I don't have much first-hand
experience.
The original post seemed to be a contrived attempt to say "you should
use ICU". If composed character support were a show-stopper and there
was no other way to get it then it might be convincing, but I don't
know that it is and I don't know that ICU is the only place to get it.
And I'm sure it's not the only way to handle multiple encodings in a
database.
--
greg