From: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Cc: | Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Gregory Stark <stark(at)enterprisedb(dot)com> |
Subject: | Re: Unicode support |
Date: | 2009-04-14 12:36:35 |
Message-ID: | 200904141536.35866.peter_e@gmx.net |
Lists: | pgsql-hackers |
On Tuesday 14 April 2009 07:07:27 Andrew Gierth wrote:
> FWIW, the SQL spec puts the onus of normalization squarely on the
> application; the database is allowed to assume that Unicode strings
> are already normalized, is allowed to behave in implementation-defined
> ways when presented with strings that aren't normalized, and provision
> of normalization functions and predicates is just another optional
> feature.
Can you name chapter and verse on that?
I see this, for example,
6.27 <numeric value function>
5) If a <char length expression> is specified, then
Case:
a) If the character encoding form of <character value expression> is not UTF8,
UTF16, or UTF32, then let S be the <string value expression>.
Case:
i) If the most specific type of S is character string, then the result is the
number of characters in the value of S.
NOTE 134 — The number of characters in a character string is determined
according to the semantics of the character set of that character string.
ii) Otherwise, the result is OCTET_LENGTH(S).
b) Otherwise, the result is the number of explicit or implicit <char length
units> in <char length expression>, counted in accordance with the definition
of those units in the relevant normatively referenced document.
So SQL redirects the question of character length to the Unicode standard. I
have not been able to find anything there on a quick look, but I'm sure the
Unicode standard has some very specific ideas on this. Note that the matter
of normalization is not mentioned here.
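
To make the point concrete, here is a rough Python sketch of why normalization matters for length, even though the quoted rule does not mention it. It assumes that "characters" means Unicode code points, which is my reading and not something the quoted text states. The helper names char_length and octet_length are made up for illustration; they are not SQL or PostgreSQL functions.

```python
import unicodedata


def char_length(s: str) -> int:
    """Count code points (assumed here to be the unit of 'characters')."""
    return len(s)


def octet_length(s: str, encoding: str = "utf-8") -> int:
    """Count bytes in the given encoding, analogous to OCTET_LENGTH."""
    return len(s.encode(encoding))


# The same user-visible letter "e with acute accent" in two normalization forms.
composed = unicodedata.normalize("NFC", "e\u0301")    # single code point U+00E9
decomposed = unicodedata.normalize("NFD", "\u00e9")   # U+0065 followed by U+0301

for label, s in (("NFC", composed), ("NFD", decomposed)):
    print(label, char_length(s), octet_length(s))
```

Under that assumption the two forms of the same letter give different character lengths and different octet lengths, so whether the input is normalized changes the answer even though the rule itself is silent on it.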