Gregory Stark wrote:
> It's limited but I wouldn't say it's very limiting. In the cases where it
> doesn't apply there's no way out anyways. A UTF8 field will need a length
> header in some form.
Actually, you can determine the length of a UTF-8 encoded character by
looking at the most significant bits of the first byte. So we could
store a UTF-8 encoded CHAR(1) field without any additional length header.
See http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8 for the bit patterns.
AFAIK, UTF-16 works similarly: the first 16-bit unit tells you whether a
second (surrogate) unit follows.
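
For the UTF-8 case, the first-byte check could look roughly like the sketch
below. The function name utf8_seq_len is made up for illustration and is not
an existing backend function. It only inspects the leading bit pattern and
does not validate the continuation bytes or reject overlong encodings.

```c
/*
 * Hypothetical helper (name invented for this example): given the first
 * byte of a UTF-8 sequence, return the total number of bytes in that
 * sequence, or 0 if the byte cannot start a sequence.
 *
 * Only the high bits of the first byte are examined; no validation of
 * the following bytes is attempted.
 */
int
utf8_seq_len(unsigned char first)
{
    if ((first & 0x80) == 0x00)
        return 1;               /* 0xxxxxxx: single-byte (ASCII) */
    if ((first & 0xE0) == 0xC0)
        return 2;               /* 110xxxxx: two-byte sequence */
    if ((first & 0xF0) == 0xE0)
        return 3;               /* 1110xxxx: three-byte sequence */
    if ((first & 0xF8) == 0xF0)
        return 4;               /* 11110xxx: four-byte sequence */
    return 0;                   /* 10xxxxxx continuation byte, or 11111xxx */
}
```

For example, utf8_seq_len(0xC3) gives 2, so a reader of a CHAR(1) datum
starting with that byte would know to consume one more byte without any
stored length.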
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com