From: | Marc Herbert <Marc(dot)Herbert(at)continuent(dot)com> |
---|---|
To: | pgsql-odbc(at)postgresql(dot)org |
Subject: | Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text |
Date: | 2006-03-31 19:02:38 |
Message-ID: | khjsloyigj5.fsf@meije.emic.fr |
Lists: | pgsql-odbc |
Johann Zuschlag <zuschlag2(at)online(dot)de> writes:
> Hmm..., so Windows XP uses UCS-2 or, to be more correct (like Bart
> mentioned) UTF-16 (which is nearly the same, except for the
> surrogates).
It's nearly the same... but that makes a huge difference.
The reason why you use a fixed-length character encoding in memory is
speed. This saves you a lot of time when computing string lengths,
looking for some characters (isalnum(),...), collating, etc.
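To illustrate the speed argument, here is a minimal sketch, assuming plain byte buffers; the function names are made up for this example. With a fixed 16-bit width the position of character n is a multiplication, while in UTF-8 you have to scan from the start.

```python
def utf8_offset_of(raw: bytes, index: int) -> int:
    """Byte offset of the code point at `index` in UTF-8 data (linear scan)."""
    seen = 0
    for offset, byte in enumerate(raw):
        # Bytes of the form 10xxxxxx are continuation bytes; anything else
        # starts a new code point.
        if byte & 0xC0 != 0x80:
            if seen == index:
                return offset
            seen += 1
    raise IndexError(index)


def ucs2_offset_of(index: int) -> int:
    """Byte offset of character `index` in fixed-width 16-bit data (constant time)."""
    return index * 2


if __name__ == "__main__":
    raw = "aé€b".encode("utf-8")  # 1-, 2-, 3- and 1-byte sequences
    print(utf8_offset_of(raw, 2), ucs2_offset_of(2))
```

The constant-time version is only valid as long as every character really occupies exactly one 16-bit unit.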
If you don't care about all this speed then you had better stay in a
variable-length encoding like UTF-8, which saves you A LOT of space,
especially with small occidental alphabets.
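A rough way to see the space difference, as a sketch only: the helper name and the sample strings are invented for this example, and no byte-order mark is counted.

```python
def encoded_sizes(text: str) -> tuple:
    """Return (code points, UTF-8 bytes, UTF-16 bytes) for `text`."""
    return (
        len(text),
        len(text.encode("utf-8")),
        len(text.encode("utf-16-le")),  # "-le" variant, so no BOM is added
    )


if __name__ == "__main__":
    for sample in ("SELECT name FROM customers", "Müller, José"):
        print(sample, encoded_sizes(sample))
```

For pure ASCII text UTF-8 takes one byte per character and UTF-16 takes two; accented Latin letters cost two bytes each in both encodings.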
I think that by moving from UCS-2 to UTF-16 you lose on BOTH sides
[insert some missing benchmarks here]
And you can be sure that it brings a lot of bugs: one bug every
time some string code has been "forgotten" and not updated, still
assuming UCS-2.
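A minimal sketch of the kind of bug I mean, with hypothetical helper names: code that still treats every 16-bit unit as one character miscounts anything outside the BMP and can cut a surrogate pair in half.

```python
import struct


def utf16_units(text: str) -> list:
    """Return the 16-bit code units of `text` encoded as UTF-16."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


def ucs2_length(text: str) -> int:
    """Length as seen by code that assumes one 16-bit unit per character."""
    return len(utf16_units(text))


def naive_truncate(text: str, units: int) -> list:
    """Keep the first `units` code units, ignoring surrogate pairs."""
    return utf16_units(text)[:units]


def ends_mid_pair(units: list) -> bool:
    """True if the last unit is a high surrogate with no low surrogate after it."""
    return bool(units) and 0xD800 <= units[-1] <= 0xDBFF


if __name__ == "__main__":
    text = "a\U0001D11E"  # 'a' followed by U+1D11E, which is outside the BMP
    print(len(text), ucs2_length(text))
    print(ends_mid_pair(naive_truncate(text, 2)))
```

Here the string has two characters but three code units, and truncating to two units leaves a dangling high surrogate.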
Anyway those bugs are only for far-away and unknown countries out of
the BMP so who cares? :-/
So it really looks like a poor compatibility hack to me (Java does it
too).