From: | mark(at)mark(dot)mielke(dot)cc |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Gregory Stark <gsstark(at)mit(dot)edu>, andrew(at)supernews(dot)com, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Fixed length data types issue |
Date: | 2006-09-08 20:31:23 |
Message-ID: | 20060908203123.GA16397@mark.mielke.cc |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Sep 08, 2006 at 02:39:03PM -0400, Alvaro Herrera wrote:
> mark(at)mark(dot)mielke(dot)cc wrote:
> > I think I've been involved in a discussion like this in the past. Was
> > it mentioned in this list before? Yes the UTF-8 vs UTF-16 encoding
> > means that UTF-8 applications are at a disadvantage when using the
> > library. UTF-16 is considered more efficient to work with for everybody
> > except ASCII users. :-)
> Uh, is it? By whom? And why?
The authors of the library in question? Java? Anybody whose primary
alphabet isn't LATIN1 based? :-)
Only ASCII values store more space efficiently in UTF-8. All values
over 127 store more space efficiently using UTF-16. UTF-16 is easier
to process. UTF-8 requires too many bit checks with single character
offsets. I'm not an expert - I had this question before a year or two
ago, and read up on the ideas of experts.
Cheers,
mark
--
mark(at)mielke(dot)cc / markm(at)ncf(dot)ca / markm(at)nortel(dot)com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2006-09-08 20:42:09 | Re: Fixed length data types issue |
Previous Message | Tom Lane | 2006-09-08 20:17:40 | Re: Proposal for GUID datatype |