From: | "Troy" <tjk(at)tksoft(dot)com> |
---|---|
To: | antti(dot)haapala(at)iki(dot)fi (Antti Haapala) |
Cc: | tjk(at)tksoft(dot)com (Troy K(dot)), postgre(at)totw(dot)org (JBJ), pgsql-sql(at)postgresql(dot)org |
Subject: | Re: once again, sorting with Unicode |
Date: | 2003-02-20 10:51:28 |
Message-ID: | 200302201051.h1KApSSN018184@tksoft.com |
Lists: | pgsql-sql |
You are right, of course. I was thinking in terms of the encoded
data. Applications usually get data in UTF-8 or UTF-16. If the
input data is UTF-16 and stays within the ISO 8859-1 range, then
there is no difference in the byte values (just skip the 0x00 bytes).
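
A small Python sketch of the point above, added for illustration and not part of the original exchange. The sample strings and the helper name strip_zero_bytes are assumptions chosen for the example. It shows that a Latin-1 character keeps the same numeric value in Unicode, that the byte patterns depend on the encoding, and that dropping 0x00 bytes from UTF-16 only recovers the ISO 8859-1 bytes for characters in the range U+0001 to U+00FF.

```python
def strip_zero_bytes(data: bytes) -> bytes:
    """Drop every 0x00 byte (hypothetical helper for this illustration)."""
    return bytes(b for b in data if b != 0)


text = "été"  # every character is within the ISO 8859-1 range

# Numeric value of the first character: identical in ISO 8859-1 and Unicode.
code_point = ord(text[0])

# Byte patterns differ per encoding.
latin1_bytes = text.encode("latin-1")
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-be")  # big-endian, no byte order mark

print(hex(code_point))
print(latin1_bytes.hex())
print(utf8_bytes.hex())
print(utf16_bytes.hex())

# Within the Latin-1 range, UTF-16 minus the 0x00 bytes equals ISO 8859-1.
print(strip_zero_bytes(utf16_bytes) == latin1_bytes)

# Outside that range the shortcut breaks: U+0100 is 0x01 0x00 in UTF-16BE,
# and stripping the zero byte leaves a single, unrelated byte.
print(strip_zero_bytes("\u0100".encode("utf-16-be")).hex())
```

Note that the UTF-8 bytes are different again (two bytes per accented character here), so the shortcut does not apply to UTF-8 input at all.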
Cheers,
Troy
>
>
> On Wed, 19 Feb 2003, Troy wrote:
>
> > > I have a multi-lingual database (currently 11 languages) which sorts
> > > fine in MySQL (8859-1 character set) I have now converted the data to
> > > Unicode and compiled Postgre with unicode support.
> > >
> > > I can select and insert unicode and so was rather pleased about that.
> > > Until I saw that it wasn't working properly when ordering!
> >
> > The cause for the different values is the fact that unicode characters
> > have different numeric values from ISO8859-1 and other encodings. Only
> > ascii values are in sync with unicode numeric values. This I am sure you
> > knew.
>
> No, ISO8859-1 maps directly to unicode up to U+00FF. So the actual
> _numeric_ values are the same. But actual byte patterns are encoding
> dependent.
>
> Have you set database encoding to UTF-8? Are you using proper UTF-8
> locales? POSIX compiled locales are often charset dependent.
>
> --
> Antti Haapala
>
>
>
>