From: | Hannu Krosing <hannu(at)tm(dot)ee> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Oliver Elphick <olly(at)lfix(dot)co(dot)uk>, PostgreSQL hackers list <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Unicode upper() bug still present |
Date: | 2003-10-20 07:22:29 |
Message-ID: | 1066634549.15789.20.camel@fuji.krosing.net |
Lists: | pgsql-hackers |
Tom Lane wrote on Mon, 20.10.2003 at 03:35:
> Oliver Elphick <olly(at)lfix(dot)co(dot)uk> writes:
> > There is a bug in Unicode upper() which has been present since 7.2:
>
> We don't support upper/lower in multibyte character sets, and can't as
> long as the functionality is dependent on <ctype.h>'s toupper()/tolower().
> It's been suggested that we could use <wctype.h> where available.
> However there are a bunch of issues that would have to be solved to make
> that happen. (How do we convert between the database character encoding
> and the wctype representation?
How do we do it for sorting?
> How do we even find out what
> representation the current locale setting expects to use?)
Why not use the same locale settings as for sorting (i.e. the database
encoding) until we have proper multi-locale support in the backend?
It seems inconsistent that we use locale-aware sorts but not
locale-aware upper/lower.
This is for a UNICODE database using locale et_EE.UTF-8:
ucdb=# select t, upper(t) from tt order by 1;
 t | upper
---+-------
 a | A
 s | S
 Š | Š
 š | š
 Õ | Õ
 õ | õ
 Ä | Ä
 ä | ä
(8 rows)
As you can see, the sort order is right, but upper() converts only some
characters (the ASCII a and s) and leaves others (š, õ, ä) unchanged, so
the result is a complete mess ;(
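
To make the <wctype.h> idea concrete, here is a rough sketch of what
such a conversion could look like. It is only an illustration, not
backend code: mb_upper() is a made-up name, and the sketch assumes that
the current LC_CTYPE locale uses the same encoding as the input string,
which is exactly the open question raised above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper (not PostgreSQL code): upper-case a multibyte
 * string using the encoding and case rules of the current LC_CTYPE
 * locale.  Returns 0 on success, -1 if src is not valid in that
 * encoding, if memory runs out, or if the result does not fit in dst.
 */
int
mb_upper(const char *src, char *dst, size_t dstlen)
{
    size_t   srclen, nwide, nbytes, i;
    wchar_t *wide;

    assert(src != NULL && dst != NULL);

    /* A string never has more wide characters than it has bytes. */
    srclen = strlen(src);
    wide = malloc((srclen + 1) * sizeof(wchar_t));
    if (wide == NULL)
        return -1;

    /* Decode from the locale's multibyte encoding to wchar_t. */
    nwide = mbstowcs(wide, src, srclen + 1);
    if (nwide == (size_t) -1)
    {
        free(wide);
        return -1;
    }

    /* Case-map one wide character at a time. */
    for (i = 0; i < nwide; i++)
        wide[i] = (wchar_t) towupper((wint_t) wide[i]);

    /* Encode back; the byte length may differ from the input's. */
    nbytes = wcstombs(dst, wide, dstlen);
    free(wide);

    /* Fail on unconvertible characters or if no room for the NUL. */
    if (nbytes == (size_t) -1 || nbytes >= dstlen)
        return -1;
    return 0;
}
```

A caller would first have to select the locale, for example with
setlocale(LC_CTYPE, "et_EE.UTF-8"), assuming that locale is installed
on the machine. In the default "C" locale only ASCII letters are
converted, which is roughly the behaviour shown in the query above.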
-------------------
Hannu