| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
|---|---|
| To: | Alexey Mahotkin <alexm(at)w-m(dot)ru> | 
| Cc: | pgsql-hackers(at)postgresql(dot)org | 
| Subject: | Re: UPPER()/LOWER() and UTF-8 | 
| Date: | 2003-11-05 14:42:21 | 
| Message-ID: | 1701.1068043341@sss.pgh.pa.us | 
| Lists: | pgsql-hackers | 
Alexey Mahotkin <alexm(at)w-m(dot)ru> writes:
>     TL> upper/lower aren't going to work desirably in any multi-byte
>     TL> character set encoding.  
> Can you please point me at their implementation?  I do not understand
> why that's impossible.
Because they use <ctype.h>'s toupper() and tolower() functions, which
only work on single-byte characters.
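
To illustrate the limitation, here is a minimal sketch of byte-at-a-time case conversion. The helper name `bytewise_upper` is hypothetical and is not the server's actual code; it only shows how a `<ctype.h>`-based loop treats a UTF-8 string.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper (not PostgreSQL source): uppercase a string in place,
 * one byte at a time, as a <ctype.h>-based implementation would.
 */
void bytewise_upper(char *s)
{
    for (size_t i = 0; s[i] != '\0'; i++)
    {
        /* toupper() sees a single byte; it cannot know that two or more
         * bytes together encode one character. */
        s[i] = (char) toupper((unsigned char) s[i]);
    }
}
```

In the default "C" locale, calling this on the UTF-8 bytes of "café" uppercases the three ASCII letters and leaves the two bytes of "é" untouched, since neither byte is a lowercase letter on its own. Under a single-byte locale the individual bytes may even be mapped separately, which can corrupt the multibyte sequence.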
There has been some discussion of using <wctype.h> where available, but
this has a number of issues, notably figuring out the correct mapping
from the server string encoding (e.g., UTF-8) to unpacked wide characters.
At minimum we'd need to know which charset the locale setting is
expecting, and there doesn't seem to be a portable way to find that out.
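
The `<wctype.h>` route might look roughly like the sketch below. The helper name `wide_upper` is hypothetical, and the code rests on exactly the assumption that is in question: that the current `LC_CTYPE` locale expects the same charset the input string is encoded in. If the two disagree, `mbstowcs()` either fails or silently decodes the wrong characters.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper (not PostgreSQL source): uppercase a multibyte string
 * by converting it to wide characters and back.
 * ASSUMPTION: the current LC_CTYPE locale's charset matches src's encoding.
 * Returns 0 on success, -1 on failure (bad input, no memory, dst too small).
 */
int wide_upper(const char *src, char *dst, size_t dstsize)
{
    size_t   len = strlen(src);
    /* A string never has more wide characters than bytes. */
    wchar_t *w = malloc((len + 1) * sizeof *w);
    size_t   n, out;

    if (w == NULL)
        return -1;

    /* Decode according to the locale's charset, not the server encoding. */
    n = mbstowcs(w, src, len + 1);
    if (n == (size_t) -1)
    {
        free(w);
        return -1;
    }

    for (size_t i = 0; i < n; i++)
        w[i] = (wchar_t) towupper((wint_t) w[i]);

    /* Re-encode; a result equal to dstsize means no terminator was written. */
    out = wcstombs(dst, w, dstsize);
    free(w);
    if (out == (size_t) -1 || out >= dstsize)
        return -1;
    return 0;
}
```

A caller would first select a locale with `setlocale(LC_CTYPE, ...)` and then call `wide_upper(input, buf, sizeof buf)`. Nothing in this sketch checks that the locale and the string encoding agree, which is the gap described above.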
IIRC, Peter thinks we must abandon use of libc's locale functionality
altogether and write our own locale layer before we can really have all
the locale-specific functionality we want.

regards, tom lane