From: | "Enke, Michael" <michael(dot)enke(at)wincor-nixdorf(dot)com> |
---|---|
To: | Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> |
Cc: | pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Bug #659: lower()/upper() bug on ->multibyte<- DB |
Date: | 2002-05-13 09:57:21 |
Message-ID: | 3CDF8E01.DC0B2817@wincor-nixdorf.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs pgsql-hackers |
Tatsuo Ishii wrote:
>
> [Cc:ed to hackers]
>
> (trying select convert(lower(convert('X', 'LATIN1')),'LATIN1','UNICODE');)
>
> > Ok, this is working now (I cann't reproduce why not at the first time).
>
> Good.
>
> > Is it planned to implement it so that I can write lower()/ upper() for multibyte
> > according to SQL standard (without convert)?
>
> SQL standard? The SQL standard says nothing about locale. So making
> lower() (and others) "locale aware" is far different from the SQL
> standard of point of view. Of course this does not mean "locale
> support" is should not be a part of PostgreSQL's implementation of
> SQL. However, we should be aware the limitation of "locale support"
> (as well as multibyte support). They are just the stopgap util CREATE
> CHARACTER SET etc. is implemnted IMO.
>
> > I could do it if you tell me where the final tolower()/toupper() happens.
> > (but not before middle of June).
>
> For the short term solution making convert() hiding from users might
> be a good idea (what I mean here is kind of auto execution of
> convert()). The hardest part is there's no idea how we could find a
> relationship bewteen particular locale and the encoding. For example,
> you know that for de_DE locale using LATIN1 encoding is appropreate,
> but PostgreSQL does not.
I think it is really not hard to do this for UTF-8. I don't have to know the
relation between the locale and the encoding. Look at this:
We can use the LC_CTYPE from pg_controldata or alternatively the LC_CTYPE
at server startup. For nearly every locale (de_DE, ja_JP, ...) there exists
also a locale *.utf8 (de_DE.utf8, ja_JP.utf8, ...) at least for the actual Linux glibc.
We don't need to know more than this. If we call
setlocale(LC_CTYPE, <value of LC_CTYPE extended with .utf8 if not already given>)
then glibc is aware of doing all the conversions. I attach a small demo program
which set the locale ja_JP.utf8 and is able to translate german umlaut A (upper) to
german umlaut a (lower).
What I don't know (have to ask a glibc delveloper) is:
Why there exists dozens of locales *.utf8 and what is the difference
between all /usr/lib/locale/*.utf8/LC_CTYPE?
But for all existing locales *.utf8, the conversion of german umlauts is working properly.
Regards,
Michael
PS: I'm not in my office for the next 3 weeks and therefore not able to read my mails.
Attachment | Content-Type | Size |
---|---|---|
mb.c | text/plain | 1.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | pgsql-bugs | 2002-05-13 11:13:57 | Bug #667: Lib needed when install rpm |
Previous Message | Tom Lane | 2002-05-13 03:56:44 | Re: Bug #666: vacuum dies when called from plpgsql after large delete |
From | Date | Subject | |
---|---|---|---|
Next Message | D'Arcy J.M. Cain | 2002-05-13 10:21:19 | Re: Further info : Very high load average but no cpu utilization ? |
Previous Message | Christopher Kings-Lynne | 2002-05-13 05:14:46 | Re: TRUNCATE |