Re: Bug #659: lower()/upper() bug on ->multibyte<- DB

From:	"Enke, Michael" <michael(dot)enke(at)wincor-nixdorf(dot)com>
To:	Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc:	pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Bug #659: lower()/upper() bug on ->multibyte<- DB
Date:	2002-05-13 09:57:21
Message-ID:	3CDF8E01.DC0B2817@wincor-nixdorf.com
Lists:	pgsql-bugs pgsql-hackers

Tatsuo Ishii wrote:
>
> [Cc:ed to hackers]
>
> (trying select convert(lower(convert('X', 'LATIN1')),'LATIN1','UNICODE');)
>
> > Ok, this is working now (I can't reproduce why it did not work the first time).
>
> Good.
>
> > Is it planned to implement it so that I can write lower()/ upper() for multibyte
> > according to SQL standard (without convert)?
>
> SQL standard? The SQL standard says nothing about locale. So making
> lower() (and others) "locale aware" is far different from the SQL
> standard's point of view. Of course this does not mean "locale
> support" should not be a part of PostgreSQL's implementation of
> SQL. However, we should be aware of the limitations of "locale support"
> (as well as multibyte support). They are just a stopgap until CREATE
> CHARACTER SET etc. is implemented IMO.
>
> > I could do it if you tell me where the final tolower()/toupper() happens.
> > (but not before middle of June).
>
> For a short-term solution, hiding convert() from users might
> be a good idea (what I mean here is a kind of automatic execution of
> convert()). The hardest part is that we have no way to find the
> relationship between a particular locale and the encoding. For example,
> you know that for the de_DE locale the LATIN1 encoding is appropriate,
> but PostgreSQL does not.

I think it is really not hard to do this for UTF-8. I don't have to know the
relation between the locale and the encoding. Look at this:
We can use the LC_CTYPE from pg_controldata or, alternatively, the LC_CTYPE
in effect at server startup. For nearly every locale (de_DE, ja_JP, ...) there also
exists a *.utf8 locale (de_DE.utf8, ja_JP.utf8, ...), at least in the current Linux glibc.
We don't need to know more than this. If we call
setlocale(LC_CTYPE, <value of LC_CTYPE extended with .utf8 if not already given>)
then glibc takes care of all the conversions. I attach a small demo program
which sets the locale ja_JP.utf8 and is able to translate the German umlaut A (upper case)
to the German umlaut a (lower case).
What I don't know (I have to ask a glibc developer) is:
Why do dozens of *.utf8 locales exist, and what is the difference
between all the /usr/lib/locale/*.utf8/LC_CTYPE files?
But for all existing *.utf8 locales, the conversion of German umlauts works properly.
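
A minimal sketch of the idea above, in C. This is not the attached mb.c; the
function names utf8_locale_name() and lower_utf8() and the buffer sizes are
assumptions made for illustration. It relies only on standard C library calls
(setlocale, mbstowcs, towlower, wcstombs) and assumes the derived *.utf8 locale
is installed on the system.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: derive a UTF-8 locale name from an LC_CTYPE value.
 * "de_DE" -> "de_DE.utf8", "de_DE.ISO-8859-1" -> "de_DE.utf8",
 * "de_DE@euro" -> "de_DE.utf8@euro", "ja_JP.utf8" stays unchanged.
 * Returns 0 on success, -1 if the result does not fit into out. */
int utf8_locale_name(const char *lc_ctype, char *out, size_t outsize)
{
    const char *dot, *at;
    size_t baselen;
    int n;

    assert(lc_ctype != NULL && out != NULL);
    dot = strchr(lc_ctype, '.');
    at = strchr(lc_ctype, '@');
    if (at == NULL)
        at = lc_ctype + strlen(lc_ctype);   /* no modifier: empty suffix */

    /* The language_territory part ends at the codeset or at the modifier. */
    if (dot != NULL && dot < at)
        baselen = (size_t)(dot - lc_ctype);
    else
        baselen = (size_t)(at - lc_ctype);

    n = snprintf(out, outsize, "%.*s.utf8%s", (int)baselen, lc_ctype, at);
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}

/* Hypothetical helper: lower-case a UTF-8 string via the wide-character
 * functions of the C library. Note that it changes the process-wide
 * LC_CTYPE setting. Returns 0 on success, -1 if the locale is not
 * available, the input is not valid UTF-8, or a buffer is too small. */
int lower_utf8(const char *lc_ctype, const char *in, char *out, size_t outsize)
{
    char name[128];
    wchar_t wbuf[256];
    size_t i, n;

    assert(in != NULL && out != NULL);
    if (utf8_locale_name(lc_ctype, name, sizeof name) != 0)
        return -1;
    if (setlocale(LC_CTYPE, name) == NULL)
        return -1;                          /* locale not installed */

    n = mbstowcs(wbuf, in, 256);
    if (n == (size_t)-1 || n >= 256)
        return -1;                          /* invalid input or too long */

    for (i = 0; i < n; i++)
        wbuf[i] = (wchar_t)towlower((wint_t)wbuf[i]);

    n = wcstombs(out, wbuf, outsize);
    if (n == (size_t)-1 || n >= outsize)
        return -1;                          /* result does not fit */
    return 0;
}
```

A call such as lower_utf8("ja_JP", "\xC3\x84", buf, sizeof buf) would be the
equivalent of the umlaut test described above, provided ja_JP.utf8 exists on
the machine; whether towlower() covers a given character depends on the
installed locale data.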

Regards,
Michael

PS: I'm not in my office for the next 3 weeks and therefore not able to read my mails.

Attachment	Content-Type	Size
mb.c	text/plain	1.8 KB

In response to

Re: Bug #659: lower()/upper() bug on ->multibyte<- DB at 2002-05-11 01:36:53 from Tatsuo Ishii

Responses

Re: [HACKERS] Bug #659: lower()/upper() bug on at 2002-05-14 01:29:54 from Tatsuo Ishii
