From: | Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> |
---|---|
To: | john(at)geeknet(dot)com(dot)au |
Cc: | pgman(at)candle(dot)pha(dot)pa(dot)us, girgen(at)pingpong(dot)net, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Patch for collation using ICU |
Date: | 2005-05-08 00:08:45 |
Message-ID: | 20050508.090845.39153917.t-ishii@sra.co.jp |
Lists: | pgsql-hackers |
> Bruce Momjian wrote:
> >
> > There are two reasons for that optimization --- first, some
> > locale support is broken and Unicode encoding with a C locale
> > crashes (not an issue for ICU), and second, it is an
> > optimization for languages like Japanese that want to use
> > unicode, but don't need a locale because upper/lower means
> > nothing in those character sets.
>
> No, upper/lower means nothing in those languages, so why would you need
> to optimize upper/lower if they're not used??
> And if they are, it's obviously because the text contains characters
> from other languages (probably english) and as such they should behave
> correctly.
Yes, Japanese (and probably Chinese and Korean) text includes
ASCII characters. More precisely, ASCII is part of the Japanese
encodings (LATIN1 is not, however), and we have no problem at all with
the glibc C locale. See below ("unitest" is a UNICODE database).
unitest=# create table t1(t text);
CREATE TABLE
unitest=# \encoding EUC_JP
unitest=# insert into t1 values('abcあいう');
INSERT 1842628 1
unitest=# select upper(t) from t1;
   upper
-----------
 ABCあいう
(1 row)
So the behavior of Japanese text (including ASCII) in a UNICODE database
is perfectly correct at this moment, and I strongly object to removing
that optimization.
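
To illustrate why the result above comes out right, here is a minimal sketch of ASCII-only case mapping applied to the encoded bytes. It is not PostgreSQL's actual code; the helper name ascii_upper is hypothetical, and it only assumes that in UTF-8 and EUC_JP every byte of a multibyte character has the high bit set, so such bytes can never be mistaken for 'a'..'z'.

```python
def ascii_upper(data: bytes) -> bytes:
    """Uppercase only ASCII a-z; leave every other byte untouched.

    Hypothetical helper for illustration, roughly what a C-locale
    toupper() applied byte by byte would do.
    """
    out = bytearray(data)
    for i, b in enumerate(out):
        if 0x61 <= b <= 0x7A:  # ASCII 'a'..'z'
            out[i] = b - 0x20  # shift to 'A'..'Z'
    return bytes(out)


text = "abcあいう"
for encoding in ("utf-8", "euc_jp"):
    encoded = text.encode(encoding)
    # Multibyte characters pass through unchanged because all of
    # their bytes are >= 0x80 in these two encodings.
    print(encoding, ascii_upper(encoded).decode(encoding))
```

In both encodings the ASCII part is uppercased and the Japanese part is returned byte for byte, which matches the psql output above.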
--
Tatsuo Ishii