From: | "John Hansen" <john(at)geeknet(dot)com(dot)au> |
---|---|
To: | "Tatsuo Ishii" <t-ishii(at)sra(dot)co(dot)jp> |
Cc: | <pgman(at)candle(dot)pha(dot)pa(dot)us>, <girgen(at)pingpong(dot)net>, <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Patch for collation using ICU |
Date: | 2005-05-08 04:07:29 |
Message-ID: | 5066E5A966339E42AA04BA10BA706AE50A930B@rodrick.geeknet.com.au |
Lists: | pgsql-hackers |
Tatsuo Ishii wrote:
> Sent: Sunday, May 08, 2005 10:09 AM
> To: John Hansen
> Cc: pgman(at)candle(dot)pha(dot)pa(dot)us; girgen(at)pingpong(dot)net; pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] Patch for collation using ICU
>
> > Bruce Momjian wrote:
> > >
> > > There are two reasons for that optimization: first, some locale
> > > support is broken and Unicode encoding with a C locale crashes (not
> > > an issue for ICU), and second, it is an optimization for languages
> > > like Japanese that want to use Unicode but don't need a locale,
> > > because upper/lower means nothing in those character sets.
> >
> > No, upper/lower means nothing in those languages, so why would you
> > need to optimize upper/lower if they're not used?
> > And if they are, it's obviously because the text contains characters
> > from other languages (probably English), and as such they should
> > behave correctly.
>
> Yes, Japanese (and probably Chinese and Korean) text includes ASCII
> characters. More precisely, ASCII is part of the Japanese encodings
> (LATIN1 is not, however). And we have no problem at all with the
> glibc/C locale. See below ("unitest" is a UNICODE database).
>
> unitest=# create table t1(t text);
> CREATE TABLE
> unitest=# \encoding EUC_JP
> unitest=# insert into t1 values('abcあいう');
> INSERT 1842628 1
> unitest=# select upper(t) from t1;
> upper
> -----------
> ABCあいう
> (1 row)
>
> So Japanese (including ASCII)/UNICODE behavior is perfectly
> correct at the moment.
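The behaviour in the psql session above amounts to uppercasing only the ASCII letters a-z and leaving every other character alone. The Python sketch below mimics that with a hypothetical helper, `ascii_upper` (not a PostgreSQL function). It is an approximation of C-locale case mapping, not PostgreSQL's implementation, but it reproduces the `ABCあいう` result.

```python
def ascii_upper(text: str) -> str:
    """Hypothetical helper: uppercase only the ASCII letters a-z and leave
    every other character untouched, approximating C-locale behaviour."""
    return "".join(
        chr(ord(ch) - 32) if "a" <= ch <= "z" else ch  # 'a' (97) -> 'A' (65)
        for ch in text
    )

sample = "abcあいう"
print(ascii_upper(sample))  # ABCあいう
print(sample.upper())       # ABCあいう (full Unicode case mapping agrees here)
```

Hiragana has no case, so ASCII-only and Unicode-aware uppercasing give the same answer for this input.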
Right, so you _never_ use accented Latin characters in Japanese text?
(like è, for example, whose uppercase is È)
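The counterexample can be sketched the same way. Here Python's `str.upper()`, which applies Unicode's default case mappings, stands in for what a Unicode-aware implementation such as ICU would return (it is not ICU itself), while the hypothetical `ascii_upper` helper again approximates ASCII-only uppercasing.

```python
def ascii_upper(text: str) -> str:
    # Hypothetical helper: uppercase only a-z, leave everything else as-is
    return "".join(chr(ord(ch) - 32) if "a" <= ch <= "z" else ch for ch in text)

for word in ["è", "café", "abcあいう"]:
    # Compare ASCII-only mapping with Unicode default case mapping
    print(word, ascii_upper(word), word.upper())

# è è È
# café CAFé CAFÉ
# abcあいう ABCあいう ABCあいう
```

The two approaches agree on ASCII and on caseless scripts like hiragana; they diverge only on non-ASCII letters that have case, which is exactly the point of disagreement in this thread.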
> So I strongly object to removing that optimization.
I guess this calls for a vote, then, since if we implement ICU I'd have to
object to leaving it in.
Changing the behaviour of ICU doesn't seem right. Changing the behaviour of
PostgreSQL so that it works as it should when using Unicode seems like the
right solution to me.
> --
> Tatsuo Ishii
>
>