Re: Patch for collation using ICU

From: "John Hansen" <john(at)geeknet(dot)com(dot)au>
To: "Tatsuo Ishii" <t-ishii(at)sra(dot)co(dot)jp>
Cc: <alvherre(at)dcc(dot)uchile(dot)cl>, <pgman(at)candle(dot)pha(dot)pa(dot)us>, <girgen(at)pingpong(dot)net>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch for collation using ICU
Date: 2005-05-08 08:47:25
Message-ID: 5066E5A966339E42AA04BA10BA706AE50A930F@rodrick.geeknet.com.au
Lists: pgsql-hackers

Tatsuo Ishii
> Sent: Sunday, May 08, 2005 3:41 PM
> To: John Hansen
> Cc: alvherre(at)dcc(dot)uchile(dot)cl; pgman(at)candle(dot)pha(dot)pa(dot)us;
> girgen(at)pingpong(dot)net; pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] Patch for collation using ICU
>
> > Alvaro Herrera wrote:
> > > Sent: Sunday, May 08, 2005 2:49 PM
> > > To: John Hansen
> > > Cc: Tatsuo Ishii; pgman(at)candle(dot)pha(dot)pa(dot)us; girgen(at)pingpong(dot)net;
> > > pgsql-hackers(at)postgresql(dot)org
> > > Subject: Re: [HACKERS] Patch for collation using ICU
> > >
> > > On Sun, May 08, 2005 at 02:07:29PM +1000, John Hansen wrote:
> > > > Tatsuo Ishii wrote:
> > >
> > > > > So Japanese (including ASCII)/UNICODE behavior is perfectly correct
> > > > > at this moment.
> > > >
> > > > Right, so you _never_ use accented ASCII characters in Japanese?
> > > > (like è for example, whose uppercase is È)
> > >
> > > That isn't ASCII. It's latin1 or some other ASCII extension.
> >
> > Point taken...
> > But...
> >
> > If you want EUC_JP (Japanese + ASCII), then use that as your
> > backend encoding, not UTF-8 (Unicode).
> > UTF-8 encoded databases are very useful for representing multiple
> > languages in the same database, but this usefulness vanishes
> > if functions like upper/lower don't work correctly.
>
> I'm just curious whether mixed German/French/Spanish text can be
> sorted correctly. I think these languages need their own
> locales even with UNICODE/ICU.

No, they will not sort correctly; for that you still need the locale.
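To make that concrete: even when every string is stored as Unicode, the "correct" order of the same words depends on the language. The Python sketch below compares plain code-point order (which matches byte-wise comparison of UTF-8, as with no locale-aware collation) against two deliberately simplified, hypothetical rule sets, `german_key` and `swedish_key`. These are toy rules for illustration only, not ICU's actual collation tables.

```python
import unicodedata

WORDS = ["Zucker", "Äpfel", "Apfel", "Öl", "Åsa"]


def codepoint_key(s: str) -> str:
    # No collation at all: Python compares str by code point, which gives
    # the same order as byte-wise comparison of the UTF-8 encoding.
    return s


def german_key(s: str) -> tuple:
    # Toy German-style rule (assumption): compare case-folded text with
    # diacritics stripped, so Ä sorts with A; break ties on the original string.
    decomposed = unicodedata.normalize("NFD", s.casefold())
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base, s)


# Toy Swedish-style rule (assumption): å, ä, ö are separate letters that
# follow z, in that order. Private-use code points sort above ASCII letters.
SWEDISH_LETTERS = {"å": "\ue000", "ä": "\ue001", "ö": "\ue002"}


def swedish_key(s: str) -> str:
    return "".join(SWEDISH_LETTERS.get(c, c) for c in s.casefold())


for name, key in [("code point", codepoint_key),
                  ("German (toy)", german_key),
                  ("Swedish (toy)", swedish_key)]:
    print(f"{name:14} {sorted(WORDS, key=key)}")
```

The three orders come out as `['Apfel', 'Zucker', 'Äpfel', 'Åsa', 'Öl']`, `['Apfel', 'Äpfel', 'Åsa', 'Öl', 'Zucker']` and `['Apfel', 'Zucker', 'Åsa', 'Äpfel', 'Öl']`: the same Unicode strings, three different answers, which is why the locale still has to be chosen per language.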

>
> > So optimizing for 3 languages breaks more than a hundred;
> > that doesn't seem fair!

That is a compromise I'd be willing to agree on. :)

> Why don't you add a GUC variable or some such to control the
> upper/lower behavior?
> --
> Tatsuo Ishii
>
>
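The GUC suggestion amounts to choosing between two case-mapping strategies at run time. The Python sketch below contrasts ASCII-only mapping (bytes `a`-`z` only; multibyte UTF-8 sequences pass through unchanged) with full Unicode case mapping. The switch `unicode_case_mapping` and the function `sql_upper` are hypothetical names standing in for the proposed setting; this is not how PostgreSQL implements `upper()`.

```python
def ascii_only_upper(text: str) -> str:
    # bytes.upper() changes only ASCII a-z. Every byte of a multibyte UTF-8
    # sequence is >= 0x80, so characters such as "è" are left untouched.
    return text.encode("utf-8").upper().decode("utf-8")


def unicode_upper(text: str) -> str:
    # Full Unicode case mapping, e.g. "è" -> "È".
    return text.upper()


def sql_upper(text: str, unicode_case_mapping: bool) -> str:
    # Hypothetical switch standing in for the suggested GUC variable;
    # the parameter name is illustrative only.
    return unicode_upper(text) if unicode_case_mapping else ascii_only_upper(text)


print(sql_upper("crème brûlée", unicode_case_mapping=False))  # CRèME BRûLéE
print(sql_upper("crème brûlée", unicode_case_mapping=True))   # CRÈME BRÛLÉE
```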
