From: | "John Hansen" <john(at)geeknet(dot)com(dot)au> |
---|---|
To: | "Tatsuo Ishii" <t-ishii(at)sra(dot)co(dot)jp> |
Cc: | <pgman(at)candle(dot)pha(dot)pa(dot)us>, <girgen(at)pingpong(dot)net>, <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Patch for collation using ICU |
Date: | 2005-05-09 00:11:45 |
Message-ID: | 5066E5A966339E42AA04BA10BA706AE50A9317@rodrick.geeknet.com.au |
Lists: | pgsql-hackers |
> -----Original Message-----
> From: Tatsuo Ishii [mailto:t-ishii(at)sra(dot)co(dot)jp]
> Sent: Sunday, May 08, 2005 11:08 PM
> To: John Hansen
> Cc: pgman(at)candle(dot)pha(dot)pa(dot)us; girgen(at)pingpong(dot)net; pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] Patch for collation using ICU
>
> > > I don't buy it. If current conversion tables does the right thing,
> > > why we need to replace. Or if conversion tables are not correct, why
> > > don't you fix it? I think the rule of character conversion will not
> > > change frequently, especially for LATIN languages. Thus maintaining
> > > cost is not too high.
> >
> > I never said we need to, but if we're going to implement ICU, then we
> > might as well go all the way.
>
> So you admit there's no benefit using ICU for replacing
> existing conversions?
>
> Besides ICU does not support all existing conversions, I
> think ICU has serious flaw for using conversion. If I
> understand correctly, ICU uses UNICODE internally to do the
> conversion. For example, to implement
> SJIS->EUC_JP conversion, ICU first converts SJIS to UNICODE then
> converts UNICODE to EUC_JP. Problem is these conversions are
> not round trip (conversion between SJIS/EUC_JP and UNICODE will
> lose some information). Thus SJIS->EUC_JP->SJIS conversion
> using ICU does not preserve original text.
Just for the record, I fetched a web page encoded in SJIS, converted
it to EUC-JP and back using uconv from ICU 3.2, and the result is
identical to the original file.
uconv -f Shift_JIS -t EUC-JP -o index.html.euc index.html
uconv -f EUC-JP -t Shift_JIS -o index.html.sjis index.html.euc
diff index.html index.html.sjis
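The same SJIS -> EUC-JP -> SJIS round trip can be checked programmatically. The sketch below uses Python's built-in `shift_jis` and `euc_jp` codecs, not ICU, so it is only an assumption that its mapping tables behave like ICU's converters or PostgreSQL's conversion tables. Like ICU, it pivots through Unicode: each step decodes to a Unicode string and re-encodes. The helper names (`sjis_to_eucjp`, `eucjp_to_sjis`, `round_trips`) are hypothetical.

```python
"""Round-trip check for SJIS -> EUC-JP -> SJIS, pivoting through Unicode.

Hypothetical helper using Python's built-in codecs, not ICU; the mapping
tables may differ from ICU's and from PostgreSQL's conversion tables.
"""
import sys


def sjis_to_eucjp(data: bytes) -> bytes:
    # Decode Shift_JIS bytes to a Unicode str, then encode as EUC-JP.
    # errors="strict" raises instead of silently substituting characters.
    return data.decode("shift_jis", errors="strict").encode("euc_jp", errors="strict")


def eucjp_to_sjis(data: bytes) -> bytes:
    # Reverse direction, again pivoting through Unicode.
    return data.decode("euc_jp", errors="strict").encode("shift_jis", errors="strict")


def round_trips(original: bytes) -> bool:
    """Return True if SJIS -> EUC-JP -> SJIS reproduces the original bytes."""
    try:
        restored = eucjp_to_sjis(sjis_to_eucjp(original))
    except UnicodeError:
        # Some byte sequence could not be decoded or mapped in one step.
        return False
    return restored == original


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python roundtrip.py index.html
    with open(sys.argv[1], "rb") as f:
        raw = f.read()
    print("identical" if round_trips(raw) else "differs")
```

A byte-for-byte comparison, as with `diff` above, only shows that a particular file survived the round trip; it does not prove that every SJIS character does.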
... John