From: | "Enke, Michael" <michael(dot)enke(at)wincor-nixdorf(dot)com> |
---|---|
To: | Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> |
Cc: | pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Bug #943: Server-Encoding from EUC_TW to UTF-8 doesn't work |
Date: | 2003-04-14 12:19:18 |
Message-ID: | 3E9AA746.2E07B899@wincor-nixdorf.com |
Lists: | pgsql-bugs pgsql-hackers |
I also tried BIG5-encoded data (Traditional Chinese, as used in Taiwan) and got warnings:
WARNING: copy: line 4586, LocalToUtf: could not convert (0xf9d7) BIG5 to UTF-8. Ignored
...
Is this also solved by this fix?
Michael
Tatsuo Ishii wrote:
>
> It turned out that it's a bug in the encoding conversion engine of
> PostgreSQL. It failed to find the proper entry in an encoding
> conversion table because of an integer overflow problem. Since only
> the maps for EUC_TW have such huge code point values (for example
> 0x8eaee7aa), I believe the conversion failure occurs only with that
> particular encoding. The included patches should solve the problem (they
> are against PostgreSQL 7.3.2).
>
> BTW, I'm surprised to find the bug, since it has been there since the
> 7.2 days.
>
> I'm going to commit the fix to both current and 7.3-stable trees.
> --
> Tatsuo Ishii
>
> > Short Description
> > Server-Encoding from EUC_TW to UTF-8 doesn't work
> >
> > Long Description
> > System: SuSE Linux 8.1, kernel 2.4.19, glibc 2.2.5/glibc-locale 2.2.5
> > the same error on RedHat 7.3, kernel 2.4.20, glibc2.2.5
> > postgresql version 7.3.2
> > description: I loaded Chinese (TW) characters, encoded as UTF-8, into a
> > database which has UTF-8 encoding, using "copy table from 'original'" in psql. Ok.
> > Then I exited psql and exported PGCLIENTENCODING=EUC_TW.
> > I started psql and ran "copy table to 'file.EUC_TW'". Ok.
> > If I convert this file to UTF-8 with iconv -f EUC-TW -t UTF-8 file.EUC_TW > file.UTF-8
> > then file.UTF-8 looks exactly the same as the original.
> > That means PostgreSQL converts from UTF-8 to EUC_TW correctly.
> > Now I load the exported file 'file.EUC_TW' back into the DB:
> > "copy table2 from 'file.EUC_TW'". I had not exited psql in between, so
> > PGCLIENTENCODING is the same as for "copy to".
> > Now I get an error telling me: "copy: line 1, LocalToUtf: could not convert (0xe5b5) EUC_TW to UTF-8" ... and the characters are missing in table2
> >
> > Sample Code
> > UTF-8:
> > 00000000: e795 b6e6 97a5 0ae5 959f e58b 95e4 b8ad
> > 00000010: 2ce4 bd86 e69c 89e9 8caf e8aa a40a
> >
> > EUC_TW as exported from PostgreSQL and not imported:
> > 00000000: e5b5 c5ca 0ada f6d9 afc4 e32c c8fe c8b4
> > 00000010: f2e3 eba8 0a
>
> *** src/backend/utils/mb/conv.c.orig 2003-04-12 10:03:25.000000000 +0900
> --- src/backend/utils/mb/conv.c 2003-04-12 10:16:04.000000000 +0900
> ***************
> *** 313,319 ****
>
> v1 = *(unsigned int *) p1;
> v2 = ((pg_utf_to_local *) p2)->utf;
> ! return (v1 - v2);
> }
>
> /*
> --- 313,319 ----
>
> v1 = *(unsigned int *) p1;
> v2 = ((pg_utf_to_local *) p2)->utf;
> ! return (v1 > v2)?1:((v1 == v2)?0:-1);
> }
>
> /*
> ***************
> *** 328,334 ****
>
> v1 = *(unsigned int *) p1;
> v2 = ((pg_local_to_utf *) p2)->code;
> ! return (v1 - v2);
> }
>
> /*
> --- 328,334 ----
>
> v1 = *(unsigned int *) p1;
> v2 = ((pg_local_to_utf *) p2)->code;
> ! return (v1 > v2)?1:((v1 == v2)?0:-1);
> }
>
> /*
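
For readers following the patch, here is a minimal standalone sketch of why the subtraction-based comparator can mislead a binary search for large code values such as 0x8eaee7aa. The `map_entry` struct, the function names, and the table contents are hypothetical stand-ins chosen for illustration; they are not the real `pg_local_to_utf` definitions or real mapping data. The sketch assumes a 32-bit `unsigned int` and the usual two's-complement conversion to `int`.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a conversion-table entry (not the real struct). */
typedef struct
{
    unsigned int code;   /* local encoding value, table is sorted on this */
    unsigned int utf;    /* placeholder payload, not a real UTF-8 mapping */
} map_entry;

/* Old style: the unsigned difference is squeezed into an int.
 * When the difference exceeds INT_MAX the result turns negative on
 * typical two's-complement platforms, so the sign is wrong. */
static int
cmp_subtract(const void *p1, const void *p2)
{
    unsigned int v1 = *(const unsigned int *) p1;
    unsigned int v2 = ((const map_entry *) p2)->code;

    return (int) (v1 - v2);
}

/* Patched style: an explicit three-way comparison, no arithmetic. */
static int
cmp_threeway(const void *p1, const void *p2)
{
    unsigned int v1 = *(const unsigned int *) p1;
    unsigned int v2 = ((const map_entry *) p2)->code;

    return (v1 > v2) ? 1 : ((v1 == v2) ? 0 : -1);
}

/* Binary search over a table sorted by code, using the safe comparator.
 * Returns NULL when the key is not present. */
static const map_entry *
lookup(unsigned int key, const map_entry *table, size_t n)
{
    return bsearch(&key, table, n, sizeof(map_entry), cmp_threeway);
}
```

With a key of 0x8eaee7aa and an entry whose code is 0xa1a1, `cmp_subtract` reports the key as smaller than the entry even though it is larger, which can send `bsearch` into the wrong half of the table. `cmp_threeway` reports the correct ordering, so `lookup` can find such entries.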