From: | Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> |
---|---|
To: | robinson(at)netrinsics(dot)com |
Cc: | tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)hub(dot)org |
Subject: | Re: [HACKERS] fatal copy in/out error (6.5.3) |
Date: | 2000-01-25 02:06:35 |
Message-ID: | 20000125110635M.t-ishii@sra.co.jp |
Lists: | pgsql-hackers |
> My config line:
> ./configure --with-mb=EUC_CN
>
> I can forward my include/config.h if that would be helpful.
>
> The OS is FreeBSD 3.4-RELEASE.
>
> >This looks to me like something is deciding that \217 must be the
> >start of a 3-byte multibyte character... in which case, it should have
> >appeared that way in your database, I think.
I suspect that too.
> That would be very weird, if true. \217 is certainly not the beginning of
> a UTF-8 three-byte sequence, and EUC doesn't have three-byte codes.
No, some EUC encodings (EUC_TW and EUC_JP) have three-byte or even
four-byte codes. But you said your database is configured as EUC_CN. As
far as I know, it uses only 1- or 2-byte codes. Another thing that
confuses me is that ' \217\210' is not valid EUC_CN data at all. \217
(0x8f) specifies code set 3, which does not exist in EUC_CN. In this
case, the current implementation of PostgreSQL assumes that the
multi-byte character consists of 3 bytes.
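The length rule described above can be sketched in C as follows. This is only an illustration of the behaviour as stated in this message, not the PostgreSQL source: the function name euc_mblen_described and the constant SS3 are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

#define SS3 0x8f    /* octal \217: the lead byte that selects code set 3 */

/*
 * Hypothetical helper (not a PostgreSQL function): byte length of the
 * character that starts at s, following the behaviour described in
 * the message.
 */
int euc_mblen_described(const unsigned char *s)
{
    assert(s != NULL);

    if (*s == SS3)
        return 3;   /* code set 3 assumed: lead byte plus two more bytes */
    if (*s & 0x80)
        return 2;   /* any other high-bit byte: 2-byte character */
    return 1;       /* plain ASCII */
}
```

Under this rule the bytes \217\210 are reported as the start of a 3-byte character, so whatever byte follows \210 would be treated as part of that character.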
In short, the problem you have is caused by:
1) wrong data submitted into the table
2) PostgreSQL assuming that the data consists of 3-byte characters
I would recommend you delete the data since it's not correct anyway.
In the meantime I'm going to fix 2) so that it assumes the data
consists of 2-byte characters even if a wrong byte sequence is submitted
(needless to say, except for ASCII).
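A minimal sketch of what the proposed rule for 2) could look like, assuming it amounts to "any byte with the high bit set starts a 2-byte character, everything else is 1 byte". The names euc_cn_mblen_fixed and euc_cn_count_chars are hypothetical and are not taken from the PostgreSQL source or from the eventual patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch of the proposed EUC_CN rule: 2 bytes for any
 * high-bit lead byte (even an invalid one such as 0x8f), 1 byte for ASCII.
 */
int euc_cn_mblen_fixed(const unsigned char *s)
{
    assert(s != NULL);
    return (*s & 0x80) ? 2 : 1;
}

/*
 * Count the characters in a NUL-terminated buffer using the rule above.
 * A 2-byte character cut short by the terminator is counted once and
 * ends the scan, so the loop never reads past the end of the buffer.
 */
size_t euc_cn_count_chars(const unsigned char *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (*s != '\0')
    {
        int len = euc_cn_mblen_fixed(s);

        n++;
        if (len == 2 && s[1] == '\0')
            break;              /* truncated trailing character */
        s += len;
    }
    return n;
}
```

With this rule the invalid pair \217\210 is consumed as one 2-byte unit, and the byte that follows it is scanned as a separate character.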
Do you want the backpatch for 6.5.3?
--
Tatsuo Ishii