From: | "Iain" <iain(at)mst(dot)co(dot)jp> |
---|---|
To: | <pgsql-admin(at)postgresql(dot)org> |
Subject: | How to fix bad multibyte data? |
Date: | 2005-01-12 08:31:49 |
Message-ID: | 001101c4f881$2c1af3d0$7201a8c0@mst1x5r347kymb |
Lists: | pgsql-admin |
Hi All,
I have a v7.1 database whose encoding is EUC_JP and I'm trying to get it
into a v7.4 database whose encoding is also EUC_JP. Unfortunately it seems
that 7.4 is much stricter about its multibyte data than 7.1 was, because
attempts to restore into the 7.4 db result in 'Invalid byte sequence for
encoding "EUC_JP": 0x8e' errors.
There is no doubt that the data in the 7.1 database is bad, though I'm not
sure exactly how it got that way (the data was loaded by a C program from
a CSV file). Anyway, I can dump/restore on 7.1 OK, and I can restore into a
7.4 DB with the encoding set to SQL_ASCII but that isn't really what we
want.
I'm thinking that I may have to put the dump file through some kind of
filter that will at least ensure that the data is valid EUC_JP, even if it
mangles the data a little by dropping the invalid bytes.
The question is, how would one go about this? I think a Perl script might do
the job (I'm not familiar with Perl at all though), but there might be other
ways... so before I go off down that path, I'm wondering if anyone has any
suggestions.
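For illustration, a minimal sketch of such a filter, written in Python rather than Perl. It assumes the usual EUC-JP byte layout (ASCII below 0x80, 0x8E plus one byte in 0xA1-0xDF, 0x8F plus two bytes in 0xA1-0xFE, or two bytes in 0xA1-0xFE) and drops any byte that does not fit one of those forms. The function names and the file names in the usage note are hypothetical, and the check is purely structural: it does not verify that a sequence maps to an assigned character, so whether 7.4 accepts every line of the output is an assumption to test on a copy first.

```python
def _is_trail(b: int) -> bool:
    """True if b is in the range used by EUC-JP multibyte lead/trail bytes."""
    return 0xA1 <= b <= 0xFE


def strip_invalid_euc_jp(data: bytes) -> bytes:
    """Return data with bytes that do not form a well-formed EUC-JP sequence dropped."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            # Plain ASCII, always valid on its own.
            out.append(b)
            i += 1
        elif b == 0x8E and i + 1 < n and 0xA1 <= data[i + 1] <= 0xDF:
            # SS2: half-width katakana, two bytes.
            out += data[i:i + 2]
            i += 2
        elif b == 0x8F and i + 2 < n and _is_trail(data[i + 1]) and _is_trail(data[i + 2]):
            # SS3: JIS X 0212, three bytes.
            out += data[i:i + 3]
            i += 3
        elif _is_trail(b) and i + 1 < n and _is_trail(data[i + 1]):
            # JIS X 0208, two bytes.
            out += data[i:i + 2]
            i += 2
        else:
            # Stray or truncated byte: drop it and resynchronise on the next one.
            i += 1
    return bytes(out)


def filter_dump(src_path: str, dst_path: str) -> int:
    """Filter a dump file line by line; return the number of bytes dropped."""
    dropped = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # Splitting on newlines is safe: 0x0A never occurs inside a
        # well-formed EUC-JP multibyte sequence.
        for line in src:
            cleaned = strip_invalid_euc_jp(line)
            dropped += len(line) - len(cleaned)
            dst.write(cleaned)
    return dropped
```

Something like `filter_dump("dump71.sql", "dump71_clean.sql")` would then produce a file to restore from; a non-zero return value gives the number of bytes removed. Dropping bytes loses data, so it is worth diffing the two files to see which rows were affected.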
Regards
Iain