Re: Migration error " invalid byte sequence for encoding "UTF8": 0xff " from mysql 5.5 to postgresql 9.1

From: sunpeng <bluevaley(at)gmail(dot)com>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, PostgreSQL general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Migration error " invalid byte sequence for encoding "UTF8": 0xff " from mysql 5.5 to postgresql 9.1
Date: 2014-07-04 09:12:46
Message-ID: CAOYKhLpgtsNmTO=PRfW47VKL4joJJE-oFDGvHau9LeNpCFDsVg@mail.gmail.com
Lists: pgsql-general

Thank you, friend. I used --hex-blob:
mysqldump -v -nt --complete-insert=TRUE --compatible=postgresql
--default-character-set=utf8 --skip-add-locks --compact --no-create-info
--skip-quote-names --hex-blob -uroot -p test videorecresult >dbdata.sql
to dump the MySQL data.
Then I replaced the blob data "0x...." with "E'\\xx....'" to load the data into
PostgreSQL.
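
The replacement step described above could be scripted. The following is a minimal Python sketch, not the poster's actual method: the function name convert_hex_blobs and the regular expression are illustrative assumptions. It rewrites each bare MySQL hex literal (0x followed by hex digits) as an escape-string literal of the form E'\\x...', which PostgreSQL 9.0 and later accept as hex-format bytea input. It does not parse SQL, so a 0x... token inside a quoted text value could also be rewritten; inspect the dump before relying on it.

```python
import re

# A bare MySQL hex literal such as 0xFFD8FFE0. The lookbehind skips tokens
# that are glued to a preceding word character or an opening quote.
# (Assumption: blob values in the dump appear as unquoted 0x... tokens,
# which is what mysqldump --hex-blob is expected to produce.)
HEX_LITERAL = re.compile(r"(?<![0-9A-Za-z_'])0x([0-9A-Fa-f]+)\b")


def convert_hex_blobs(sql: str) -> str:
    """Rewrite 0xABCD literals as E'\\\\xABCD' bytea literals (hypothetical helper)."""
    # The Python string "E'\\\\x" is the five characters E'\\x, so the SQL
    # text contains a doubled backslash, as in the message above.
    return HEX_LITERAL.sub(lambda m: "E'\\\\x" + m.group(1) + "'", sql)


if __name__ == "__main__":
    line = "INSERT INTO videorecresult (id, Picture) VALUES (1,0xFFD8FFE0);"
    print(convert_hex_blobs(line))
```

To convert a whole dump, one could read dbdata.sql, pass its text through convert_hex_blobs, and write the result to a new file before feeding it to psql.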

On Fri, Jul 4, 2014 at 3:27 PM, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
wrote:

> sunpeng wrote:
> >>> load data to postgresql in cmd (encoding is GBK) on WIN8:
> >>>
> >>> psql -h localhost -d test -U postgres < dbdata.sql
> >>>
> >>> I got the error:
> >>> ERROR: invalid byte sequence for encoding "UTF8": 0xff
>
> >> If the encoding is GBK then you will get errors (or incorrect
> >> characters) if it is read as UTF8. Try setting the environment
> >> variable PGCLIENTENCODING.
> >>
> >> http://www.postgresql.org/docs/9.1/static/app-psql.html
>
> > I've changed cmd (in Win8) to encoding UTF8 through chcp 65001, but the error still occurs.
> > And I use the following command to dump the MySQL data:
> > mysql> select Picture from personpicture where id = 'F2931306D1EE44ca82394CD3BC2404D4' into outfile "d:\\1.txt" ;
> > I got an ANSI file, and used UltraEdit to view the first 16 bytes:
> > FF D8 FF E0 5C 30 10 4A 46 49 46 5C 30 01 01 5C
> >
> > This differs from what MySQL Workbench shows:
> > FF D8 FF E0 00 10 4a 46 49 46 00 01 01 00 00 01
>
> Changing the terminal code page won't do anything; it's probably the data
> that are in a different encoding.
>
> I don't know enough about MySQL to know which encoding it uses when
> dumping data,
> but the man page of "mysqldump" tells me:
>
> --set-charset
> Add SET NAMES default_character_set to the output. This option is
> enabled by default.
>
> So is there a SET NAMES command in the dump? If yes, what is the argument?
>
> You will have to tell PostgreSQL the encoding of the data.
> As Kevin pointed out, you can do that by setting the environment variable
> PGCLIENTENCODING to the correct value. Then PostgreSQL will convert the
> data automatically.
>
> Yours,
> Laurenz Albe
>
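
The bytes quoted in this thread suggest why no client encoding setting would help: FF D8 FF E0 is how the Picture column starts, and the byte 0xFF never occurs in valid UTF-8. (The 5C 30 pairs in the outfile are backslash-zero, MySQL's escape for a NUL byte.) The sketch below is an illustration only; the helper name first_invalid_utf8 is hypothetical. It reports the first byte that breaks UTF-8 decoding, using the 16 bytes shown for MySQL Workbench above.

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, byte) of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first undecodable byte.
        return exc.start, data[exc.start]
    return None


# First 16 bytes of the Picture column as quoted in the thread.
picture_start = bytes.fromhex("FF D8 FF E0 00 10 4A 46 49 46 00 01 01 00 00 01")

if __name__ == "__main__":
    offset, byte = first_invalid_utf8(picture_start)
    print(offset, hex(byte))  # prints: 0 0xff
```

The reported byte matches the 0xff in the PostgreSQL error message, which is consistent with the data being binary rather than text in some other encoding.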
