From: | "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at> |
---|---|
To: | "Sim Zacks *EXTERN*" <sim(at)compulab(dot)co(dot)il>, <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: encoding confusion |
Date: | 2008-06-11 05:58:16 |
Message-ID: | D960CB61B694CF459DCFB4B0128514C20230A1CE@exadv11.host.magwien.gv.at |
Lists: | pgsql-general |
Sim Zacks wrote:
> We originally tested it on mysql and now we are migrating it
> to postgresql.
>
> The messages are stored in a longblob field on mysql and a bytea field
> in postgresql.
>
> I set the database up as UTF-8, even though we get emails that are not
> UTF encoded, mostly because I didn't know what else to try that would
> incorporate all the possible encodings. Examples of 3 encodings we
> regularly receive are: UTF-8, Windows-1255, ISO-8859-8-I.
[...]
> It would not transfer through the dbi-link, so I wrote a python script
> (see below) to read a row from mysql and write a row to postgresql
> (using pygresql and mysqldb).
> When I used pygresql's escape_bytea function to copy the data, it went
> smoothly, but the data was corrupt.
> When I tried the escape_string function it died because the data it was
> moving was not UTF-8.
>
> I finally got it to work by defining a database as SQL-ASCII and then
> using escape_string worked. After the data was all in place, I pg_dumped
> and pg_restored into a UTF-8 database and it surprisingly works now.
It's very difficult to know exactly what happened without an example
of a byte sequence that illustrates what you describe: how it looked
in MySQL, how it looked in your Python script, and what you fed to
escape_bytea.
What client encoding did you use in your Python script?
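To collect such an example, a small sketch along the following lines could print the raw bytes of one message at each stage and show which encodings accept them. The helper names hex_dump and try_decodings and the sample Hebrew word are purely illustrative assumptions, not taken from your script, and the codec names are Python's own, which may not correspond exactly to the labels in the mail headers (ISO-8859-8-I in particular).

```python
# Illustrative sketch only: inspect a byte string the way it would be
# read from the longblob/bytea column, before any escaping is applied.

def hex_dump(data: bytes) -> str:
    """Return the bytes as space-separated two-digit hex values."""
    return " ".join(f"{b:02x}" for b in data)


def try_decodings(data: bytes,
                  encodings=("utf-8", "windows-1255", "iso-8859-8")):
    """Try each encoding; map its name to the decoded text, or None on failure."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


# Hypothetical sample: a Hebrew word encoded as Windows-1255,
# standing in for the body of one stored email.
sample = "\u05e9\u05dc\u05d5\u05dd".encode("windows-1255")

print(hex_dump(sample))
for name, text in try_decodings(sample).items():
    print(name, "->", "not valid" if text is None else repr(text))
```

Running the same dump on the value as read from MySQL and on the value read back from PostgreSQL would show at which step the bytes change. Bytes that are not valid UTF-8 would also be consistent with escape_string failing against a UTF-8 database while succeeding against an SQL_ASCII one.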
Yours,
Laurenz Albe