From: | Sim Zacks <sim(at)compulab(dot)co(dot)il> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | encoding confusion |
Date: | 2008-06-10 13:10:28 |
Message-ID: | g2lug4$oo3$1@news.hub.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
{BACKGROUND]
I am testing dbmail for our corporate email solution.
We originally tested it on mysql and now we are migrating it to postgresql.
The messages are stored in a longblob field on mysql and a bytea field
in postgresql.
I set the database up as UTF-8, even though we get emails that are not
UTF encoded, mostly because I didn't know what else to try that would
incorporate all the possible encodings. Examples of 3 encodings we
regularly receive are: UTF-8, Windows-1255, ISO-8859-8-I.
I transferred the entire database except for one table using the
dbi-link contrib module, connecting directly to MySQL and pulling the
data. The table that did not transfer was the Messageblks table which
has a field mentioned above.
It would not transfer through the dbi-link, so I wrote a python script
(see below) to read a row from mysql and write a row to postgresql
(using pygresql and mysqldb).
When I used pygresql's escape_bytea function to copy the data, it went
smoothly, but the data was corrupt.
When I tried the escape_string function it died because the data it was
moving was not UTF-8.
I finally got it to work by defining a database as SQL-ASCII and then
using escape_string worked. After the data was all in place, I pg_dumped
and pg_restored into a UTF-8 database and it surprisingly works now.
[CONFUSION]
What I don't understand, is that if that database can't handle the non
UTF characters, how does it allow them in when I receive an email
(tested, it works) and when I restored the backup.
I also don't understand why the data transfer didn't work to a UTF
database, but it worked to an ASCII database, if the data can go into a
UTF database from an ascii database.
Lastly, I wanted to know if anybody has experience moving data from
mysql to postgresql and if what I did is considered normal, or if there
is a better way of doing this.
Thank you
Sim
[Python script]:
import MySQLdb
from MySQLdb.cursors import *
import pg
import sys
pgdb=pg.connect(host="1.2.3.4",user="username",dbname="dbmail",
passwd="password")
mydb=MySQLdb.connect(host="localhost",user="mysqluser",passwd="mysqlpassword",
db="dbmail", cursorclass=MySQLdb.cursors.SSDictCursor,charset="utf8")
mycrs=mydb.cursor()
mycrs.execute("select * from dbmail_messageblks")
while 1:
nextrow=mycrs.fetchone()
nextrow["messageblk"]=pg.escape_string(nextrow["messageblk"])
pgdb.query("insert into
dbmail_messageblks(messageblk,physmessage_id,is_header,messageblk_idnr,blocksize)
values($field$%(messageblk)s$field$,%(physmessage_id)s,%(is_header)s,%(messageblk_idnr)s,%(blocksize)s)"
% nextrow)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkhOfUQACgkQjDX6szCBa+oyJwCghkG7XpcrHKY7ybeJgvjPA/XM
qLIAoMS4gOWpP4o2lKrRU2v0IdtaiSQj
=OhYp
-----END PGP SIGNATURE-----
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Fuhr | 2008-06-10 13:38:49 | Re: REGEXP_REPLACE woes |
Previous Message | Leif B. Kristensen | 2008-06-10 12:59:53 | Re: REGEXP_REPLACE woes |