Re: [ADMIN] what's the efficient/safest way to convert database character set ?

From: "Huang, Suya" <Suya(dot)Huang(at)au(dot)experian(dot)com>
To: John R Pierce <pierce(at)hogranch(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: [ADMIN] what's the efficient/safest way to convert database character set ?
Date: 2013-10-18 04:49:21
Message-ID: D83E55F5F4D99B4A9B4C4E259E6227CD9DF35C@AUX1EXC01.apac.experian.local
Lists: pgsql-general

Yes John, we will probably use a new database server to accommodate the converted databases.

By export/import, do you mean:
1. pg_dump (should I specify -E UTF8 to dump the data in UTF-8 encoding?)
2. createdb -E UTF8 xxx
3. pg_restore
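The three steps above could be scripted roughly as follows. This is a minimal Python sketch that only builds the command lines. The helper name and the database and file names are hypothetical placeholders, and each list could be passed to subprocess.run(argv, check=True). It assumes a custom-format dump (-Fc), because pg_restore reads archive formats, not plain SQL. It also adds -T template0, because PostgreSQL refuses to give a new database a different encoding from its template unless that template is template0. The sketch only covers the mechanics and does not clean up any non-UTF-8 bytes.

```python
import shlex


def dump_restore_commands(src_db, dst_db, dump_file):
    """Return argv lists for a dump / create / restore migration.

    Hypothetical helper: all names passed in are placeholders.
    """
    return [
        # Custom-format archive (-Fc) so pg_restore can read it.
        ["pg_dump", "-Fc", "-f", dump_file, src_db],
        # New UTF8 database; template0 is used because a database copied
        # from any other template must keep that template's encoding.
        ["createdb", "-E", "UTF8", "-T", "template0", dst_db],
        # Load the archive into the new database.
        ["pg_restore", "-d", dst_db, dump_file],
    ]


if __name__ == "__main__":
    for argv in dump_restore_commands("olddb", "utf8db", "olddb.dump"):
        print(shlex.join(argv))
```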

I have also seen someone do it this way:
1. Perform a plain-text dump of the database:
pg_dump -f db.sql [dbname]
2. Convert the character encoding:
iconv db.sql -f ISO-8859-1 -t UTF-8 -o db.utf8.sql
3. Create the UTF8 database:
createdb utf8db (I'm not sure why he doesn't specify the database encoding here; it's probably better to use -E UTF8 to set it explicitly.)
4. Restore the converted dump into the UTF8 database:
psql -d utf8db -f db.utf8.sql
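The iconv step above can be sketched in Python as a streaming re-encode, which keeps memory use flat on a very large dump. The function name is hypothetical. Like iconv -f ISO-8859-1, it assumes every byte really is ISO-8859-1, so any bytes that were already UTF-8 would be double-encoded.

```python
def convert_latin1_to_utf8(src_path, dst_path, chunk_size=1 << 20):
    """Re-encode a plain-text dump from ISO-8859-1 to UTF-8.

    Hypothetical helper, roughly `iconv -f ISO-8859-1 -t UTF-8`.
    ISO-8859-1 maps each byte to exactly one character, so chunks can be
    decoded independently without splitting a character across chunks.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk.decode("iso-8859-1").encode("utf-8"))
```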

Which method is better? From what I can tell, the second approach produces a bigger dump file, so it's better to pipe it through bzip2 to get a compressed file. Other than that, are there any other considerations?
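The conversion and the compression could be done in a single streaming pass, roughly like `iconv -f ISO-8859-1 -t UTF-8 db.sql | bzip2`. This is a minimal Python sketch using the standard bz2 module, and the function name is hypothetical. It makes the same ISO-8859-1 assumption as the iconv command.

```python
import bz2


def convert_and_compress(src_path, dst_path, chunk_size=1 << 20):
    """Re-encode an ISO-8859-1 dump to UTF-8 and bzip2-compress it.

    Hypothetical helper; reads and writes in chunks so the whole dump
    never has to fit in memory.
    """
    with open(src_path, "rb") as src, bz2.open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk.decode("iso-8859-1").encode("utf-8"))
```

The compressed result could then be loaded with `bzcat db.utf8.sql.bz2 | psql -d utf8db`. For comparison, pg_dump's custom format (-Fc) is compressed by default.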

Thanks,
Suya
-----Original Message-----
From: pgsql-general-owner(at)postgresql(dot)org [mailto:pgsql-general-owner(at)postgresql(dot)org] On Behalf Of John R Pierce
Sent: Friday, October 18, 2013 11:23 AM
To: pgsql-general(at)postgresql(dot)org
Subject: Re: [GENERAL] [ADMIN] what's the efficient/safest way to convert database character set ?

On 10/17/2013 3:13 PM, Huang, Suya wrote:
> I have a question about converting a database from ASCII to UTF-8. What's
> the best approach if the database is very large?
> A detailed procedure or shared experience would be much appreciated!

I believe you will need to dump the whole database, and import it into a
new database that uses UTF8 encoding. As far as I know, there's no way
to convert encoding in place. As the other gentlemen pointed out, you
also will have to convert/sanitize all text data, as your current
SQL_ASCII fields could easily contain stuff that's not valid UTF8.
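The need to sanitize data can be checked before restoring. This is a minimal Python sketch that scans a plain-text dump (for example the db.sql produced by `pg_dump -f`) and lists the lines that are not valid UTF-8. The function name is hypothetical. Splitting on newlines is safe because a valid UTF-8 multibyte sequence never contains the byte 0x0A.

```python
def find_invalid_utf8_lines(path):
    """Return 1-based line numbers whose bytes are not valid UTF-8.

    Hypothetical helper; reads the dump one line at a time.
    """
    bad = []
    with open(path, "rb") as f:
        for lineno, line in enumerate(f, start=1):
            try:
                line.decode("utf-8")
            except UnicodeDecodeError:
                bad.append(lineno)
    return bad
```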

For large databases, this is a major undertaking. I find it's often
easiest to make a major change like this while migrating from the old
database server to a new one.

--
john r pierce 37N 122W
somewhere on the middle of the left coast
