From: | "Marco Bizzarri" <marco(dot)bizzarri(at)gmail(dot)com> |
---|---|
To: | "Tino Wildenhain" <tino(at)wildenhain(dot)de> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Dumping in LATIN1 and restoring in UTF-8 |
Date: | 2006-07-06 07:51:09 |
Message-ID: | 3f0d61c40607060051y583529cekda79e0eba249ab6d@mail.gmail.com |
Lists: | pgsql-general |
On 7/6/06, Tino Wildenhain <tino(at)wildenhain(dot)de> wrote:
> Marco Bizzarri schrieb:
> > Hi all.
> >
> > Here is my use case: I've an application which uses PostgreSQL as
> > backend. Up to now, the database was encoded in SQL_ASCII or LATIN1.
> > Now, we need to migrate to UTF-8.
> >
> > What we tried was to:
> >
> > 1) dump the database using pg_dump, in tar format (we had blobs);
> > 2) modify the result, using some conversion tool (like recode);
> > 3) destroy the old database;
> > 4) recreate the database with the UNICODE setting;
> > 5) restore the database using pg_restore.
> >
> > The result was not what I expected. pg_restore was using the
> > LATIN1 encoding to encode the strings, resulting in LATIN1 data
> > encoded again as UTF-8...
> >
> > The problem lay in the toc.dat file, which stated that the client
> > encoding was LATIN1 instead of UTF-8.
> >
> > The solution in the end was to manually modify the toc.dat file,
> > replacing the LATIN1 string with UTF-8 (plus a space to keep the
> > length unchanged, since toc.dat is a binary file).
> >
> > Even though it worked for us, I wonder if there is any other way to
> > accomplish the same result, at least to specify the encoding for the
> > restore.
>
> Yes, it's actually quite easy: you dump as you feel appropriate,
> then create the database with the encoding you want,
> restore without creating the database, and you are done.
> Restore sets the client encoding to what it actually was
> in the dump data (in your case LATIN1) and the database
> would be UTF-8; Postgres automatically recodes. No need
> for iconv and friends.
>
> Regards
> Tino
>
First of all, thank you for your answer. However, I suspect I did not
understand your answer, since the commands I used were:
1) pg_dump -Ft -b -f dump.sql.tar database
2) dropdb database
3) createdb -E UNICODE database
4) pg_restore -d database dump.sql.tar
In my experience, this produces a "double encoding". As you can see,
I created the database by hand, with the proper encoding. However,
when I reimported the data, the result was LATIN1 data encoded again
as UTF-8, rather than pure UTF-8.
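
To make the symptom concrete, here is a minimal sketch of what "double encoding" looks like at the byte level. It is not taken from the restore itself; it only assumes that iconv and od are available, and it uses a single "é" as sample data.

```shell
# "é" in LATIN1 is the single byte 0xE9 (octal 351).
# A correct conversion to UTF-8 yields the two bytes c3 a9.
correct=$(printf '\351' | iconv -f ISO-8859-1 -t UTF-8 \
    | od -An -tx1 | tr -d ' \n')

# Converting the already-UTF-8 text again, while still declaring it
# to be LATIN1, treats each of its bytes as a separate character.
double=$(printf '\351' | iconv -f ISO-8859-1 -t UTF-8 \
    | iconv -f ISO-8859-1 -t UTF-8 \
    | od -An -tx1 | tr -d ' \n')

echo "converted once:  $correct"
echo "converted twice: $double"
```

The second value is four bytes long instead of two, which is why the restored text shows up as two odd characters per accented letter.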
How did my procedure differ from yours?
I will run some tests with a sample database, with logging enabled,
so that I can see which commands are issued.
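
Besides server logging, one way to see which encoding the restore declares is to have pg_restore write its SQL script to a file (with -f and no -d it does not connect to a database) and search that script. A minimal sketch, assuming the script contains a client_encoding statement; the helper name find_client_encoding and the file name restore.sql are hypothetical.

```shell
# Print every line of a SQL script that mentions client_encoding.
find_client_encoding() {
    grep -i 'client_encoding' "$1"
}

# Hypothetical usage, assuming dump.sql.tar from step 1 exists:
#   pg_restore -f restore.sql dump.sql.tar
#   find_client_encoding restore.sql
```

If the line found names LATIN1 while the data in the archive has already been converted to UTF-8, that mismatch would be consistent with the double encoding described above.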
Regards
Marco
--
Marco Bizzarri
http://notenotturne.blogspot.com/