From: | Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Cc: | Geoffrey Myers <lists(at)serioustechnology(dot)com> |
Subject: | Re: error while trying to change the database encoding on a database |
Date: | 2011-01-24 16:20:16 |
Message-ID: | 201101240820.17047.adrian.klaver@gmail.com |
Lists: | pgsql-general |
On Monday 24 January 2011 8:06:38 am Geoffrey Myers wrote:
> Adrian Klaver wrote:
> > On Monday 24 January 2011 7:57:52 am Geoffrey Myers wrote:
> >> Adrian Klaver wrote:
> >>> On Monday 24 January 2011 6:38:55 am Geoffrey Myers wrote:
> >>>> We need to change the database encoding on our databases as they were
> >>>> created with the wrong encoding. They were created as SQL_ASCII and
> >>>> we are changing them to UTF8.
> >>>>
> >>>> When testing this Friday, I received the following error:
> >>>>
> >>>> pg_restore: [archiver (db)] Error while PROCESSING TOC:
> >>>> pg_restore: [archiver (db)] Error from TOC entry 5225; 0 16990 TABLE
> >>>> DATA cust postgres
> >>>> pg_restore: [archiver (db)] COPY failed: ERROR: invalid byte sequence
> >>>> for encoding "UTF8": 0xb0
> >>>> HINT: This error can also happen if the byte sequence does not match
> >>>> the encoding expected by the server, which is controlled by
> >>>> "client_encoding".
> >>>> CONTEXT: COPY cust, line 778
> >>>
> >>> ^^^^^^^ In the COPY command for that table.
> >>
> >> I picked up on that, but the dump is binary, thus I can not view the
> >> actual code.
> >
> > Actually you can :) I should have mentioned it before. You can have
> > pg_restore restore to a file instead of a database by using the -f
> > switch. When you do that it creates plain text output. You could restore
> > the entire dump to the file or use the -t switch to get only the table
> > you need.
>
> Thanks for the suggestion. As it stands, we are getting different
> errors for different hex characters, thus the solution we need is the
> ability to identify the characters that won't convert from SQL_ASCII to
> UTF8. Is there a resource that would identify these characters?
>
Well, the issue is that SQL_ASCII is not an encoding. From the docs:
http://www.postgresql.org/docs/9.0/interactive/multibyte.html#MULTIBYTE-CHARSET-SUPPORTED
"Thus, this setting is not so much a declaration that a specific encoding is in
use, as a declaration of ignorance about the encoding. In most cases, if you
are working with any non-ASCII data, it is unwise to use the SQL_ASCII setting
because PostgreSQL will be unable to help you by converting or validating
non-ASCII characters."
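
To make the 0xb0 failure from the error above concrete, here is a small Python
sketch (Python is only my choice for illustration, it was not used in this
thread). It shows that the single byte 0xb0 is not valid UTF-8 on its own, but
decodes to a degree sign under WIN1252 (Python calls it `cp1252`). Whether
WIN1252 is really the right source encoding for a given database is an
assumption until you know what wrote the data.

```python
# The byte reported in the pg_restore error message.
raw = b"\xb0"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0xb0 is a UTF-8 continuation byte, so it is invalid when it stands alone.
print(is_valid_utf8(raw))

# ASSUMPTION: the data was written as WIN1252. Under that encoding the
# byte is U+00B0 (degree sign), which UTF-8 encodes as two bytes.
as_text = raw.decode("cp1252")
print(as_text == "\u00b0")
print(as_text.encode("utf-8"))
```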
What you need to do is determine which applications were putting data into the
database and what encoding they are using. I ran into this a couple of years
back with an app that was using WIN1252 for data being inserted into a couple
of tables in a SQL_ASCII database. Once I knew the encoding I dumped the table
schema only for those tables into a new UTF8 database. Using psql I set the
client_encoding to WIN1252 and then used \i to pull in a plain text data only
dump for each table.
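
If it helps with finding the problem bytes, something along these lines could
be run over a plain text dump (for example one produced with pg_restore -f).
This is only a rough sketch: the function names and the sample data are made
up for illustration, and it reports just the first bad sequence on each line.
The line numbers are positions in the dump file, not the COPY line numbers
from the error message.

```python
from typing import Iterable, Iterator, Tuple


def find_non_utf8_lines(lines: Iterable[bytes]) -> Iterator[Tuple[int, bytes]]:
    """Yield (line_number, offending_bytes) for each line that is not valid UTF-8.

    Only the first invalid sequence on a line is reported.
    """
    for lineno, line in enumerate(lines, start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as exc:
            yield lineno, line[exc.start:exc.end]


def transcode(data: bytes, source_encoding: str) -> bytes:
    """Re-encode data to UTF-8, ASSUMING it was written in source_encoding."""
    return data.decode(source_encoding).encode("utf-8")


# Hypothetical sample rows standing in for lines of a plain text dump.
sample = [b"plain ascii\n", b"temp 20\xb0C\n", b"caf\xc3\xa9\n"]

for lineno, bad in find_non_utf8_lines(sample):
    print(lineno, bad.hex())

# With a real file (the name is hypothetical), open it in binary mode:
#     with open("cust.sql", "rb") as f:
#         for lineno, bad in find_non_utf8_lines(f):
#             print(lineno, bad.hex())
```

The list of distinct bad bytes is usually a good hint about the real encoding.
Once that is known, setting client_encoding as described above lets the server
do the conversion, so a helper like transcode is only needed if you would
rather fix the file itself.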
>
> --
> Until later, Geoffrey
>
> "I predict future happiness for America if they can prevent
> the government from wasting the labors of the people under
> the pretense of taking care of them."
> - Thomas Jefferson
--
Adrian Klaver
adrian(dot)klaver(at)gmail(dot)com