Re: error while trying to change the database encoding on a database

From:	Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>
To:	pgsql-general(at)postgresql(dot)org
Cc:	Geoffrey Myers <lists(at)serioustechnology(dot)com>
Subject:	Re: error while trying to change the database encoding on a database
Date:	2011-01-24 16:20:16
Message-ID:	201101240820.17047.adrian.klaver@gmail.com
Lists:	pgsql-general

On Monday 24 January 2011 8:06:38 am Geoffrey Myers wrote:
> Adrian Klaver wrote:
> > On Monday 24 January 2011 7:57:52 am Geoffrey Myers wrote:
> >> Adrian Klaver wrote:
> >>> On Monday 24 January 2011 6:38:55 am Geoffrey Myers wrote:
> >>>> We need to change the database encoding on our databases as they were
> >>>> created with the wrong encoding. They were created as SQL_ASCII and
> >>>> we are changing them to UTF8.
> >>>>
> >>>> When testing this Friday, I received the following error:
> >>>>
> >>>> pg_restore: [archiver (db)] Error while PROCESSING TOC:
> >>>> pg_restore: [archiver (db)] Error from TOC entry 5225; 0 16990 TABLE
> >>>> DATA cust postgres
> >>>> pg_restore: [archiver (db)] COPY failed: ERROR: invalid byte sequence
> >>>> for encoding "UTF8": 0xb0
> >>>> HINT: This error can also happen if the byte sequence does not match
> >>>> the encoding expected by the server, which is controlled by
> >>>> "client_encoding".
> >>>> CONTEXT: COPY cust, line 778
> >>>
> >>> ^^^^^^^ In the COPY command for that table.
> >>
> >> I picked up on that, but the dump is binary, thus I can not view the
> >> actual code.
> >
> > Actually you can :) I should have mentioned it before. You can have
> > pg_restore restore to a file instead of a database by using the -f
> > switch. When you do that it creates plain text output. You could restore
> > the entire dump to the file or use the -t switch to get only the table
> > you need.
>
> Thanks for the suggestion. As it stands, we are getting different
> errors for different hex characters, thus the solution we need is the
> ability to identify the characters that won't convert from SQL_ASCII to
> UTF8. Is there a resource that would identify these characters?
>

Well, the issue is that SQL_ASCII is not an encoding. From the docs:
http://www.postgresql.org/docs/9.0/interactive/multibyte.html#MULTIBYTE-CHARSET-SUPPORTED
"Thus, this setting is not so much a declaration that a specific encoding is in
use, as a declaration of ignorance about the encoding. In most cases, if you
are working with any non-ASCII data, it is unwise to use the SQL_ASCII setting
because PostgreSQL will be unable to help you by converting or validating
non-ASCII characters."

What you need to do is determine which applications were putting data into the
database and what encoding they were using. I ran into this a couple of years
back with an app that was using WIN1252 for data being inserted into a couple
of tables in a SQL_ASCII database. Once I knew the encoding, I loaded a
schema-only dump of those tables into a new UTF8 database. Then, using psql, I
set client_encoding to WIN1252 and used \i to pull in a plain-text, data-only
dump of each table.
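
If you do not know up front which encoding the applications used, one way to
narrow it down is to scan the plain-text output from pg_restore -f for lines
that are not valid UTF8 and look at how they read under a few candidate
encodings. The following is only a rough sketch, not something I ran against
your data: the function name find_non_utf8_lines and the candidate list are
made up for illustration, and "cp1252" is Python's name for what PostgreSQL
calls WIN1252. A line that decodes cleanly under a candidate is not proof that
the candidate is right; someone still has to look at the text and judge.

```python
from typing import Dict, Iterator, Optional, Sequence, Tuple


def find_non_utf8_lines(
    dump: bytes,
    candidates: Sequence[str] = ("cp1252", "latin-1"),  # assumed guesses
) -> Iterator[Tuple[int, bytes, Dict[str, Optional[str]]]]:
    """Yield (line number, first offending bytes, previews) for every line
    of the dump that cannot be decoded as UTF-8."""
    for lineno, line in enumerate(dump.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as exc:
            # exc.start/exc.end delimit the first undecodable byte run.
            bad = line[exc.start:exc.end]
            previews: Dict[str, Optional[str]] = {}
            for enc in candidates:
                try:
                    previews[enc] = line.decode(enc)
                except UnicodeDecodeError:
                    # Some bytes are unassigned in this candidate encoding.
                    previews[enc] = None
            yield lineno, bad, previews


if __name__ == "__main__":
    # Made-up sample row containing the 0xb0 byte from the error message.
    sample = b"1\tACME\t20\xb0C\n2\tplain ascii\n"
    for lineno, bad, previews in find_non_utf8_lines(sample):
        print(lineno, bad.hex(), previews)
```

For a real dump you would read the file in binary mode, e.g.
open(path, "rb").read(), and pass the bytes in. The line numbers reported are
lines of the dump file, not the per-table line numbers that COPY prints in its
CONTEXT message.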

>
> --
> Until later, Geoffrey
>
> "I predict future happiness for America if they can prevent
> the government from wasting the labors of the people under
> the pretense of taking care of them."
> - Thomas Jefferson

--
Adrian Klaver
adrian(dot)klaver(at)gmail(dot)com
