Quick Links

Re: retry: converting ASCII to UTF-8

From:	"Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To:	"Tom Hart EXTERN" <tomhart(at)coopfed(dot)org>, "Postgres General List" <pgsql-general(at)postgresql(dot)org>
Subject:	Re: retry: converting ASCII to UTF-8
Date:	2008-01-24 10:24:54
Message-ID:	D960CB61B694CF459DCFB4B0128514C2D2CE1C@exadv11.host.magwien.gv.at
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

Tom Hart wrote:
> Originally I set up the database to use ASCII encoding. However we
> have a large base of Spanish speaking members and services,
> and we need utf-8 encoding to maintain and support the extended character
> sets. In my naivety I thought it would be a relatively simple process
> to convert the db but I've found this to not be the case. I tried doing
> a dump and restore into a new database with the proper encoding, but
> pg_restore is getting hung up on one of the tables, our largest by far
> (~1gb, not huge I know). When I tried pg_restore from a command line
> (I was using pgAdmin, I know i'm a nub) I received this error.
>
> C:\Program Files\PostgreSQL\8.2\bin>pg_restore.exe -i -h 192.168.1.xxx -p 5432 -U foobar -d warehouse_utf8 -a -t "transaction" -v "O:\foo\bar\pg_dump_transaction.backup"
> pg_restore: connecting to database for restore
> Password:
> pg_restore: restoring data for table "transaction"
> pg_restore: [archiver (db)] Error while PROCESSING TOC:
> pg_restore: [archiver (db)] Error from TOC entry 1706; 0 17861 TABLE DATA transaction foobar
> pg_restore: [archiver (db)] COPY failed: ERROR: invalid byte sequence for encoding "UTF8": 0xc52f
> HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
> CONTEXT: COPY transaction, line 209487
> WARNING: errors ignored on restore: 1

You will have to figure out in which encoding the data really are.

SQL_ASCII will allow anything in, and the client is responsible for
feeding the right thing.

Somebody stored a 0xC5 in the database, which is not ASCII.
In WIN-1252, ISO8859-1, and ISO8859-15 that would be Å.
Could that be?

Once you find out the correct encoding, you can uye the -E switch of
pg_dump to set that encoding for your exported data.

If your clients entered consistent data, that should work.

If different clients used different encodings, you might end up
sorting it out manually...

Yours,
Laurenz Albe

In response to

retry: converting ASCII to UTF-8 at 2008-01-23 20:32:40 from Tom Hart

Browse pgsql-general by date

	From	Date	Subject
Next Message	User Map	2008-01-24 11:20:12	change the default data directory
Previous Message	hubert depesz lubaczewski	2008-01-24 10:21:10	maximum amount of data to be written during checkpoint?