From: | "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at> |
---|---|
To: | "Tom Hart *EXTERN*" <tomhart(at)coopfed(dot)org>, "Postgres General List" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: retry: converting ASCII to UTF-8 |
Date: | 2008-01-24 10:24:54 |
Message-ID: | D960CB61B694CF459DCFB4B0128514C2D2CE1C@exadv11.host.magwien.gv.at |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Tom Hart wrote:
> Originally I set up the database to use ASCII encoding. However we
> have a large base of Spanish speaking members and services,
> and we need utf-8 encoding to maintain and support the extended character
> sets. In my naivety I thought it would be a relatively simple process
> to convert the db but I've found this to not be the case. I tried doing
> a dump and restore into a new database with the proper encoding, but
> pg_restore is getting hung up on one of the tables, our largest by far
> (~1gb, not huge I know). When I tried pg_restore from a command line
> (I was using pgAdmin, I know i'm a nub) I received this error.
>
> C:\Program Files\PostgreSQL\8.2\bin>pg_restore.exe -i -h 192.168.1.xxx -p 5432 -U foobar -d warehouse_utf8 -a -t "transaction" -v "O:\foo\bar\pg_dump_transaction.backup"
> pg_restore: connecting to database for restore
> Password:
> pg_restore: restoring data for table "transaction"
> pg_restore: [archiver (db)] Error while PROCESSING TOC:
> pg_restore: [archiver (db)] Error from TOC entry 1706; 0 17861 TABLE DATA transaction foobar
> pg_restore: [archiver (db)] COPY failed: ERROR: invalid byte sequence for encoding "UTF8": 0xc52f
> HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
> CONTEXT: COPY transaction, line 209487
> WARNING: errors ignored on restore: 1
You will have to figure out in which encoding the data really are.
SQL_ASCII will allow anything in, and the client is responsible for
feeding the right thing.
Somebody stored a 0xC5 in the database, which is not ASCII.
In WIN-1252, ISO8859-1, and ISO8859-15 that would be Å.
Could that be?
Once you find out the correct encoding, you can uye the -E switch of
pg_dump to set that encoding for your exported data.
If your clients entered consistent data, that should work.
If different clients used different encodings, you might end up
sorting it out manually...
Yours,
Laurenz Albe
From | Date | Subject | |
---|---|---|---|
Next Message | User Map | 2008-01-24 11:20:12 | change the default data directory |
Previous Message | hubert depesz lubaczewski | 2008-01-24 10:21:10 | maximum amount of data to be written during checkpoint? |