From: | Paul Ramsey <pramsey(at)refractions(dot)net> |
---|---|
To: | PostgreSQL <pgsql-general(at)postgresql(dot)org> |
Subject: | 8.0, UTF8, and CLIENT_ENCODING |
Date: | 2007-05-17 20:56:00 |
Message-ID: | 464CC160.4080401@refractions.net |
Lists: | pgsql-general |
I have a small database (PgSQL 8.0, database encoding UTF8) that folks
are inserting into via a web form. The form itself is declared
ISO-8859-1 and, prior to inserting any data, pg_client_encoding is
set to LATIN1.
Most of the high-bit characters are correctly translated from LATIN1 to
UTF8. So for e-accent-aigu I see the two-byte UTF8 value in the database.
Sometimes, in their wisdom, people cut'n'paste information out of MSWord
and put that in the form. Instead of being mapped to 2-byte UTF8
high-bit equivalents, they are going into the database directly as
one-byte values > 127. That is, as illegal UTF8 values.
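To make the difference concrete, here is a minimal Python sketch of the two cases described above. It assumes (this is a guess, not something confirmed in the database) that the stray bytes are Windows-1252 punctuation such as the "smart" apostrophe 0x92 that MS Word tends to produce; the helper name is_valid_utf8 is made up for illustration.

```python
# e-acute as sent by a LATIN1 client is the single byte 0xE9.
latin1_e_acute = b"\xe9"

# A correct LATIN1 -> UTF8 conversion turns it into two bytes: 0xC3 0xA9.
utf8_e_acute = latin1_e_acute.decode("latin-1").encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# ASSUMPTION: 0x92 is the Windows-1252 right single quotation mark,
# used here as a stand-in for text pasted out of MS Word.
word_text = b"it\x92s"

print(utf8_e_acute, is_valid_utf8(utf8_e_acute))
print(word_text, is_valid_utf8(word_text))
```

A lone byte above 127 stored as-is is never valid UTF-8 on its own, which matches the symptom of the 8.2 restore rejecting the data.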
When I try to dump'n'restore this database into PgSQL 8.2, my data can't
make the transit.
Firstly, is this "kinda sorta" encoding handling expected in 8.0, or did
I do something wrong?
Secondly, anyone know any useful tools to pipe a stream through to strip
out illegal UTF8 bytes, so I can pipe my dump through that rather than
hand editing it?
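One possible shape for such a filter, as a rough Python sketch rather than a vetted tool: it reads a byte stream in chunks, silently drops any bytes that do not form valid UTF-8, and writes the rest back out. The function names strip_invalid_utf8 and main are made up for illustration, and note that dropping bytes loses the offending characters instead of repairing them.

```python
import codecs
import sys


def strip_invalid_utf8(src, dst, chunk_size=65536):
    """Copy bytes from src to dst, dropping bytes that are not valid UTF-8.

    src and dst are binary file-like objects. An incremental decoder is
    used so a multi-byte character split across two chunks is kept whole.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="ignore")
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decoder.decode(chunk).encode("utf-8"))
    # Flush the decoder; an incomplete trailing sequence is dropped.
    dst.write(decoder.decode(b"", final=True).encode("utf-8"))


def main():
    # Filter stdin to stdout in binary mode so no other recoding happens.
    strip_invalid_utf8(sys.stdin.buffer, sys.stdout.buffer)
```

Saved to a script file that calls main() at the bottom, this could sit in a pipe between the dump and the restore. For a similar effect without writing code, iconv with its -c flag, converting from UTF-8 to UTF-8, is commonly used.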
Thanks,
Paul
--
Paul Ramsey
Refractions Research
http://www.refractions.net
pramsey(at)refractions(dot)net
Phone: 250-383-3022
Cell: 250-885-0632