From: | Geoffrey Myers <lists(at)serioustechnology(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | UTF8 conversion revisited |
Date: | 2011-03-29 18:34:41 |
Message-ID: | 4D922641.2030102@serioustechnology.com |
Lists: | pgsql-general |
So, we are still having an issue with this and I thought I'd throw this
out to the list to see if I'm missing something. Basically, we have
identified the tables/fields we need to convert. I'm running the
following Perl code against the fields and re-inserting the 'fixed' data
into the field:
$data =~ s/(.)/((ord($1) >= 0) && (ord($1) <= 8))
    || (ord($1) == 11)
    || ((ord($1) >= 13) && (ord($1) <= 31))
    || (ord($1) >= 127) ? "" : $1/egs;
This appears to be working, as a large number of records are cleaned.
The problem is that somehow it's not fixing data that contains the hex
value 0xbd: when I attempt to dump this database and restore it into a
new one with the UTF8 encoding, I get the following error:
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 5246; 0 4978675 TABLE DATA cust postgres
pg_restore: [archiver (db)] COPY failed: ERROR: invalid byte sequence for encoding "UTF8": 0xbd
As I see it, the Perl code above should catch this '0xbd' character, but
somehow it is finding its way through.
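To make that expectation concrete, here is a minimal standalone check of the substitution. It is only a sketch: it assumes the field value reaches Perl as a raw byte string, and the `clean` helper name and the sample string are made up for illustration, not taken from the real script or data.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the substitution shown above.
# It keeps tab (9), newline (10), form feed (12) and printable
# ASCII (32..126), and drops every other character.
sub clean {
    my ($data) = @_;
    $data =~ s/(.)/((ord($1) >= 0) && (ord($1) <= 8))
        || (ord($1) == 11)
        || ((ord($1) >= 13) && (ord($1) <= 31))
        || (ord($1) >= 127) ? "" : $1/egs;
    return $data;
}

# Made-up sample value: "1", the byte 0xbd, then " cup" and a newline.
my $sample = "1\xbd cup\n";
my $fixed  = clean($sample);

# Count any characters above 0x7e that survived the filter.
my @left = grep { ord($_) > 126 } split //, $fixed;

# %vd prints the ordinal of each character, joined by dots.
printf "before: %vd\nafter:  %vd\nremaining high bytes: %d\n",
    $sample, $fixed, scalar @left;
```

On a byte string like this the 0xbd is removed, which is why I would expect the same to happen to the real field values.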
Any insights would be greatly appreciated.
--
Until later, Geoffrey
"I predict future happiness for America if they can prevent
the government from wasting the labors of the people under
the pretense of taking care of them."
- Thomas Jefferson