| From: | Hunter Hillegas <lists(at)lastonepicked(dot)com> | 
|---|---|
| To: | Kris Jurka <books(at)ejurka(dot)com> | 
| Cc: | PostgreSQL <pgsql-general(at)postgresql(dot)org>, Postgre JDBC List <pgsql-jdbc(at)postgresql(dot)org> | 
| Subject: | Re: [JDBC] Invalid Character Data Problem | 
| Date: | 2004-11-30 06:49:23 | 
| Message-ID: | BDD157F3.4F01E%lists@lastonepicked.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-general pgsql-jdbc | 
I got this going with a dump/reload.
Beware however, 8.0b5 and the JDBC driver don't seem to work with UNICODE
encoding for the database. You have to use b4 or HEAD.
Hunter
> From: Kris Jurka <books(at)ejurka(dot)com>
> Date: Tue, 30 Nov 2004 01:34:46 -0500 (EST)
> To: Hunter Hillegas <lists(at)lastonepicked(dot)com>
> Cc: PostgreSQL <pgsql-general(at)postgresql(dot)org>, Postgre JDBC List
> <pgsql-jdbc(at)postgresql(dot)org>
> Subject: Re: [JDBC] Invalid Character Data Problem
> 
> 
> 
> On Fri, 26 Nov 2004, Hunter Hillegas wrote:
> 
>> When I SELECT from a certain table, I see this JDBC exception:
>> 
>> "Invalid character data was found.  This is most likely caused by stored
>> data containing characters that are invalid for the character set the
>> database was created in.  The most common example of this is storing 8bit
>> data in a SQL_ASCII database."
>> 
>> The database is indeed of type SQL_ASCII. The table stores mailing list data
>> and has about 400,000 rows.
>> 
>> Looking at the data via psql, I see that some of the rows have strange
>> characters in them, such as question marks where I would not expect them,
>> etc...
>> 
>> What are my options? Is there a way to identify the 'bad' records, or the
>> ones causing trouble?
>> 
> 
> To really solve this problem you need to have a correctly encoded
> database.  This will involve a dump and restore process and possibly
> recoding your data.  This is straightforward if you know what
> encoding your data is, although it will cause some downtime.
> 
> To detect the bad data you can try various SELECTs with the JDBC driver
> and see what errors out.  The function below will determine if a
> particular field has data with the high bit set which is something the
> database really doesn't know what to do with.
> 
> SELECT pkcolumn, hashighbit(columna), hashighbit(columnb) FROM mytable;
> 
> Kris Jurka
> 
> CREATE OR REPLACE FUNCTION hashighbit(text) RETURNS boolean AS '
> DECLARE
>         i int;
> BEGIN
>         i := LENGTH($1);
>         WHILE i > 0 LOOP
>                 IF ascii(substring($1, i, 1)) >= 128 THEN
>                         RETURN true;
>                 END IF;
>                 i := i-1;
>         END LOOP;
>         RETURN false;
> END;
> ' LANGUAGE 'plpgsql';
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
> 
>                http://archives.postgresql.org
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Johan Wehtje | 2004-11-30 06:54:24 | Column n.nsptablespace does not exist error | 
| Previous Message | Kris Jurka | 2004-11-30 06:34:46 | Re: [JDBC] Invalid Character Data Problem | 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Kris Jurka | 2004-11-30 08:11:10 | Re: Bug in JDBC-Driver? | 
| Previous Message | Kris Jurka | 2004-11-30 06:34:46 | Re: [JDBC] Invalid Character Data Problem |