Re: [JDBC] Invalid Character Data Problem

From: Hunter Hillegas <lists(at)lastonepicked(dot)com>
To: Kris Jurka <books(at)ejurka(dot)com>
Cc: PostgreSQL <pgsql-general(at)postgresql(dot)org>, Postgre JDBC List <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: [JDBC] Invalid Character Data Problem
Date: 2004-11-30 06:49:23
Message-ID: BDD157F3.4F01E%lists@lastonepicked.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-jdbc

I got this going with a dump/reload.

Beware however, 8.0b5 and the JDBC driver don't seem to work with UNICODE
encoding for the database. You have to use b4 or HEAD.

Hunter

> From: Kris Jurka <books(at)ejurka(dot)com>
> Date: Tue, 30 Nov 2004 01:34:46 -0500 (EST)
> To: Hunter Hillegas <lists(at)lastonepicked(dot)com>
> Cc: PostgreSQL <pgsql-general(at)postgresql(dot)org>, Postgre JDBC List
> <pgsql-jdbc(at)postgresql(dot)org>
> Subject: Re: [JDBC] Invalid Character Data Problem
>
>
>
> On Fri, 26 Nov 2004, Hunter Hillegas wrote:
>
>> When I SELECT from a certain table, I see this JDBC exception:
>>
>> "Invalid character data was found. This is most likely caused by stored
>> data containing characters that are invalid for the character set the
>> database was created in. The most common example of this is storing 8bit
>> data in a SQL_ASCII database."
>>
>> The database is indeed of type SQL_ASCII. The table stores mailing list data
>> and has about 400,000 rows.
>>
>> Looking at the data via psql, I see that some of the rows have strange
>> characters in them, such as question marks where I would not expect them,
>> etc...
>>
>> What are my options? Is there a way to identify the 'bad' records, or the
>> ones causing trouble?
>>
>
> To really solve this problem you need to have a correctly encoded
> database. This will involve a dump and restore process and possibly
> recoding your data. This is straightforward if you know what
> encoding your data is, although it will cause some downtime.
>
> To detect the bad data you can try various SELECTs with the JDBC driver
> and see what errors out. The function below will determine if a
> particular field has data with the high bit set which is something the
> database really doesn't know what to do with.
>
> SELECT pkcolumn, hashighbit(columna), hashighbit(columnb) FROM mytable;
>
> Kris Jurka
>
> CREATE OR REPLACE FUNCTION hashighbit(text) RETURNS boolean AS '
> DECLARE
> i int;
> BEGIN
> i := LENGTH($1);
> WHILE i > 0 LOOP
> IF ascii(substring($1, i, 1)) >= 128 THEN
> RETURN true;
> END IF;
> i := i-1;
> END LOOP;
> RETURN false;
> END;
> ' LANGUAGE 'plpgsql';
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Johan Wehtje 2004-11-30 06:54:24 Column n.nsptablespace does not exist error
Previous Message Kris Jurka 2004-11-30 06:34:46 Re: [JDBC] Invalid Character Data Problem

Browse pgsql-jdbc by date

  From Date Subject
Next Message Kris Jurka 2004-11-30 08:11:10 Re: Bug in JDBC-Driver?
Previous Message Kris Jurka 2004-11-30 06:34:46 Re: [JDBC] Invalid Character Data Problem