From: | Barry Lind <blind(at)xythos(dot)com> |
---|---|
To: | Joseph Shraibman <jks(at)selectacast(dot)net> |
Cc: | pgsql-jdbc(at)postgresql(dot)org |
Subject: | Re: ArrayIndexOutOfBoundsException in Encoding.decodeUTF8() |
Date: | 2003-01-08 01:34:48 |
Message-ID: | 3E1B8038.6050702@xythos.com |
Lists: | pgsql-general pgsql-jdbc |
Joseph,
The problem is that your database claims to be ASCII, but you are
storing non-ASCII data in it.
As of 7.3 the JDBC driver has the server convert from the database
character set to UTF8. The server then sends the data to the driver in
UTF8, and the driver decodes the UTF8 into Java Unicode.
The conversion from ASCII to UTF8 is a no-op since the 128 characters of
ASCII (values 0 - 127) map directly to the same values in UTF8. However,
since you are storing non-ASCII data, the bytes with values from 128 - 255
just get passed from the server to the client without any additional
processing (since there aren't supposed to be any values in this range),
but then when the driver tries to convert to Java Unicode, it can't
because it has received an invalid UTF8 sequence.
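
To make the failure concrete, here is a minimal sketch of what happens to
the string 'Oné' from your report. It assumes the data was stored as
Latin1, so the bytes are 0x4F 0x6E 0xE9. The class and method names
(DecodeDemo, naiveDecodeUtf8) are made up for illustration, and the loop
is a simplified stand-in modelled on the code you quoted, not the
driver's actual source. The byte 0xE9 is >= 0xE0, so the loop treats it
as the start of a three-byte sequence and reads past the end of the
array at index 3.

```java
class DecodeDemo {
    // Hypothetical, simplified decoder modelled on the quoted loop.
    // Like that loop, it does no bounds or validity checking.
    static String naiveDecodeUtf8(byte[] data) {
        char[] out = new char[data.length];
        int i = 0;
        int j = 0;
        while (i < data.length) {
            int z = data[i] & 0xFF;
            if (z < 0x80) {
                // Plain ASCII: one byte maps to one char.
                out[j++] = (char) z;
                i++;
            } else if (z >= 0xE0) {
                // Assumed three-byte sequence: reads two more bytes
                // without checking that they exist.
                int y = data[i + 1] & 0xFF;
                int x = data[i + 2] & 0xFF;
                out[j++] = (char) (((z - 0xE0) << 12) + ((y - 0x80) << 6) + (x - 0x80));
                i += 3;
            } else {
                // Assumed two-byte sequence.
                int y = data[i + 1] & 0xFF;
                out[j++] = (char) (((z - 0xC0) << 6) + (y - 0x80));
                i += 2;
            }
        }
        return new String(out, 0, j);
    }

    public static void main(String[] args) {
        // 'O', 'n', and Latin1 'é' (0xE9): three bytes, valid indexes 0..2.
        byte[] latin1 = {0x4F, 0x6E, (byte) 0xE9};
        try {
            naiveDecodeUtf8(latin1);
        } catch (ArrayIndexOutOfBoundsException e) {
            // The exact message text differs between Java versions.
            System.out.println("decode failed: " + e);
        }
    }
}
```

Pure ASCII input such as the bytes for "On" goes through the first
branch only and decodes without trouble, which is why the problem only
shows up on rows containing bytes in the 128 - 255 range.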
It seems that you are actually storing Latin1 data in this database and
thus the database character set should probably be Latin1.
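
One way to check this guess outside the driver is sketched below. It
assumes you can get at the raw stored bytes (here hard-coded as 0x4F
0x6E 0xE9 for 'Oné'); the class and method names are hypothetical, and
it uses the standard java.nio.charset classes from a current JDK rather
than anything in the driver. A strict UTF8 decoder rejects the bytes,
while decoding them as ISO-8859-1 (Latin1) gives back the expected
string.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Latin1VsUtf8 {
    // Returns true only if the bytes form well-formed UTF-8.
    // REPORT makes the decoder throw instead of substituting U+FFFD.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Every byte value 0..255 is a valid Latin1 character, so this never fails.
    static String decodeLatin1(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Assumed stored bytes for 'Oné' in a Latin1-encoded column.
        byte[] stored = {0x4F, 0x6E, (byte) 0xE9};
        System.out.println("valid UTF-8: " + isValidUtf8(stored));
        System.out.println("as Latin1:   " + decodeLatin1(stored));
    }
}
```

Note that a successful Latin1 decode proves little by itself, since any
byte sequence decodes as Latin1; the useful signal is that the result
matches the text you expected to have stored.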
In 7.2 it was possible to override the character set used by the driver,
however I don't think this works anymore when connecting to a 7.3
server. .... looks at code .... Yes, the override is ignored if the
server is a 7.3 server. You could hack at AbstractJdbc1Connection to
work around the issue, or just correctly set the database character set
to match the data that the database contains.
thanks,
--Barry
Joseph Shraibman wrote:
> BTW the string that caused this is 'Oné'
>
> Joseph Shraibman wrote:
>
>> java.lang.ArrayIndexOutOfBoundsException: 3
>> at org.postgresql.core.Encoding.decodeUTF8(Encoding.java:253)
>> at org.postgresql.core.Encoding.decode(Encoding.java:165)
>> at org.postgresql.core.Encoding.decode(Encoding.java:181)
>> at org.postgresql.jdbc1.AbstractJdbc1ResultSet.getString(AbstractJdbc1ResultSet.java:97)
>>
>>
>> The relevant code is:
>>
>> while (i < k) {
>>     z = data[i] & 0xFF;
>>     if (z < 0x80) {
>>         l_cdata[j++] = (char)data[i];
>>         i++;
>>     } else if (z >= 0xE0) { // length == 3
>>         y = data[i+1] & 0xFF; //<<== THIS IS LINE 253
>>         x = data[i+2] & 0xFF;
>>         val = (z-0xE0)*pow2_12 + (y-0x80)*pow2_6 + (x-0x80);
>>         l_cdata[j++] = (char) val;
>>         i+= 3;
>>     } else { // length == 2 (maybe add checking for
>>              // length > 3, throw exception if it is
>>
>>
>> And in the method that calls that:
>>
>> if (encoding.equals("UTF-8")) {
>> return decodeUTF8(encodedString, offset, length);
>> }
>>
>> The thing is my database encoding is SQL_ASCII
>>
>> => SELECT version(), getdatabaseencoding() ;
>>
>>  version                                                                                                 | getdatabaseencoding
>> ---------------------------------------------------------------------------------------------------------+---------------------
>>  PostgreSQL 7.3.1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2 20020903 (Red Hat Linux 8.0 3.2-7) | SQL_ASCII
>> (1 row)
>>
>> ... so why is it trying to decode the string as UTF-8? I just
>> upgraded this database from 7.2.3 yesterday.
>>
>
>