Re: Charset encoding and accents

From: Barry Lind <blind(at)xythos(dot)com>
To: Thomas O'Dowd <tom(at)nooper(dot)com>
Cc: Davide Romanini <romaz(at)libero(dot)it>, pgsql-hackers(at)postgresql(dot)org, pgsql-jdbc(at)postgresql(dot)org
Subject: Re: Charset encoding and accents
Date: 2003-04-10 16:32:40
Message-ID: 3E959CA8.2090208@xythos.com
Lists: pgsql-hackers pgsql-jdbc

The charSet= option no longer works when the 7.3 driver talks to a
7.3 server, since character set translation is now performed by the
server (for performance reasons) in that scenario.

The correct solution here is to convert the database to the proper
character set for the data it is storing. SQL_ASCII is not a proper
character set for storing 8-bit data.
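
To illustrate the 7-bit versus 8-bit point, here is a minimal Java sketch using only the standard library. The class and method names are hypothetical and are not part of the JDBC driver. It encodes the single character à (U+00E0) under three charsets: US-ASCII cannot represent it and substitutes '?', LATIN1 uses one byte with the 8th bit set, and UTF-8 uses two bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of the PostgreSQL JDBC driver.
public class AccentBytes {
    // "\u00e0" is the character 'à', written as an escape so the source
    // file's own encoding cannot affect the result.
    private static final String A_GRAVE = "\u00e0";

    // 7-bit ASCII has no code for 'à'; Java substitutes '?' (0x3F).
    public static byte[] asAscii() {
        return A_GRAVE.getBytes(StandardCharsets.US_ASCII);
    }

    // LATIN1 (ISO-8859-1) stores 'à' as the single byte 0xE0.
    public static byte[] asLatin1() {
        return A_GRAVE.getBytes(StandardCharsets.ISO_8859_1);
    }

    // UTF-8 stores 'à' as the two bytes 0xC3 0xA0.
    public static byte[] asUtf8() {
        return A_GRAVE.getBytes(StandardCharsets.UTF_8);
    }

    // Render bytes as space-separated upper-case hex.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("US-ASCII: " + hex(asAscii()));  // US-ASCII: 3F
        System.out.println("LATIN1:   " + hex(asLatin1())); // LATIN1:   E0
        System.out.println("UTF-8:    " + hex(asUtf8()));   // UTF-8:    C3 A0
    }
}
```

A client that reads the byte 0xE0 has to be told which of these encodings produced it, and a database declared as SQL_ASCII does not record that.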

--Barry

Thomas O'Dowd wrote:
> Davide,
>
> ASCII implies 7-bit characters, which don't carry enough information
> to store the accented characters that you are using. I'm confused as to
> how they are being stored in the database at all if this is the case. I
> presume they get stored because the 8th bit is there anyway by default,
> but that shouldn't really be relied upon, methinks.
>
> Your database should probably be using LATIN1 (ISO-8859-1) or some other
> 8-bit encoding if you really want to store 8-bit information in it.
>
> Anyway, try connecting with:
>
> jdbc:postgresql://localhost/prova?charSet=LATIN1
>
> This might well work for you. That said, I haven't tried this or dug
> into the internals of the Java driver in a while. I'll Cc the jdbc list.
>
> Tom.
>
> On Thu, 2003-04-10 at 18:04, Davide Romanini wrote:
>
>>Hi,
>>
>>I've posted this problem twice to the pgsql-jdbc user list, but no
>>one has helped me solve it. I think this is a really serious problem in
>>the jdbc driver. I've tried different solutions with no result.
>>
>>Well, let me explain the problem. I have a working database in
>>PostgreSQL. There's an application, written in M$ Access, that uses the
>>database through the ODBC driver with no problems. I'd like to access
>>the data from a Swing application through the jdbc driver.
>>On the server side the charset encoding is set to SQL_ASCII. That is not
>>a problem, because all the strings containing accented characters are
>>retrieved correctly by ODBC and also by the psql client.
>>But if I retrieve strings containing accents (like àòù) using jdbc I get
>>into trouble because my accents get garbled. For example: the string 'La
>>città di Forlì' is retrieved and displayed as 'La citt?di Forl?'!
>>
>>I've investigated the problem a bit in the source code of the driver.
>>I noticed that when I call rs.getString(), the driver invokes (at a
>>certain point) the method org.postgresql.core.Encoding.decode(byte[]
>>encodedString, int offset, int length).
>>This method calls decodeUTF8 when the actual encoding is equal to
>>"UTF-8". If the encoding is different, it simply returns a new
>>String(encodedString, offset, length, encoding).
>>Well, my database is SQL_ASCII, so the jdbc driver should return a new
>>string and not call decodeUTF8. But when I step through the source in
>>a debugger, the encoding is ALWAYS UTF-8! I've also tried to set
>>a parameter in my connection string:
>>jdbc:postgresql://localhost/prova?charSet=SQL_ASCII (I've tried a lot of
>>different encodings here). The encoding is always UTF-8.
>>Well, I thought 'if the driver wants strings to be UNICODE, set the
>>server variable CLIENT_ENCODING to UNICODE'. No result! It doesn't change!
>>The only way to have my string displayed correctly is to comment out the
>>whole decodeUTF8 call and make it return a new String(data). So I think
>>that if the encoding were correctly recognized as being different from
>>UTF-8, the decode method would return the new String, which is the
>>correct behaviour in my case.
>>
>>Please don't tell me to change my database to UNICODE. I cannot do
>>that. And I do not WANT to do that. Why does the ODBC driver work fine
>>while the JDBC driver works only with UNICODE databases?? It's a bug and
>>should be corrected. If I were skilled enough I would correct the bug
>>myself, but I don't know much about the JDBC standard.
>>
>>I hope you can answer me with a solution. Really, the driver is simply
>>unusable for serious work with this bug.
>>
>>The problem is not solved in the latest stable (version 7.3 build 109)
>>or development (version 7.4 build 204) releases of the driver.
>>
>>Regards, Romaz
>>--
>>Davide Romanini
>>
>>
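
The decode path described in the quoted message can be sketched roughly as follows. This is an illustration under stated assumptions, not the driver's actual code: the class name is hypothetical, the driver's hand-written decodeUTF8 is replaced by Java's built-in UTF-8 decoder, and the charset is looked up with Charset.forName. Because of that substitution the garbled characters may differ from the '?' output reported above, but the cause is the same: LATIN1 bytes decoded as if they were UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; NOT org.postgresql.core.Encoding itself.
public class DecodeSketch {
    // Decode a slice of bytes using the named encoding. Java's own
    // decoders stand in for the driver's decodeUTF8 here (assumption).
    public static String decode(byte[] data, int offset, int length,
                                String encoding) {
        return new String(data, offset, length, Charset.forName(encoding));
    }

    public static void main(String[] args) {
        // "La città di Forlì", written with escapes so the source file's
        // encoding does not matter.
        String original = "La citt\u00e0 di Forl\u00ec";

        // What an SQL_ASCII database holding LATIN1 data would hand back:
        // one byte per character, with 0xE0 and 0xEC for the accents.
        byte[] stored = original.getBytes(StandardCharsets.ISO_8859_1);

        // Decoding as UTF-8 treats 0xE0 and 0xEC as malformed sequences.
        String wrong = decode(stored, 0, stored.length, "UTF-8");
        // Decoding as LATIN1 recovers the text.
        String right = decode(stored, 0, stored.length, "ISO-8859-1");

        System.out.println(wrong.equals(original)); // false
        System.out.println(right.equals(original)); // true
    }
}
```

The bytes themselves are intact in both cases. Only the label attached to them differs, which is why converting the database to the encoding its data actually uses lets the server perform the translation correctly.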
