Re: UNICODE encoding and jdbc related issues

From: "Igor Postelnik" <Igor_Postelnik(at)messageone(dot)com>
To: <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: UNICODE encoding and jdbc related issues
Date: 2005-04-06 22:15:28
Message-ID: 46F30BC04EC6364695BC07D4A57AAD2C7C7EAB@auscorpex-1.austin.messageone.com
Lists: pgsql-jdbc

> > 2. I'm really not sure I want to change the encoding of our main
> > database to Unicode. Is there a performance loss when going to a
> > UNICODE database encoding? What about sorts, etc. I'm really worried
> > about unintended side effects of moving from SQL_ASCII to UNICODE.
>
> You don't need to use unicode, but you must select another encoding.
> If you'd like to stick with a single byte encoding perhaps LATIN1
> would be appropriate for you.

I've asked this before on the performance list but didn't get any reply.
Is there a substantial performance difference between using SQL_ASCII,
LATIN1, or UNICODE?

> The driver does SET client_encoding which does work for all real
> server encodings. The problem is that SQL_ASCII is not a real
> encoding. It accepts any encoding and cannot do conversions to other
> encodings. Your db right now could easily have a mix of encodings.
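
To make the point above concrete, here is a minimal Java sketch of what
happens when bytes stored without a declared encoding are decoded as
UTF-8. It is only an illustration: the class and method names
(EncodingDemo, decodeStrictUtf8, isValidUtf8) are made up for this
example and are not part of the JDBC driver, and the sample word is an
assumption about what such data might look like.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the PostgreSQL JDBC driver.
class EncodingDemo {

    // Decode bytes as UTF-8 and fail loudly on any malformed sequence
    // instead of silently substituting replacement characters.
    static String decodeStrictUtf8(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // True if the byte sequence is well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            decodeStrictUtf8(bytes);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "cafe" with an accented e, as a LATIN1 client would have stored it:
        // the accented letter is the single byte 0xE9.
        byte[] latin1Bytes = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        // The same text encoded as UTF-8: the accented letter becomes 0xC3 0xA9.
        byte[] utf8Bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        System.out.println(isValidUtf8(latin1Bytes)); // false
        System.out.println(isValidUtf8(utf8Bytes));   // true

        // Decoding the stored bytes with the encoding they were written in works.
        String decoded = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        System.out.println(decoded.equals("caf\u00e9")); // true
    }
}
```

Pure 7-bit ASCII decodes the same way under LATIN1 and UTF-8, which is
presumably why nothing broke until bytes above 127 showed up.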

ISTM that when you create a database with SQL_ASCII encoding you decide
to deal with character set issues in the applications. Why is the JDBC
driver dictating how the application handles character set issues?

-Igor

> -----Original Message-----
> From: pgsql-jdbc-owner(at)postgresql(dot)org
> [mailto:pgsql-jdbc-owner(at)postgresql(dot)org] On Behalf Of Kris Jurka
> Sent: Wednesday, April 06, 2005 1:23 PM
> To: Chris Kratz
> Cc: pgsql-jdbc(at)postgresql(dot)org
> Subject: Re: [JDBC] UNICODE encoding and jdbc related issues
>
>
>
> On Wed, 6 Apr 2005, Chris Kratz wrote:
>
> > Our production database was created with the default SQL_ASCII
> > encoding. It appears that some of our users have entered data into
> > the system with characters above 127 (accented vowels, etc). None of
> > the tools we currently use had a problem with this until recently;
> > everything just worked.
> >
> > I was testing some reporting tools this past weekend and have been
> > playing with Jasper reports[1]. Jasper reports is a Java based
> > reporting tool that reads data from the database via JDBC. When I
> > initially tried to generate reports, the jdbc connection would crash
> > with the following message:
> >
> > org.postgresql.util.PSQLException: Invalid character data was found.
> >
> > Googling eventually turned up a message on the pgsql-jdbc list
> > detailing the problem[2]. Basically, java cannot convert these
> > characters above 127 into unicode which is required by java.
> >
> > After some more googling, I found that if I took a recent database
> > dump and then ran it through iconv[3] and then created the database
> > with a unicode encoding, everything worked.
> >
> > 1. Is there any way to do an iconv type translation inline in a sql
> > statement? ie select translate(text_field, unicode) from table....
> > Btw, set client_encoding=UNICODE does not work in this situation. In
> > fact the JDBC driver for postgres seems to do this automatically.
>
> You can't do translation inline, how would a driver interpret the
> results of SELECT translate(field1, unicode), translate(field2, latin1) ?
>
> > 3. Is there any other way around this issue? Or are we living
> > dangerously by trying to store non-ascii data in a database created
> > as ascii encoded?
>
> You are living dangerously.
>
> > 4. Has anyone else gone through a conversion like this? Are there
> > any gotchas we should look out for?
>
> The gotchas here are to make sure your other client tools still work
> against the new database.
>
> > [3] iconv -f iso8859-1 -t utf-8 < dbsnapshot.dumpall > dump-utf-8.dumpall
>
> I see your data really is LATIN1. Perhaps you should use that as your
> db encoding. That should keep your existing client tools happy as
> well as the JDBC driver.
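
For anyone without iconv at hand, the conversion in footnote [3] can be
roughly approximated in Java. This is a sketch under the assumption that
every byte in the dump really is LATIN1; the class name Latin1ToUtf8 and
the method transcode are made up for illustration and do not come from
any PostgreSQL tool.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for: iconv -f iso8859-1 -t utf-8
class Latin1ToUtf8 {

    // Read bytes as ISO-8859-1 (LATIN1) and write the same text as UTF-8.
    static void transcode(InputStream in, OutputStream out) throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.ISO_8859_1);
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        char[] buffer = new char[8192];
        int count;
        while ((count = reader.read(buffer)) != -1) {
            writer.write(buffer, 0, count);
        }
        // Flush so buffered characters reach the underlying stream.
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Latin1ToUtf8 < dbsnapshot.dumpall > dump-utf-8.dumpall
        transcode(System.in, System.out);
    }
}
```

One caveat: ISO-8859-1 assigns a character to every possible byte, so
this conversion never fails. If parts of the dump were written in some
other encoding, they will be converted without complaint into the wrong
characters, so it is worth spot-checking the output before loading it.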
>
> Kris Jurka
>
