
Re: A question about postgresql 8.1 and UTF strings

From:	Oliver Jowett <oliver(at)opencloud(dot)com>
To:	Yair Zas <yair(dot)zaslavsky(at)gmail(dot)com>
Cc:	pgsql-jdbc(at)postgresql(dot)org
Subject:	Re: A question about postgresql 8.1 and UTF strings
Date:	2006-06-18 08:25:53
Message-ID:	44950E11.90508@opencloud.com
Lists:	pgsql-jdbc

Yair Zas wrote:

> System.out.println(user.getBytes().length) - however, instead of seeing
> 8 bytes (2 bytes per each character, 4 characters ), i saw 4 bytes ....
> Can you please tell me what is it that I'm doing wrong?

getBytes() uses the JVM's default encoding to translate the String to
bytes. This is usually something like ISO-8859-1, which is a
one-byte-per-character encoding that can't represent Hebrew letters, so
each letter typically comes out as a single '?' byte.
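A minimal sketch of what is probably happening, assuming the JVM default
is ISO-8859-1. The class and constant names are made up for illustration.
It encodes with ISO-8859-1 explicitly, so the result doesn't depend on
the machine's actual default:

```java
import java.nio.charset.Charset;

public class DefaultEncodingDemo {
    // Four Hebrew letters (alef, bet, gimel, dalet) as Unicode escapes, so the
    // result does not depend on the encoding of this source file.
    static final String HEBREW = "\u05D0\u05D1\u05D2\u05D3";

    // Stands in for a JVM whose default encoding is ISO-8859-1.
    // getBytes(Charset) replaces every character the charset cannot
    // represent with the charset's replacement byte, which is '?' here.
    static byte[] encodeLatin1(String s) {
        return s.getBytes(Charset.forName("ISO-8859-1"));
    }

    public static void main(String[] args) {
        System.out.println("JVM default charset: " + Charset.defaultCharset());
        byte[] bytes = encodeLatin1(HEBREW);
        System.out.println(bytes.length);                                     // 4
        System.out.println(new String(bytes, Charset.forName("ISO-8859-1"))); // ????
    }
}
```

The 4-byte result is the symptom from the original question: the data
wasn't stored as one byte per letter; the letters were replaced.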

If you want a representation in a particular encoding (your description
implies you're expecting a 2-byte-per-character encoding), use the
getBytes() variant that takes an encoding name.
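For illustration, a sketch of encoding the same four letters with an
explicitly named encoding. The helper name encode() is hypothetical; it
looks the name up with Charset.forName, which throws an unchecked
exception for unknown names. Note that "2 bytes per character" is not
unique to UTF-16: Hebrew letters also take 2 bytes each in UTF-8, while
plain "UTF-16" adds a byte-order mark.

```java
import java.nio.charset.Charset;

public class ExplicitEncodingDemo {
    static final String HEBREW = "\u05D0\u05D1\u05D2\u05D3";

    // Encodes s with the named encoding instead of the JVM default.
    static byte[] encode(String s, String encodingName) {
        return s.getBytes(Charset.forName(encodingName));
    }

    public static void main(String[] args) {
        // 2 bytes per character, big-endian, no byte-order mark.
        System.out.println(encode(HEBREW, "UTF-16BE").length); // 8
        // "UTF-16" writes a 2-byte big-endian byte-order mark first.
        System.out.println(encode(HEBREW, "UTF-16").length);   // 10
        // Hebrew letters (U+05D0..U+05EA) need 2 bytes each in UTF-8 as well.
        System.out.println(encode(HEBREW, "UTF-8").length);    // 8

        // Decoding with the same encoding recovers the original string.
        byte[] utf8 = encode(HEBREW, "UTF-8");
        System.out.println(new String(utf8, Charset.forName("UTF-8")).equals(HEBREW)); // true
    }
}
```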

This is not specific to JDBC; it's standard Java. If you are working
with characters beyond 7-bit US-ASCII, I'd strongly recommend doing some
research into Java's internal string representation and how that is
transformed into bytes. The javadoc for Charset is one starting point:
http://java.sun.com/j2se/1.4.2/docs/api/java/nio/charset/Charset.html
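As a small example of what the Charset API offers, here is a sketch that
checks whether a string can be represented in a charset at all before
encoding it, using CharsetEncoder.canEncode(). The helper name
canRepresent() is hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class CharsetCheckDemo {
    static final String HEBREW = "\u05D0\u05D1\u05D2\u05D3";

    // True if every character of s can be encoded in the named charset,
    // i.e. encoding would not silently substitute replacement bytes.
    static boolean canRepresent(String s, String charsetName) {
        // A fresh encoder per call, since canEncode may change encoder state.
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        return encoder.canEncode(s);
    }

    public static void main(String[] args) {
        // A Java String is a sequence of UTF-16 code units; length() counts
        // those units, not bytes in any particular encoding.
        System.out.println(HEBREW.length());                    // 4
        System.out.println(canRepresent(HEBREW, "ISO-8859-1")); // false
        System.out.println(canRepresent(HEBREW, "UTF-8"));      // true
    }
}
```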

-O

In response to

A question about postgresql 8.1 and UTF strings at 2006-06-18 08:10:46 from Yair Zas
