UTF-8, upper() and Chinese characters yielding blank result

From:	Scott Eade <seade(at)backstagetech(dot)com(dot)au>
To:	pgsql-general(at)postgresql(dot)org
Subject:	UTF-8, upper() and Chinese characters yielding blank result
Date:	2006-07-27 14:34:15
Message-ID:	44C8CEE7.1050102@backstagetech.com.au
Lists:	pgsql-general

While I could see various multibyte issues in the archives and in the
TODO list, I couldn't spot this exact issue:

I am working with a database that uses UNICODE encoding.

I have a varchar column (col_x) that includes a mix of Chinese and
regular ASCII characters.

On PostgreSQL 7.4.13 (on RHEL4) "select col_x, upper(col_x) from
my_table" performs the desired upper() conversion - i.e. the ASCII
characters are converted to upper case and the Chinese characters are
left as is.

The problem appears on PostgreSQL 8.0.7 (on WinXP), where the upper()
result is apparently blank (this is via pgAdmin III). Worse still, via
JDBC I am getting:
java.sql.SQLException: Invalid character data was found. This is
most likely caused by stored data containing characters that are invalid
for the character set the database was created in. The most common
example of this is storing 8bit data in a SQL_ASCII database.

Is this a bug or a change of behaviour between versions?

Is there some way I can get the 7.4.13 behaviour in 8.0.7?

TIA,

Scott

Responses

Re: UTF-8, upper() and Chinese characters yielding blank result at 2006-07-27 17:22:17 from Peter Eisentraut
