
Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text

From:	Johann Zuschlag <zuschlag2(at)online(dot)de>
To:	Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>, Dave Page <dpage(at)vale-housing(dot)co(dot)uk>, pgsql-odbc(at)postgresql(dot)org
Subject:	Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text
Date:	2006-03-31 16:51:07
Message-ID:	442D5DFB.4080501@online.de
Lists:	pgsql-odbc

Hiroshi Inoue wrote:
>
> Unicode ODBC drivers handle UCS-2, not UTF-8, even in European
> environments. Unfortunately PostgreSQL doesn't handle UCS-2
> directly (because it could contain NULL bytes in the string), so the
> unicode driver sets the client_encoding to UTF-8 automatically and
> converts UCS-2 data to UTF-8 data, which the PostgreSQL backend
> can understand, when sending queries. So what you
> can see in the backend log is UTF-8. Then the backend converts the
> UTF-8 data to the server encoding. After all, the locale
> (especially LC_COLLATE) setting you need is the one which matches the
> backend encoding.
>
Hmm..., so Windows XP uses UCS-2 or, to be more correct (as Bart
mentioned), UTF-16 (which is nearly the same, except for the surrogates).
That is converted to UTF-8 by the driver, sent to the backend, and then
converted to the server encoding and stored. I've read about the problems
with the NULL bytes on Unix machines.
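
To illustrate the NULL-byte and surrogate points, here is a minimal Python sketch. It uses Python's built-in codecs as a stand-in for what the driver does; the helper names utf16le and has_nul are made up for this illustration, and little-endian byte order is assumed.

```python
def utf16le(text: str) -> bytes:
    """Encode text as UTF-16 little-endian, without a byte-order mark."""
    return text.encode("utf-16-le")


def has_nul(data: bytes) -> bool:
    """Return True if the byte string contains a zero byte."""
    return b"\x00" in data


a_utf16 = utf16le("A")
a_utf8 = "A".encode("utf-8")

# U+1D11E (musical G clef) lies outside the BMP, so UTF-16 needs a
# surrogate pair for it; plain UCS-2 cannot represent it at all.
clef_utf16 = utf16le("\U0001D11E")

print(a_utf16.hex(), has_nul(a_utf16))   # 4100 True
print(a_utf8.hex(), has_nul(a_utf8))     # 41 False
print(len(clef_utf16) // 2)              # 2 (two 16-bit code units)
```

The zero byte in the UTF-16 form of 'A' is what would terminate a C string early, while the UTF-8 form of the same character has none.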

Let's take two examples:
1.
backend-1 = ISO8859-1
backend-2 = UTF-8

'A' = U+0041 (does Windows use big-endian?)

Win UCS-2: 0x0041 (one 16-bit code unit)
ODBC UTF-8: 0x41
backend-1 stores = 0x41
backend-2 stores = 0x41

2.
'Ä' = U+00C4 (German A-Umlaut)

Win UCS-2: 0x00C4 (one 16-bit code unit)
ODBC UTF-8: 0xC3 0x84
backend-1 stores = 0xC4
backend-2 stores = 0xC3 0x84

Did I get that right? So I have to be really careful when testing.
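
The two examples can be checked with a short Python sketch. It is only an approximation of the real pipeline: Python's codecs stand in for the driver's and the backend's conversions, the function name trace is hypothetical, and the client side is assumed to hold UTF-16 in little-endian byte order (as Windows does on x86).

```python
def trace(ch: str) -> dict:
    """Follow one character from client memory to the two backends."""
    # Client side: 16-bit code units, little-endian (assumption).
    client_utf16 = ch.encode("utf-16-le")
    # Driver step: convert the UTF-16 data to UTF-8 for the wire.
    wire_utf8 = client_utf16.decode("utf-16-le").encode("utf-8")
    # backend-1 (ISO8859-1): converts the UTF-8 data to its own encoding.
    backend_1 = wire_utf8.decode("utf-8").encode("iso-8859-1")
    # backend-2 (UTF-8): stores the bytes as received.
    backend_2 = wire_utf8
    return {
        "client_utf16": client_utf16.hex(),
        "wire_utf8": wire_utf8.hex(),
        "backend_1": backend_1.hex(),
        "backend_2": backend_2.hex(),
    }


for ch in ("A", "\u00c4"):
    print("U+%04X" % ord(ch), trace(ch))
```

For U+00C4 this gives c400 in client memory, c384 on the wire and in backend-2, and c4 in backend-1, so the byte values in the examples hold; only the byte order of the client-side representation differs from a big-endian reading.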

Regards,
Johann

In response to

Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text fields at 2006-03-30 21:35:12 from Hiroshi Inoue

Responses

Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text at 2006-03-31 16:58:57 from Johann Zuschlag
Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text at 2006-03-31 18:47:05 from Marc Herbert
Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text at 2006-03-31 19:02:38 from Marc Herbert
Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text at 2006-03-31 19:12:13 from Marc Herbert
