From: | Oliver Jowett <oliver(at)opencloud(dot)com> |
---|---|
To: | Ronald Vyhmeister <rvyhmeister(at)aiias(dot)edu> |
Cc: | 'Ronald Vyhmeister' <rvyhmeister(at)gmail(dot)com>, pgsql-jdbc(at)postgresql(dot)org |
Subject: | Re: Problem with accessing Russian UTF database |
Date: | 2008-11-26 00:21:58 |
Message-ID: | 492C96A6.7020800@opencloud.com |
Lists: | pgsql-jdbc |
Ronald Vyhmeister wrote:
> As for the Unicode escapes, how do I determine them?
A Unicode escape in Java is written \uNNNN, where NNNN is the four-digit
hex value of the Unicode code point you want to use.
See http://unicode.org/charts/ to find the particular ones you need
(e.g. Cyrillic is in http://unicode.org/charts/PDF/U0400.pdf)
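As a rough illustration of the escape syntax, here is a minimal sketch that spells the Russian word "Привет" using escapes looked up in the Cyrillic chart. The class name UnicodeEscapeDemo and the constant PRIVET are made up for this example; substitute the code points for whatever text you actually need.

```java
public class UnicodeEscapeDemo {
    // "Privet" (hello) in Cyrillic, written only with Unicode escapes.
    // Code points taken from the Cyrillic block (U+0400 to U+04FF):
    // U+041F, U+0440, U+0438, U+0432, U+0435, U+0442.
    static final String PRIVET = "\u041f\u0440\u0438\u0432\u0435\u0442";

    public static void main(String[] args) {
        // Print hex values instead of the text itself, so the result
        // does not depend on the console or terminal encoding.
        for (int i = 0; i < PRIVET.length(); i++) {
            System.out.println("#" + i + " = " + Integer.toHexString(PRIVET.charAt(i)));
        }
    }
}
```

Because the source file then contains only ASCII, the string should come out the same regardless of the encoding the compiler assumes for the file.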
>> Also, as I suggested earlier, try examining your strings
>> character-by-character to check that they really contain the codepoints
>> you think they contain.
>
> Right now, the string I'm entering was from the keyboard, set to Russian mode (and yes, I've tried it from Linux and Windows, and the results are the same).
What I mean is to do something like this:
    String someString = /* whatever you want to inspect */;
    char[] rawCharacters = someString.toCharArray();
    for (int i = 0; i < rawCharacters.length; ++i) {
        // Print the index and the hex value of each UTF-16 code unit.
        System.out.println("#" + i + " = " + Integer.toHexString(rawCharacters[i]));
    }
so that you can see exactly what the String really contains, not
whatever the combination of your output encoding and your terminal
encoding thinks it should look like. (Java strings are UTF-16
internally. For characters in the Basic Multilingual Plane, which
includes Cyrillic, each char is exactly one Unicode code point, so the
loop above prints code point values in hex; characters outside that
range show up as two surrogate values.)
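For the cases where one char is not one code point, a variant that walks the string by code point may be useful. This is only a sketch: the class CodePointDump and its dump helper are hypothetical names, and the sample string is made up for illustration. It relies on String.codePointAt and Character.charCount, which are available from Java 5 onward.

```java
public class CodePointDump {
    // Hypothetical helper: lists every code point of s as U+XXXX,
    // treating a surrogate pair as a single code point.
    static String dump(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
            // Advance by 1 for BMP characters, by 2 for surrogate pairs.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Two Cyrillic letters followed by U+1D11E, which is stored
        // as the surrogate pair D834 DD1E.
        String sample = "\u0414\u0430\ud834\udd1e";
        System.out.println(dump(sample)); // prints "U+0414 U+0430 U+1D11E"
    }
}
```

For plain Cyrillic text the output matches the char-by-char loop; the difference only shows up for characters outside the Basic Multilingual Plane.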
-O