From: | Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> |
---|---|
To: | David Warnock <david(at)sundayta(dot)co(dot)uk> |
Cc: | t-ishii(at)sra(dot)co(dot)jp, "pgsql-interfaces(at)postgreSQL(dot)org" <pgsql-interfaces(at)postgreSQL(dot)org> |
Subject: | Re: [INTERFACES] JDBC and character sets |
Date: | 1999-06-22 14:13:46 |
Message-ID: | 199906221413.XAA11243@srapc451.sra.co.jp |
Lists: | pgsql-interfaces |
>I think I saw from the list of developers that you wrote a lot of the
>multi-byte code. Is that correct? If so my grateful thanks.
Yes, I'm responsible for the multi-byte code.
>Are there any limitations or gotchas about using unicode everywhere?
>
>Specifically
>
>1. Column length. Is this measured in unicode characters or do I need to
>increase the length of Varchars? ie is a varchar(10) certain to hold 10
>unicode characters?
When you define varchar(n), n is counted in bytes, not characters.
We assume Unicode is input in UTF-8 encoding. In UTF-8, 10 ASCII chars
take 10 bytes, so varchar(10) will hold 10 Unicode chars if they are
all ASCII. However, non-ASCII ISO 8859 chars such as accented Latin
letters take 2 bytes each, and KANJI take 3 bytes each. You can
use octet_length() to measure the size of a Unicode string in bytes.
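As a rough client-side illustration of the byte counts above, the following sketch measures how many bytes a string needs in UTF-8 and checks it against a byte limit. The helper names utf8_len and fits_in_bytes are hypothetical, and the check only assumes that the column limit is counted in bytes as described; it is not PostgreSQL's own code.

```python
def utf8_len(text: str) -> int:
    """Return the number of bytes the string occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_in_bytes(text: str, limit: int) -> bool:
    """Hypothetical helper: True if the UTF-8 form fits a byte-counted limit."""
    return utf8_len(text) <= limit


# Ten characters of each kind, written with escapes to avoid encoding surprises.
samples = [
    ("ASCII", "a" * 10),                # 1 byte per character
    ("accented Latin", "\u00e9" * 10),  # e with acute accent, 2 bytes per character
    ("kanji", "\u6f22" * 10),           # a kanji character, 3 bytes per character
]

for label, sample in samples:
    # Print character count, UTF-8 byte count, and whether it fits in 10 bytes.
    print(label, len(sample), utf8_len(sample), fits_in_bytes(sample, 10))
```

Under the byte-counting rule, only the ASCII sample fits a limit of 10; the other two would need a limit of 20 and 30 respectively.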
>2. Indexing. What sort order will I get from an index or an order by for
>unicode characters.
It will be sorted in Unicode code point order.
>Can this be customised?
Currently no.
>Generally I try to do any
>really important sorting in Java where I can use the correct sort order
>for the locale.
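To make code point ordering concrete, here is a small sketch of what such a sort does to a few sample words. The word list is made up for illustration. It also checks that comparing the UTF-8 bytes gives the same order, which is a property of the UTF-8 encoding itself rather than a statement about how the server is implemented.

```python
# Sample words: an uppercase ASCII initial, a lowercase ASCII initial,
# and a non-ASCII initial (A with diaeresis, U+00C4).
words = ["apple", "Zebra", "\u00c4pfel"]

# Python compares str values by code point, so sorted() shows code point order.
by_code_point = sorted(words)

# Byte-wise comparison of the UTF-8 encodings yields the same order.
by_utf8_bytes = sorted(words, key=lambda w: w.encode("utf-8"))

print(by_code_point)
print(by_code_point == by_utf8_bytes)
```

"Zebra" sorts first because "Z" (U+005A) precedes "a" (U+0061), and the word starting with U+00C4 sorts last, which is usually not what a locale-aware sort would produce.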
>3. Upper/lowercase. I have been using separate columns for uppercase
>versions of names etc again so that the case changes can be done by the
>client which will know the correct rules for the locale where the data
>is entered. What do upper/lower case functions in Postgresql do with
>unicode?
I think it is related to the locale. I'm not sure, but I've heard of a
Unicode locale. If it really exists, you could do:
configure --with-mb=UNICODE --with-locale
so that upper/lower works for Unicode.
>4. Are there any limitations on what I use to write triggers? Can all
>the different ways work reliably with unicode?
I'm not sure but it should work with triggers.
--
Tatsuo Ishii