From: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
---|---|
To: | johnsw(at)wardbrook(dot)com |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Unicode + LC_COLLATE |
Date: | 2004-04-22 13:39:05 |
Message-ID: | 200404221539.05444.peter_e@gmx.net |
Lists: | pgsql-general |
On Thursday, 22 April 2004 13:17, John Sidney-Woollett wrote:
> Does anyone know what the effect of --lc-collate=C --encoding=UNICODE will
> be for sorts (and indexes?) when a multibyte unicode character is
> encountered?
You get your strings sorted in binary order of the UTF-8 encoding, which is
probably not very interesting, but it's possible.
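To illustrate what "binary order of the UTF-8 encoding" means, here is a minimal Python sketch. It only mimics the ordering outside the database; the helper name `utf8_binary_sort` and the sample words are made up for illustration.

```python
def utf8_binary_sort(strings):
    """Sort strings by the raw bytes of their UTF-8 encoding."""
    return sorted(strings, key=lambda s: s.encode("utf-8"))


# Hypothetical sample data: uppercase ASCII sorts before lowercase ASCII,
# and every multibyte character sorts after all ASCII characters.
words = ["zebra", "éclair", "apple", "Apple"]
print(utf8_binary_sort(words))
# ['Apple', 'apple', 'zebra', 'éclair']
```

The order is deterministic, but it is not what a reader of any particular language would expect: "éclair" lands after "zebra" because its first byte is 0xC3.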
> Is it also true that if LC_COLLATE != 'C' that indexes cannot be used for
> LIKE comparisons (and is this also true for en_US.iso885915)?
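No, see <http://www.postgresql.org/docs/7.4/static/indexes-opclass.html>. In
a non-C locale, an index created with one of the pattern operator classes
described there (such as text_pattern_ops) can be used for LIKE comparisons.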
> Our database is UNICODE with LC_COLLATE=en_US.iso885915. Does anyone know
> what the effect of someone storing a cyrillic/chinese or korean character
> is?
With this setup, the system sorts the UTF-8 data as if each byte were an
ISO-8859-15 character. So the resulting order will be random at best.
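A small Python sketch of the mismatch described above: the bytes of a UTF-8 string are reinterpreted as ISO-8859-15, which is roughly how the collation code would see them. The helper name `misread_as_latin9` is hypothetical, and this only shows the misinterpretation, not the locale's actual sort rules.

```python
def misread_as_latin9(text):
    """Encode text as UTF-8, then decode those bytes as ISO-8859-15."""
    return text.encode("utf-8").decode("iso-8859-15")


# One character becomes two unrelated single-byte "characters".
print(misread_as_latin9("é"))        # Ã©
print(len(misread_as_latin9("Ж")))   # 2
```

Each multibyte character turns into a sequence of unrelated single-byte characters, so any collation applied to them has no connection to the original text.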
> (We are using JDBC with a webapp so all the unicode concerns are
> handled transparently, apparently). When the data is extracted from the DB
> will it render correctly in the browser provided we send all responses
> encoded in UTF-8?
If your database is in UNICODE and you're using JDBC then you should be all
set as far as PostgreSQL is concerned. Of course, your HTML pages need to
declare the encoding correctly as well.
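As a rough sketch of declaring the encoding consistently, the following Python builds a response whose HTTP header, HTML meta tag, and actual bytes all agree on UTF-8. The function name `build_html_response` and the sample text are assumptions for illustration; a real webapp would do the equivalent through its own framework.

```python
def build_html_response(body_text):
    """Return (headers, payload) for an HTML page declared and encoded as UTF-8."""
    html = (
        "<html><head>"
        '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'
        "</head><body>" + body_text + "</body></html>"
    )
    # Encode with the same charset that the header and the meta tag declare.
    payload = html.encode("utf-8")
    headers = {
        "Content-Type": "text/html; charset=UTF-8",
        "Content-Length": str(len(payload)),  # length in bytes, not characters
    }
    return headers, payload


headers, payload = build_html_response("Привет")
print(headers["Content-Type"])
```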