| From: | Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> |
|---|---|
| To: | harry(at)mantheakis(dot)freeserve(dot)co(dot)uk |
| Cc: | pgsql-general(at)postgresql(dot)org |
| Subject: | Re: Japanese words not distinguished |
| Date: | 2005-07-13 01:07:38 |
| Message-ID: | 20050713.100738.71084359.t-ishii@sra.co.jp |
| Lists: | pgsql-general |
> Hello
>
> I run PostgreSQL 7.4.6 on Linux with a JDBC client.
>
> I initialised my database cluster with the following initdb command:
>
> initdb --locale=en_GB.UTF-8 --encoding UNICODE
>
> I have now discovered that my database cannot distinguish Japanese names or
> words - it throws unique constraint errors on a composite primary key that
> includes a VARCHAR field which stores the names or words.
>
> My tests indicate that the database treats all Japanese names/words as
> equal.
This has been a well-known problem among Japanese PostgreSQL users ever
since locale support was introduced.
> Having searched the forum archives, it seems to me that I should have
> specified "--locale=C" as the locale setting when I initialised my database
> cluster.
>
> I am planning to re-initialise my database cluster using the following
> command:
>
> initdb --locale=C --encoding UNICODE
>
> Then, after defining the relevant groups and users, I intend to call
> pg_restore with reference to a "tar.gz" dump file of my data.
>
> I wonder if someone might be kind enough to confirm that this is the right
> approach to solving the problem.
Correct. The lesson is: never use locale support with Asian languages
and multibyte encodings, including UTF-8.
--
Tatsuo Ishii
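
Editor's sketch (not part of the original message): the re-initialisation
plan discussed above might look roughly like the following. The data
directory, database name, and dump file name are hypothetical placeholders,
and the dump is assumed to be a gzip-compressed tar-format archive, which
would need decompressing before pg_restore can read it. The script only
prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: every path and name below is a hypothetical placeholder.
PGDATA_NEW="${PGDATA_NEW:-/var/lib/pgsql/data_c}"  # new cluster directory (assumed)
DBNAME="${DBNAME:-mydb}"                           # database name (assumed)
DUMP="${DUMP:-mydb.tar.gz}"                        # gzip-compressed tar dump (assumed)
DRY_RUN="${DRY_RUN:-1}"                            # 1 = print only, 0 = execute

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

reinit_and_restore() {
    # 1. Create a fresh cluster with the C locale, keeping the UNICODE encoding.
    run initdb --locale=C --encoding UNICODE -D "$PGDATA_NEW"
    # 2. Start the server on the new cluster and wait until it is up.
    run pg_ctl -D "$PGDATA_NEW" -w start
    # 3. Recreate the database (groups and users would be defined around here).
    run createdb -E UNICODE "$DBNAME"
    # 4. Decompress the dump; this leaves the plain .tar archive behind.
    run gunzip "$DUMP"
    # 5. Restore the tar-format archive into the new database.
    run pg_restore -d "$DBNAME" "${DUMP%.gz}"
}

reinit_and_restore
```

Running it as-is prints the five commands for review; setting DRY_RUN=0
would execute them, so check the placeholders against the real
installation first.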