From: | Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> |
---|---|
To: | alexs(at)advfn(dot)com |
Cc: | sknipe(at)tucows(dot)com, pgsql-general(at)postgresql(dot)org |
Subject: | Re: utf-8 and cultural sensitive sorting |
Date: | 2005-07-13 01:07:28 |
Message-ID: | 20050713.100728.41628839.t-ishii@sra.co.jp |
Lists: | pgsql-general |
> It depends what language you want to sort. Lots of languages do not
> have a sort alphabet. For example, Japanese. It can be quite
> difficult to sort unusual languages like this. I am not aware of any
> standard technique for sorting Japanese text other than keeping an
> arbitrarily sorted dictionary (courtesy of whatever the most popular
> Japanese dictionary at the time happens to be perhaps) and then doing
> hash lookups in it for indexing values. As you can imagine, this is
> not particularly fast. I have not actually tried this, but I expect
> PostgreSQL will simply sort in a fairly binary fashion. As in, it gets
> sorted according to the binary value of the characters, or the
> UTF-8 offsets, or something like that.
The above is almost correct, but sorting by JIS code order is usually
enough for most Japanese applications (I believe the same can be said
of Chinese). I do not recommend using the locale for sorting
Japanese. Quite frequently, the locale support for multibyte encodings
is totally broken. See the recent posting titled
"[GENERAL] Japanese words not distinguished" for more details.

If you have to live with a UTF-8 database, I recommend turning off
locale support and using CONVERT to sort Japanese. For example:

SELECT * FROM t1 ORDER BY CONVERT(col1 USING utf_8_to_euc_jp);
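
To see why converting before sorting changes the result, here is a small Python sketch. It is not PostgreSQL code: it only imitates a plain byte-wise comparison, which is roughly what a sort with locale support turned off does. The helper name and the two sample characters are my own assumptions for illustration.

```python
# Illustration only: imitates a byte-wise sort such as a database would do
# with locale support turned off. The helper name and sample data are
# hypothetical and do not come from the original thread.

def sort_by_encoding(values, encoding):
    """Sort strings by the byte sequence they have in the given encoding."""
    return sorted(values, key=lambda s: s.encode(encoding))

# Two kanji whose relative order differs between Unicode and JIS X 0208.
words = ["一", "亜"]

# Byte order of the UTF-8 form follows Unicode code point order.
utf8_order = sort_by_encoding(words, "utf-8")

# Byte order of the EUC-JP form follows JIS code order.
eucjp_order = sort_by_encoding(words, "euc_jp")

print(utf8_order)
print(eucjp_order)
```

Sorted as UTF-8, 一 (U+4E00) comes before 亜 (U+4E9C). Sorted as EUC-JP, 亜 (0xB0A1) comes before 一 (0xB0EC), because JIS X 0208 arranges its level 1 kanji roughly by reading rather than by radical and stroke count.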
> On 12 Jul 2005, at 15:48, <sknipe(at)tucows(dot)com> <sknipe(at)tucows(dot)com> wrote:
>
> > Our product will be storing its character data in utf-8 format
> > (unicode encoding).
> >
> > What is the best way to achieve culturally sensitive sorting using the
> > utf-8 data?
> >
> > Is it possible to have the locale apply to a connection?
> >
> > If so, is the cultural sorting support mature in PostgreSQL?
> >
> > What type of performance can be expected as compared with the
> > normal c locale sorting?
> >
> > Thanks very much,
> >
> > Steve.