Re: Vague idea for allowing per-column locale

From:	Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To:	tim(at)proximity(dot)com(dot)au
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Vague idea for allowing per-column locale
Date:	2001-08-14 05:01:30
Message-ID:	20010814140130S.t-ishii@sra.co.jp
Lists:	pgsql-hackers

> > Storing everything as Unicode is not a good idea, actually. First,
> > Unicode tends to consume more storage space than other character
> > sets. For example, UTF-8, one of the most commonly used encodings for
> > Unicode, consumes 3 bytes for Japanese characters, while SJIS only
> > consumes 2 bytes. Second, a round trip conversion between Unicode and
> > other character sets is not always possible. Third, sorting
> > issue. There is no convenient way to sort Unicode correctly.
>
> UTF-16 can handle most Japanese characters in two bytes, afaict. Generally
> it seems that utf8 encodes European text more efficiently on average,
> whereas utf16 is better for most Asian languages.

The same can be said of UCS-2: most multibyte characters fit in two
bytes within UCS-2. The problem with both UTF-16 and UCS-4 is that the
data may contain NULL (zero) bytes, which C-style string handling
treats as the end of the string.
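
To make the size and NULL-byte points concrete, here is a minimal sketch in Python. The sample string and variable names are my own choices for illustration, and Python's "utf-32-be" codec stands in for UCS-4. It only measures encoded lengths and checks for zero bytes; it says nothing about how a backend would actually store the data.

```python
# Encoded size of a short Japanese string under several encodings.
text = "日本語"  # three kanji, chosen only as sample data

codecs_to_check = ("shift_jis", "utf-8", "utf-16-be", "utf-32-be")
sizes = {codec: len(text.encode(codec)) for codec in codecs_to_check}
for codec, size in sizes.items():
    print(codec, size, "bytes")

# A plain ASCII letter encoded as UTF-16 or UTF-32 contains zero bytes,
# which NUL-terminated C string functions would read as end of string.
ascii_utf8 = "A".encode("utf-8")
ascii_utf16 = "A".encode("utf-16-be")
ascii_utf32 = "A".encode("utf-32-be")
has_zero = {
    "utf-8": b"\x00" in ascii_utf8,
    "utf-16-be": b"\x00" in ascii_utf16,
    "utf-32-be": b"\x00" in ascii_utf32,
}
print(has_zero)
```

For this sample, SJIS and UTF-16 both need 6 bytes and UTF-8 needs 9, while only the UTF-8 form of "A" is free of zero bytes.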

> I may be mistaken, but I
> was under the impression that sorting of unicode characters was a solved
> problem. The IBM ICU class library (which does have a C interface), for
> example, claims to provide everything you need to sort unicode text in
> various locales, and uses utf16 internally:

Interesting. Thanks for the info. I will look into this.

BTW, the "round trip conversion problem" still needs to be addressed.
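
One way to see the problem, sketched in Python under the assumption that its "cp932" codec follows Microsoft's mapping table: some kanji have two byte codes in that character set (an NEC-selected code and an IBM extension code), so both decode to the same Unicode character and at most one of them can come back unchanged. The helper name round_trips and the two sample codes are mine, for illustration only.

```python
def round_trips(data: bytes, codec: str) -> bool:
    """Return True if bytes -> Unicode -> bytes reproduces the input exactly."""
    return data.decode(codec).encode(codec) == data


# Two CP932 byte sequences assumed to name the same kanji.
nec_code = b"\xed\x40"  # NEC-selected IBM extension area
ibm_code = b"\xfa\x5c"  # IBM extension area

same_character = nec_code.decode("cp932") == ibm_code.decode("cp932")
print("decode to the same character:", same_character)
print("NEC code round-trips:", round_trips(nec_code, "cp932"))
print("IBM code round-trips:", round_trips(ibm_code, "cp932"))
```

Whichever code the encoder prefers, data stored with the other code is silently changed by a pass through Unicode.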
--
Tatsuo Ishii

In response to

Re: Vague idea for allowing per-column locale at 2001-08-14 02:36:19 from Tim Allen
