Re: Patch for collation using ICU

From:	"John Hansen" <john(at)geeknet(dot)com(dot)au>
To:	"Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc:	"Palle Girgensohn" <girgen(at)pingpong(dot)net>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Patch for collation using ICU
Date:	2005-05-07 14:10:42
Message-ID:	5066E5A966339E42AA04BA10BA706AE50A9305@rodrick.geeknet.com.au
Lists:	pgsql-hackers

Bruce Momjian wrote:
>
> There are two reasons for that optimization --- first, some
> locale support is broken and Unicode encoding with a C locale
> crashes (not an issue for ICU), and second, it is an
> optimization for languages like Japanese that want to use
> unicode, but don't need a locale because upper/lower means
> nothing in those character sets.

No, upper/lower means nothing in those languages, so why would you need
to optimize upper/lower if they're not used??
And if they are, it's obviously because the text contains characters
from other languages (probably English) and as such they should behave
correctly.

Did I mention that for Japanese and the like, ICU would also offer
transliteration...

>
> So, the first issue doesn't apply for ICU, and the second
> might not depending on what characters you are using in the
> Unicode character set.
>
> I guess I am little confused how ICU can do upper() when the
> locale is C. What is it using to determine A is upper for a?
> Am I confused?

Simple: Unicode basically consists of a table of characters
(http://www.unicode.org/Public/UNIDATA/UnicodeData.txt)

Excerpt:

0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
...
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041

From this you can see that for 0041, which is capital letter A, there
is a mapping to its lowercase counterpart, 0061.
Likewise, there is a mapping for 0061 which says its uppercase
counterpart is 0041.
There is also SpecialCasing.txt, which covers those mappings that aren't
1-1, such as the German sharp s (ß), which uppercases to SS.
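
To make the table lookup concrete, here is a rough sketch in Python (purely
for illustration; this is not how ICU is implemented, and the names
parse_case_mappings, simple_upper and simple_lower are made up for this
example). It reads the two excerpt lines above and uses the simple
uppercase, lowercase and titlecase fields, which sit at positions 12, 13
and 14 of each semicolon-separated record:

```python
# Sketch: read the simple case mappings out of UnicodeData.txt-style records.
# Field positions in each semicolon-separated record:
#   0 = code point, 1 = name, 2 = general category,
#   12 = simple uppercase, 13 = simple lowercase, 14 = simple titlecase.

EXCERPT = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
"""


def _cp(field):
    """Convert a hex field to an int code point, or None if the field is empty."""
    return int(field, 16) if field else None


def parse_case_mappings(text):
    """Return {code_point: {"upper": cp, "lower": cp, "title": cp}} (cp may be None)."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(";")
        table[int(fields[0], 16)] = {
            "upper": _cp(fields[12]),
            "lower": _cp(fields[13]),
            "title": _cp(fields[14]),
        }
    return table


def simple_upper(ch, table):
    """Map one character to uppercase via the table; unmapped characters pass through."""
    entry = table.get(ord(ch))
    mapped = entry["upper"] if entry else None
    return chr(mapped) if mapped is not None else ch


def simple_lower(ch, table):
    """Map one character to lowercase via the table; unmapped characters pass through."""
    entry = table.get(ord(ch))
    mapped = entry["lower"] if entry else None
    return chr(mapped) if mapped is not None else ch


TABLE = parse_case_mappings(EXCERPT)
print(simple_upper("a", TABLE))  # A
print(simple_lower("A", TABLE))  # a
```

Note that nothing in the lookup consults a locale; the table alone says
that a maps to A and back.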

These mappings are fixed and independent of locale; only a few cases from
SpecialCasing.txt depend on locale/context.
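
As a hedged illustration of the same point, the sketch below uses Python's
built-in str.upper() as a stand-in (an assumption for demonstration only; it
is not ICU and not the patch under discussion). Python applies the Unicode
case mappings without locale tailoring, so the results are the same even
with the process locale forced to "C". The helper name codepoints is made
up for this example:

```python
import locale

# Force the plain "C" locale, the situation asked about above.
locale.setlocale(locale.LC_ALL, "C")


def codepoints(s):
    """Render a string as U+XXXX code points so the output is ASCII-only."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# Simple 1-1 mapping from UnicodeData.txt: e-acute -> E-acute.
e_acute_upper = "\u00e9".upper()

# 1-to-many mapping from SpecialCasing.txt: sharp s -> "SS".
sharp_s_upper = "\u00df".upper()

# A locale-dependent case that this stand-in does NOT handle: Turkish
# wants U+0130 (capital I with dot above) here, but without locale
# tailoring the result is plain "I".
i_upper = "i".upper()

print(codepoints(e_acute_upper))  # U+00C9
print(codepoints(sharp_s_upper))  # U+0053 U+0053
print(codepoints(i_upper))        # U+0049
```

The last case is the kind of exception meant above: getting the Turkish
result requires knowing the locale, while the first two do not.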

