Re: Patch for collation using ICU

From: Palle Girgensohn <girgen(at)pingpong(dot)net>
To: John Hansen <john(at)geeknet(dot)com(dot)au>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Patch for collation using ICU
Date: 2005-05-07 13:03:22
Message-ID: B621E7545547424AE78C05E4@palle.girgensohn.se
Lists: pgsql-hackers


--On Saturday, May 07, 2005 22:53:46 +1000 John Hansen <john(at)geeknet(dot)com(dot)au>
wrote:

> Errm,... initdb --encoding UNICODE --locale C

You mean that ICU *should* be used even for the C locale, rather than as Bruce
suggested here:

>> I do have a few questions:
>>
>> Why don't you use the lc_ctype_is_c() part of this test?
>>
>> if (pg_database_encoding_max_length() > 1 && !lc_ctype_is_c())
>
> Um, well, I didn't think about that. :) What would be the locale in this
> case? c_C.UTF-8? ;) Hmm, it is possible to have CTYPE=C and use a wide
> encoding, indeed. Then the strings will be handled like byte-wide chars.
> Yeah, it's a bug. I'll fix it! Thanks.

John disagrees here, and I have to agree. With the C locale one expects C
(plain byte-order) collation, but upper/lower are still better off using ICU.
Hence, the above is *not* a bug. Do we agree?
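A minimal sketch of the policy argued for above, in C: case mapping goes to ICU for any multibyte encoding, while a C LC_COLLATE keeps plain byte-order comparison. Bruce's variant with the extra !lc_ctype_is_c() test is included for comparison. The DbLocaleInfo struct, the Impl enum and all three functions are hypothetical; the flags stand in for what pg_database_encoding_max_length(), lc_collate_is_c() and lc_ctype_is_c() would report, and keeping single-byte encodings on the libc path is an assumption, not something taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the settings the backend would get from
 * pg_database_encoding_max_length(), lc_collate_is_c() and
 * lc_ctype_is_c(). */
typedef struct
{
    int  encoding_max_length;   /* 1 for single-byte encodings, >1 for UTF-8 */
    bool collate_is_c;          /* LC_COLLATE is "C" */
    bool ctype_is_c;            /* LC_CTYPE is "C" */
} DbLocaleInfo;

typedef enum
{
    IMPL_LIBC,                  /* existing byte/libc-based code path */
    IMPL_ICU                    /* ICU code path (assumed) */
} Impl;

/* Bruce's suggestion: skip ICU case mapping when LC_CTYPE is C. */
static Impl case_impl_with_ctype_test(const DbLocaleInfo *db)
{
    assert(db != NULL);
    return (db->encoding_max_length > 1 && !db->ctype_is_c) ? IMPL_ICU : IMPL_LIBC;
}

/* Position taken in this message: upper/lower/initcap use ICU for any
 * multibyte encoding, even when LC_CTYPE is C. */
static Impl case_impl(const DbLocaleInfo *db)
{
    assert(db != NULL);
    return db->encoding_max_length > 1 ? IMPL_ICU : IMPL_LIBC;
}

/* Collation: the C locale means plain byte-order comparison,
 * so ICU is bypassed there. */
static Impl collate_impl(const DbLocaleInfo *db)
{
    assert(db != NULL);
    if (db->collate_is_c)
        return IMPL_LIBC;
    return db->encoding_max_length > 1 ? IMPL_ICU : IMPL_LIBC;
}
```

For a UTF-8 database created with --locale C, case_impl() picks ICU while collate_impl() stays on the byte-order path; case_impl_with_ctype_test() would instead fall back to byte-wise case mapping.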

/Palle

>
>> -----Original Message-----
>> From: pgsql-hackers-owner(at)postgresql(dot)org
>> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of John Hansen
>> Sent: Saturday, May 07, 2005 10:23 PM
>> To: Palle Girgensohn; Bruce Momjian
>> Cc: pgsql-hackers(at)postgresql(dot)org
>> Subject: Re: [HACKERS] Patch for collation using ICU
>>
>> >
>> > I use this patch in production on one FreeBSD 4.10 server at the
>> > moment. With the latest version, I've had no problems. Logging is
>> > switched on for now, and it shows no signs of ICU complaining. I'd
>> > like more reports on Linux, though.
>>
>> I currently use this on Gentoo with ICU 3.2 unmasked.
>>
>> Works a dream, even with locale C and a UNICODE database.
>>
>> Small test:
>>
>> createdb --encoding UNICODE --locale C test
>> psql test
>> set client_encoding=iso88591;
>> CREATE TABLE test (t text);
>> INSERT INTO test (t) VALUES ('æøå');
>> set client_encoding=unicode;
>> INSERT INTO test (t) SELECT upper(t) FROM test;
>> set client_encoding=iso88591;
>> SELECT * FROM test;
>> t
>> -----
>> æøå
>> ÆØÅ
>> (2 rows)
>>
>> Just as I'd expect, since upper/lower/initcap are locale
>> independent for these characters.
>>
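The byte-wise handling mentioned earlier in the thread, and the reason John's upper('æøå') result needs a code-point-aware routine, can be shown in a few lines of C. This is a toy illustration assuming a UTF-8 encoded string and the default "C" locale: upper_bytewise() applies toupper() to each byte, while upper_latin1_utf8() is a hypothetical, hand-written stand-in for a real ICU case-mapping call (for example u_strToUpper()) that covers only ASCII and the Latin-1 letters. It is not the patch's implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Byte-at-a-time upper-casing: a UTF-8 string treated as single-byte
 * characters. In the "C" locale toupper() only changes 'a'..'z', so
 * every byte of a multibyte sequence comes back unchanged. */
static void upper_bytewise(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++)
        *s = (char) toupper((unsigned char) *s);
}

/* Hypothetical code-point-aware stand-in for ICU case mapping. Handles
 * ASCII plus U+00E0..U+00FE except U+00F7 (division sign), whose simple
 * uppercase is the code point minus 0x20. In UTF-8 those are 0xC3
 * followed by 0xA0..0xBE, so only the second byte changes. Everything
 * else is copied through untouched. */
static void upper_latin1_utf8(char *s)
{
    assert(s != NULL);
    unsigned char *p = (unsigned char *) s;

    while (*p != '\0')
    {
        if (*p >= 'a' && *p <= 'z')
        {
            *p -= 0x20;             /* ASCII lower -> upper */
            p++;
        }
        else if (p[0] == 0xC3 && p[1] >= 0xA0 && p[1] <= 0xBE && p[1] != 0xB7)
        {
            p[1] -= 0x20;           /* e.g. U+00E6 -> U+00C6 */
            p += 2;
        }
        else
            p++;
    }
}
```

With "æøå" (bytes C3 A6 C3 B8 C3 A5) as input, upper_bytewise() leaves the bytes unchanged, while upper_latin1_utf8() yields "ÆØÅ" (C3 86 C3 98 C3 85), matching the second row of the query output.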
