ICU_LOCALE set database default icu collation but not working as intended.

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: ICU_LOCALE set database default icu collation but not working as intended.
Date: 2022-05-26 15:24:05
Message-ID: CACJufxHH5HQbnLHcQzNwGZYyKT9Ld1CWf3C6kWEAEuBsY8wq1A@mail.gmail.com
Lists: pgsql-hackers

Based on this thread:
https://www.postgresql.org/message-id/20220305083830.lpz3k3yku5lmm5xs%40jrouhaud
Ordering reference:
https://unicode-org.github.io/cldr-staging/charts/latest/collation/en_US_POSIX.html

CREATE DATABASE dbicu1 LOCALE_PROVIDER icu LOCALE 'en_US.UTF-8'
    ICU_LOCALE 'en-u-kf-upper' TEMPLATE 'template0';
CREATE DATABASE dbicu2 LOCALE_PROVIDER icu LOCALE 'en_US.UTF-8'
    ICU_LOCALE 'en-u-kr-latn-digit' TEMPLATE 'template0';
-- Run the same script in both dbicu1 and dbicu2
BEGIN;
CREATE COLLATION upperfirst (
    provider = icu,
    locale = 'en-u-kf-upper'
);
CREATE TABLE icu (
    def text,
    en text COLLATE "en_US",
    upfirst text COLLATE upperfirst,
    test_kr text
);
INSERT INTO icu VALUES
    ('a', 'a', 'a', '1 a'),
    ('b', 'b', 'b', 'A 11'),
    ('A', 'A', 'A', 'A 19'),
    ('B', 'B', 'B', '8 p');
INSERT INTO icu VALUES ('a', 'a', 'a', 'a 7');
INSERT INTO icu VALUES ('a', 'a', 'a', 'Œ 1');
COMMIT;
-----------------------
--dbicu1
SELECT def FROM icu ORDER BY def;  -- single-character values: ordering is as expected
SELECT test_kr FROM icu ORDER BY def;
/*
test_kr
---------
A 19
1 a
a 7
Œ 1
8 p
A 11
*/

--dbicu2
SELECT def FROM icu ORDER BY def;  -- single-character values: ordering is as expected
SELECT test_kr FROM icu ORDER BY def;
/*
test_kr
---------
1 a
a 7
Œ 1
A 19
A 11
8 p
(6 rows)
*/
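For comparison, a minimal Python sketch (a simplified model, not ICU itself) that mimics SELECT test_kr FROM icu ORDER BY def on the six rows inserted above. It assumes two things: each single-letter def value is compared by letter first, ignoring case, with case only breaking ties (upper first for en-u-kf-upper in dbicu1, lower first for the ICU default case order in dbicu2); and rows with equal def keep their insertion order, which Python's stable sort guarantees but PostgreSQL does not. The helper names are hypothetical.

```python
# Rows in insertion order as (def, test_kr), copied from the INSERT statements.
ROWS = [
    ("a", "1 a"),
    ("b", "A 11"),
    ("A", "A 19"),
    ("B", "8 p"),
    ("a", "a 7"),
    ("a", "Œ 1"),
]


def upper_first_key(ch: str) -> tuple[str, int]:
    """Simplified model of en-u-kf-upper for one letter: letter, then case (upper first)."""
    return (ch.lower(), 0 if ch.isupper() else 1)


def lower_first_key(ch: str) -> tuple[str, int]:
    """Simplified model of the ICU default case order: letter, then case (lower first)."""
    return (ch.lower(), 0 if ch.islower() else 1)


def select_test_kr_order_by_def(key) -> list[str]:
    """Mimic SELECT test_kr FROM icu ORDER BY def; sorted() is stable for ties."""
    return [test_kr for _def, test_kr in sorted(ROWS, key=lambda row: key(row[0]))]


if __name__ == "__main__":
    print(select_test_kr_order_by_def(upper_first_key))  # dbicu1 model
    print(select_test_kr_order_by_def(lower_first_key))  # dbicu2 model
```

Under these assumptions the first call returns ['A 19', '1 a', 'a 7', 'Œ 1', '8 p', 'A 11'] and the second ['1 a', 'a 7', 'Œ 1', 'A 19', 'A 11', '8 p'], the same orders as the dbicu1 and dbicu2 outputs above.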

Since dbicu1 and dbicu2 each set an ICU default collation, in dbicu1 I
would expect the ordering (where >> means "sorts before"):
number >> uppercase letter >> lowercase letter >> character Œ (U+0152)

In dbicu2, I would expect the ordering:
lowercase letter >> uppercase letter >> number >> character Œ (U+0152)
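The expectations above can be written down as a hypothetical per-character ranking applied character by character. In this minimal Python sketch the class names, the rank tables, and the choice to sort the space separator before everything else are assumptions made for illustration; it models the expectation as stated, not how ICU actually compares strings.

```python
# Hypothetical model of the stated expectations: each character is mapped to a
# class rank, and strings are compared character by character.
VALUES = ["1 a", "A 11", "A 19", "8 p", "a 7", "Œ 1"]

# Lower rank sorts first. Ranking the space first is an assumption.
EXPECTED_DBICU1 = {"space": 0, "digit": 1, "upper": 2, "lower": 3, "other": 4}
EXPECTED_DBICU2 = {"space": 0, "lower": 1, "upper": 2, "digit": 3, "other": 4}


def char_class(ch: str) -> str:
    """Classify one character; only ASCII letters count as upper/lower here."""
    if ch == " ":
        return "space"
    if ch.isdigit():
        return "digit"
    if "A" <= ch <= "Z":
        return "upper"
    if "a" <= ch <= "z":
        return "lower"
    return "other"  # e.g. Œ


def expected_order(values, ranks):
    """Sort by (class rank, character) at each position, left to right."""
    return sorted(values, key=lambda s: [(ranks[char_class(c)], c) for c in s])


if __name__ == "__main__":
    print(expected_order(VALUES, EXPECTED_DBICU1))
    print(expected_order(VALUES, EXPECTED_DBICU2))
```

Under this model the expected test_kr orders are ['1 a', '8 p', 'A 11', 'A 19', 'a 7', 'Œ 1'] for dbicu1 and ['a 7', 'A 11', 'A 19', '1 a', '8 p', 'Œ 1'] for dbicu2.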

As you can see, single-character values sort as expected in both dbicu1
and dbicu2. However, the ordering does not match my expectation for
values with more than one character.
Or is the result correct, and I have misunderstood something?

I am not sure whether this is just my own misunderstanding.
In the examples above, the first characters of the *test_kr* values
differ enough that the comparison should be decided by the first
character; only when the first characters are equal should the second
character be compared.
So, whatever the collation, shouldn't I expect 'A 19' to be adjacent to
'A 11'?
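The prefix argument can be checked with a small Python sketch, assuming a purely character-by-character (lexicographic) comparison as described above; ICU's multi-level comparison is not modeled. Whatever per-character key is used (the three keys below are arbitrary, chosen for illustration), 'A 11' and 'A 19' share the prefix 'A 1', so no other value can sort between them. Note that this sorts the test_kr values themselves, whereas the two queries above sort on def.

```python
VALUES = ["1 a", "A 11", "A 19", "8 p", "a 7", "Œ 1"]

# Illustrative per-character keys; each one yields a lexicographic string order.
CHAR_KEYS = {
    "code point": lambda c: c,
    "case-folded": lambda c: (c.casefold(), c),
    "lower before upper": lambda c: (c.casefold(), c.isupper()),
}


def sort_lexicographically(values, char_key):
    """Compare strings character by character using char_key."""
    return sorted(values, key=lambda s: [char_key(c) for c in s])


def are_adjacent(ordered, x, y) -> bool:
    """True if x and y sit next to each other in the ordered list."""
    return abs(ordered.index(x) - ordered.index(y)) == 1


if __name__ == "__main__":
    for name, char_key in CHAR_KEYS.items():
        ordered = sort_lexicographically(VALUES, char_key)
        print(f"{name:>20}: {ordered} adjacent={are_adjacent(ordered, 'A 11', 'A 19')}")
```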

--
I recommend David Deutsch's <<The Beginning of Infinity>>

Jian
