From: | jian he <jian(dot)universality(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | ICU_LOCALE set database default icu collation but not working as intended. |
Date: | 2022-05-26 15:24:05 |
Message-ID: | CACJufxHH5HQbnLHcQzNwGZYyKT9Ld1CWf3C6kWEAEuBsY8wq1A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Base on this thread:
https://www.postgresql.org/message-id/20220305083830.lpz3k3yku5lmm5xs%40jrouhaud
ordering reference:
https://unicode-org.github.io/cldr-staging/charts/latest/collation/en_US_POSIX.html
<https://www.postgresql.org/message-id/20220305083830.lpz3k3yku5lmm5xs%40jrouhaud>CREATE
DATABASE dbicu1 LOCALE_PROVIDER icu LOCALE 'en_US.UTF-8' ICU_LOCALE
'en-u-kf-upper' TEMPLATE 'template0';
CREATE DATABASE dbicu2 LOCALE_PROVIDER icu LOCALE 'en_US.UTF-8' ICU_LOCALE
'en-u-kr-latn-digit' TEMPLATE 'template0';
--same script apply to dbicu1 dbicu2
BEGIN;
CREATE COLLATION upperfirst (
provider = icu,
locale = 'en-u-kf-upper'
);
CREATE TABLE icu (
def text,
en text COLLATE "en_US",
upfirst text COLLATE upperfirst,
test_kr text
);
INSERT INTO icu
VALUES ('a', 'a', 'a', '1 a'), ('b', 'b', 'b', 'A 11'), ('A', 'A', 'A',
'A 19'), ('B', 'B', 'B', '8 p');
INSERT INTO icu
VALUES ('a', 'a', 'a', 'a 7');
INSERT INTO icu
VALUES ('a', 'a', 'a', 'Œ 1');
COMMIT;
-----------------------
--dbicu1
SELECT def AS def FROM icu ORDER BY def; --since only character. all works
fine.
SELECT test_kr FROM icu ORDER BY def;
/*
test_kr
---------
A 19
1 a
a 7
Œ 1
8 p
A 11
*/
--dbicu2
SELECT def AS def FROM icu ORDER BY def; --since only character. all works
fine.
SELECT test_kr FROM icu ORDER BY def;
/*
test_kr
---------
1 a
a 7
Œ 1
A 19
A 11
8 p
(6 rows)
*/
Since dbicu1 and dbicu2 set the default collation then
In dbicu1, I should expect the ordering:
number >> Upper case alphabet letter >> lower case alphabet letter>>
character Œ (U+0153)
In dbicu2, I should expect the ordering:
lower case alphabet letter >> Upper case alphabet letter >> number >>
character Œ (U+0153)
As you can see one letter works fine for dbicu1, dbicu2. However, it does
not work on more characters.
Or The result is correct, but something I misunderstood?
I am not sure this is my personal misunderstanding.
In the above examples, the first character of column *test_kr*
is so different that the comparison is based on the first letter.
If the first letter is the same then compute the second letter..
So for whatever collation, I should expect 'A 19' to be adjacent with 'A
11'?
--
I recommend David Deutsch's <<The Beginning of Infinity>>
Jian
From | Date | Subject | |
---|---|---|---|
Next Message | Justin Pryzby | 2022-05-26 15:43:11 | Re: Remove support for Visual Studio 2013 |
Previous Message | Peter Eisentraut | 2022-05-26 14:36:47 | Re: pg_upgrade test writes to source directory |