Re: pgsql: Allow tailoring of ICU locales with custom rules

From: Peter Eisentraut <peter(at)eisentraut(dot)org>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: pgsql: Allow tailoring of ICU locales with custom rules
Date: 2023-08-14 08:34:42
Message-ID: afdffbd9-98ee-ea03-9640-301463317bc1@eisentraut.org
Lists: pgsql-committers pgsql-hackers

On 24.07.23 04:46, Amit Kapila wrote:
> On Fri, Mar 10, 2023 at 3:24 PM Peter Eisentraut
> <peter(dot)eisentraut(at)enterprisedb(dot)com> wrote:
>>
>> On 08.03.23 21:57, Jeff Davis wrote:
>>
>>> * It appears rules IS NULL behaves differently from rules=''. Is that
>>> desired? For instance:
>>> create collation c1(provider=icu,
>>> locale='und-u-ka-shifted-ks-level1',
>>> deterministic=false);
>>> create collation c2(provider=icu,
>>> locale='und-u-ka-shifted-ks-level1',
>>> rules='',
>>> deterministic=false);
>>> select 'a b' collate c1 = 'ab' collate c1; -- true
>>> select 'a b' collate c2 = 'ab' collate c2; -- false
>>
>> I'm puzzled by this. The general behavior is, extract the rules of the
>> original locale, append the custom rules, use that. If the custom rules
>> are the empty string, that should match using the original rules
>> untouched. Needs further investigation.
>>
>>> * Can you document the interaction between locale keywords
>>> ("@colStrength=primary") and a rule like '[strength 2]'?
>>
>> I'll look into that.
>
> This thread is listed on PostgreSQL 16 Open Items list. This is a
> gentle reminder to see if there is a plan to move forward with respect
> to open points.

I have investigated this. My assessment is that the way PostgreSQL
interfaces with ICU is correct. Whether what ICU does is correct might
be debatable. I have filed a bug with ICU about this
(https://unicode-org.atlassian.net/browse/ICU-22456), but there has been
no response yet.

You can work around this by including the desired attributes in the
rules string, for example:

create collation c3 (provider=icu,
    locale='und-u-ka-shifted-ks-level1',
    rules='[alternate shifted][strength 1]',
    deterministic=false);
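
As a quick check of the workaround, one could rerun the comparison from
the quoted example against c3. This is only a sketch and I am not stating
the result here: if the attributes given in the rules string take effect,
c3 should behave like c1 (where the comparison was true) rather than like
c2 (where it was false).

```sql
-- Same comparison as in the quoted example, now using the c3 collation
-- defined above. Compare the result with c1 and c2.
select 'a b' collate c3 = 'ab' collate c3;
```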

So I don't think there is anything we need to do here for PostgreSQL 16.
