Re: BUG #18771: ICU custom collations with rules ignore collator strength option.

From: Ruben Ruiz <ruben(dot)ruizcuadrado(at)gmail(dot)com>
To: Peter Eisentraut <peter(at)eisentraut(dot)org>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18771: ICU custom collations with rules ignore collator strength option.
Date: 2025-01-13 18:47:12
Message-ID: CABKKXvhxC0xUeL=2ETXh5yR2gUmBSSJTBmKtXWUMMC2tgOj0dw@mail.gmail.com
Lists: pgsql-bugs

I think in this case it's not really related, as I'm not trying to copy
options from the base locale.

It all seems to come from some missing information in the official ICU4C
docs. When describing the parameters of ucol_openRules(), they say:

"strength: The default collation strength; one of UCOL_PRIMARY,
UCOL_SECONDARY, UCOL_TERTIARY, UCOL_IDENTICAL,UCOL_DEFAULT_STRENGTH - can
be also set in the rules"

One could easily assume that, since it "can also be set in the rules", you
could pass UCOL_DEFAULT_STRENGTH and the rules would take precedence.
Nowhere does it mention that UCOL_DEFAULT is a valid value for that
parameter, although it is mentioned for normalizationMode. But if you
look at the icu4c sources
(https://github.com/unicode-org/icu/blob/f8aa68b0c1c9584633e7a61157185f1a2c275f58/icu4c/source/i18n/collationbuilder.cpp#L182)
you can find this:

    RuleBasedCollator::internalBuildTailoring(const UnicodeString &rules,
                                              int32_t strength,
                                              UColAttributeValue decompositionMode,
                                              UParseError *outParseError,
                                              UnicodeString *outReason,
                                              UErrorCode &errorCode) {

        ...
        // Set attributes after building the collator,
        // to keep the default settings consistent with the rule string.
        if(strength != UCOL_DEFAULT) {
            setAttribute(UCOL_STRENGTH,
                         static_cast<UColAttributeValue>(strength), errorCode);
        }
        ...
    }

This not only implies that UCOL_DEFAULT is a valid argument, but also that
if you pass anything other than UCOL_DEFAULT, any strength set in the rules
will be overridden. So it seems that the 'make_icu_collator' function in
PostgreSQL should pass UCOL_DEFAULT, to allow the rules to set the desired
strength level, instead of the UCOL_DEFAULT_STRENGTH it currently passes.
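
To make the precedence concrete, here is a small toy model in C of the
branch quoted above. It is only a sketch and does not call ICU: the enum
values mirror the ones in ICU's ucol.h (where UCOL_DEFAULT_STRENGTH is
just another name for UCOL_TERTIARY), but the MODEL_ names and the
effective_strength() helper are hypothetical and exist only for
illustration.

```c
/* Toy model, NOT the ICU API. The numeric values mirror ICU's ucol.h;
 * the names and the helper below are made up for this illustration. */
typedef enum {
    MODEL_UCOL_DEFAULT = -1,
    MODEL_UCOL_PRIMARY = 0,
    MODEL_UCOL_SECONDARY = 1,
    MODEL_UCOL_TERTIARY = 2,
    MODEL_UCOL_DEFAULT_STRENGTH = MODEL_UCOL_TERTIARY
} ModelStrength;

/*
 * rules_strength: the strength requested inside the rule string
 *                 (for example by a "[strength 1]" setting).
 * arg_strength:   the 'strength' argument given to ucol_openRules().
 *
 * Mirrors the quoted logic: the collator is built from the rules first,
 * then the argument is applied on top unless it is UCOL_DEFAULT.
 */
int effective_strength(int rules_strength, int arg_strength)
{
    int strength = rules_strength;

    if (arg_strength != MODEL_UCOL_DEFAULT)
        strength = arg_strength;    /* the argument overrides the rules */

    return strength;
}
```

Under this model, rules asking for primary strength end up tertiary when
the caller passes UCOL_DEFAULT_STRENGTH, and stay primary when the caller
passes UCOL_DEFAULT.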

On Mon, 13 Jan 2025 at 17:42, Peter Eisentraut <peter(at)eisentraut(dot)org> wrote:

> On 11.01.25 18:27, PG Bug reporting form wrote:
> > When using the 'rules' option of CREATE COLLATION to create a custom icu
> > collation it seems that, if you include inside the rules a change to the
> > comparison strength, it is ignored.
>
> I think this is the same as this ICU bug:
>
> https://unicode-org.atlassian.net/browse/ICU-22456
>
>
