Here is an updated patch.  It is rebased over the various recent changes 
in the locale APIs.  No other changes.
On 30.07.24 21:46, Peter Eisentraut wrote:
> On 27.07.24 00:32, Paul A Jungwirth wrote:
>> On Thu, Jun 27, 2024 at 11:31 PM Peter Eisentraut 
>> <peter(at)eisentraut(dot)org> wrote:
>>> Here is an updated patch for this.
>>
>> I took a look at this. I added some tests and found a few that give
>> the wrong result (I believe). The new tests are included in the
>> attached patch, along with the results I expect. Here are the
>> failures:
> 
> Thanks, these are great test cases.
> 
>>
>>   -- inner %% matches b then zero:
>>   SELECT U&'cb\0061\0308' LIKE U&'c%%\00E4' COLLATE ignore_accents;
>>    ?column?
>>   ----------
>> - t
>> + f
>>   (1 row)
>>
>>   -- trailing _ matches two codepoints that form one char:
>>   SELECT U&'cb\0061\0308' LIKE U&'cb_' COLLATE ignore_accents;
>>    ?column?
>>   ----------
>> - t
>> + f
>>   (1 row)
>>
>> -- leading % matches zero:
>>   SELECT U&'\0061\0308bc' LIKE U&'%\00E4bc' COLLATE ignore_accents;
>>    ?column?
>>   ----------
>> - t
>> + f
>>   (1 row)
>>
>>   -- leading % matches zero (with later %):
>>   SELECT U&'\0061\0308bc' LIKE U&'%\00E4%c' COLLATE ignore_accents;
>>    ?column?
>>   ----------
>> - t
>> + f
>>   (1 row)
>>
>> I think the 1st, 3rd, and 4th failures are all from % not backtracking
>> to match zero chars.
> 
> These are all because of this in like_match.c:
> 
>         * Otherwise, scan for a text position at which we can match the
>         * rest of the pattern.  The first remaining pattern char is known
>         * to be a regular or escaped literal character, so we can compare
>         * the first pattern byte to each text byte to avoid recursing
>         * more than we have to.  [...]
> 
> This shortcut doesn't work with nondeterministic collations, so we have 
> to recurse in any case here.  I have fixed that in the new patch; these 
> test cases work now.
> 
>> The 2nd failure I'm not sure about. Maybe my expectation is wrong, but
>> then why does the same test pass with __ leading not trailing? Surely
>> they should be consistent.
> 
> The question is why is
> 
> SELECT U&'cb\0061\0308' LIKE U&'cb_' COLLATE ignore_accents;  -- false
> 
> but
> 
> SELECT U&'\0061\0308bc' LIKE U&'_bc' COLLATE ignore_accents;  -- true
> 
> The second one matches because
> 
>      SELECT U&'\0308bc' = 'bc' COLLATE ignore_accents;
> 
> So the accent character will be ignored if it's adjacent to another 
> fixed substring in the pattern.