Re: BUG #18362: unaccent rules and Old Greek text

From: Peter Eisentraut <peter(at)eisentraut(dot)org>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Cees van Zeeland <cees(dot)van(dot)zeeland(at)freedom(dot)nl>, Michael Paquier <michael(at)paquier(dot)xyz>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18362: unaccent rules and Old Greek text
Date: 2024-05-21 12:04:59
Message-ID: e38dd877-3e76-47bd-8fa5-f079637c5616@eisentraut.org
Lists: pgsql-bugs

On 18.05.24 11:36, Thomas Munro wrote:
>>> WARNING: duplicate source strings, first one will be used
>>>
>>> so it will need adjustments in how the rules are produced.
>>
>> OK. Does anyone want to look into that?
>
> I think the problem is that the new "simple redirection" rule from the
> Unicode database produces some values that are also present in
> Latin-ASCII.xml, and these are all tolerated as long as the "from" and
> "to" strings both match, because we uniquify them as pairs. But there
> is one pair where the "to" string is different, resulting in this
> clash:
>
> ℌ x
> ℌ H
>
> I think the first line might actually be a bug in CLDR data. I dunno,
> but this just doesn't look right:
>
> ℌ → x ; # 210C;BLACK-LETTER CAPITAL H (compat)
>
> And in the tests I now see that Michael had already figured that out!
> I've included a kludge to remove that. Someone should file a ticket with CLDR.

Done: https://unicode-org.atlassian.net/browse/CLDR-17656
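For anyone following along, the clash described above can be reproduced with a small sketch. This is not the actual rule-generation script; the function name `find_conflicts` and the sample rule list are made up for illustration. It assumes rules are plain (from, to) string pairs, uniquifies them as pairs, and then reports any "from" string that still maps to more than one "to" string:

```python
from collections import defaultdict


def find_conflicts(rules):
    """Return {source: sorted targets} for sources with more than one target.

    Hypothetical helper for illustration only, not part of PostgreSQL.
    """
    targets = defaultdict(set)
    # Uniquify as (from, to) pairs: exact duplicates collapse harmlessly.
    for src, dst in set(rules):
        targets[src].add(dst)
    # A source that still has several distinct targets is a real clash.
    return {src: sorted(dsts) for src, dsts in targets.items() if len(dsts) > 1}


# Sample data (assumed): the two clashing lines quoted above, plus one
# exact duplicate pair that is tolerated.
rules = [
    ("\u210c", "x"),  # the Latin-ASCII.xml (CLDR) line, suspected to be a bug
    ("\u210c", "H"),  # the other line of the clashing pair
    ("\u00e9", "e"),
    ("\u00e9", "e"),  # identical pair, collapses when uniquified
]

conflicts = find_conflicts(rules)
print(conflicts)
```

With the sample data, only U+210C is reported, since the duplicated (é, e) pair matches on both sides and disappears when the pairs are uniquified.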
