Re: BUG #18362: unaccent rules and Old Greek text

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Michael Paquier <michael(at)paquier(dot)xyz>
Cc:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, cees(dot)van(dot)zeeland(at)freedom(dot)nl, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject:	Re: BUG #18362: unaccent rules and Old Greek text
Date:	2024-02-25 23:59:37
Message-ID:	1667235.1708905577@sss.pgh.pa.us
Lists:	pgsql-bugs

Michael Paquier <michael(at)paquier(dot)xyz> writes:
> On Mon, Feb 26, 2024 at 12:15:57PM +1300, Thomas Munro wrote:
>> That has a normal looking sequence that we can understand (α + an
>> accent). If I tell the script to follow such "simple" redirections, I
>> get over a thousand new mappings, including those. See attached.
>> There is probably more correct terminology than I'm using here...

> Ah, you've beaten me to it. Yes, that's pretty much the impression I
> was getting while looking at the set of characters in Unicode.txt. I
> am not entirely sure if what you are doing is the best way to do it,
> but the set of characters generated in unaccent.rules makes sense
> here. I am surprised to see that many, TBH.
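The "simple redirection" idea in the quoted exchange can be sketched in Python with the standard `unicodedata` module. This is a minimal illustration under stated assumptions, not the logic of the actual rules-generation script. The helper names `canonical_parts`, `follow_to_base`, and `greek_extended_rules` are hypothetical. A "simple" redirection is assumed to mean a canonical (untagged) decomposition whose first code point is the base and whose remaining code points are combining marks. The chain is followed until it reaches a character with no canonical decomposition. For example, U+1F80 (ᾀ) decomposes to U+1F00 + U+0345, and U+1F00 (ἀ) decomposes in turn to U+03B1 (α) + U+0313.

```python
import unicodedata
from typing import List, Optional, Tuple


def canonical_parts(ch: str) -> Optional[List[str]]:
    """Return the canonical decomposition of ch as a list of characters,
    or None if ch has no decomposition or only a compatibility (<tag>) one."""
    decomp = unicodedata.decomposition(ch)
    if not decomp or decomp.startswith("<"):
        return None
    return [chr(int(cp, 16)) for cp in decomp.split()]


def follow_to_base(ch: str) -> Optional[str]:
    """Hypothetical helper: follow 'simple' canonical redirections
    (base + combining marks, or a singleton) until reaching a character
    that does not decompose further. Returns None if ch does not decompose
    or if a trailing code point is not a combining mark."""
    parts = canonical_parts(ch)
    if parts is None:
        return None
    current = ch
    while parts is not None:
        base, marks = parts[0], parts[1:]
        # Assumption: only base + combining marks counts as "simple".
        if any(not unicodedata.category(m).startswith("M") for m in marks):
            return None
        current = base
        parts = canonical_parts(current)
    return current


def greek_extended_rules() -> List[Tuple[str, str]]:
    """Hypothetical example: candidate (source, replacement) pairs for the
    Greek Extended block (U+1F00..U+1FFF), keeping only letter results."""
    rules = []
    for cp in range(0x1F00, 0x2000):
        src = chr(cp)
        base = follow_to_base(src)
        if base is not None and unicodedata.category(base).startswith("L"):
            rules.append((src, base))
    return rules


if __name__ == "__main__":
    # Print a few candidate rules as "source<TAB>replacement".
    for src, dst in greek_extended_rules()[:5]:
        print(f"{src}\t{dst}")
```

The letter filter in `greek_extended_rules` is an extra assumption. It drops spacing accents such as U+1FC1, whose chain ends at a spacing diaeresis rather than at a letter.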

There are only about 1650 lines in our standard unaccent.rules
file today. Are we concerned about adding so many more?
I don't think the trie lookup logic would be slowed any,
but the time to load the rules file might take a hit.
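The distinction between lookup speed and load time can be illustrated with a small character trie. The Python below is a simplified sketch, not unaccent's C implementation. The names `TrieNode`, `build_trie`, `longest_match`, and `apply_rules` are hypothetical, and longest-prefix matching is an assumption of this sketch. Building the trie touches every character of every rule source, so load cost grows with the size of the rules file. Each lookup walks only as many nodes as the matched prefix is long, independent of how many rules were loaded.

```python
from typing import Dict, Iterable, Optional, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.replacement: Optional[str] = None


def build_trie(rules: Iterable[Tuple[str, str]]) -> TrieNode:
    """Insert each (source, replacement) pair. Total work is proportional
    to the combined length of all sources, i.e. to the rules file size."""
    root = TrieNode()
    for src, dst in rules:
        node = root
        for ch in src:
            node = node.children.setdefault(ch, TrieNode())
        node.replacement = dst
    return root


def longest_match(root: TrieNode, text: str, start: int) -> Optional[Tuple[int, str]]:
    """Walk the trie from text[start]; the number of steps is bounded by the
    length of the matched prefix, not by the number of rules loaded.
    Returns (end_index, replacement) for the longest match, or None."""
    node = root
    best = None
    i = start
    while i < len(text):
        nxt = node.children.get(text[i])
        if nxt is None:
            break
        node = nxt
        i += 1
        if node.replacement is not None:
            best = (i, node.replacement)
    return best


def apply_rules(root: TrieNode, text: str) -> str:
    """Replace each longest match; copy unmatched characters unchanged."""
    out = []
    i = 0
    while i < len(text):
        m = longest_match(root, text, i)
        if m is None:
            out.append(text[i])
            i += 1
        else:
            end, rep = m
            out.append(rep)
            i = end
    return "".join(out)
```

Adding a thousand more single-character rules would grow the root's child map and the build loop. The per-character walk in `longest_match` stays the same length.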

regards, tom lane

In response to

Re: BUG #18362: unaccent rules and Old Greek text at 2024-02-25 23:25:36 from Michael Paquier