Re: BUG #18362: unaccent rules and Old Greek text

From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Cees van Zeeland <cees(dot)van(dot)zeeland(at)freedom(dot)nl>, Michael Paquier <michael(at)paquier(dot)xyz>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18362: unaccent rules and Old Greek text
Date: 2024-05-14 15:45:35
Message-ID: 45150ad278a8bbb1b51ec02da991153bd2277e1f.camel@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Tue, 2024-05-14 at 10:51 -0400, Robert Haas wrote:
> The question of which mappings we actually ought to be adding seems
> a lot harder, because it's not altogether clear what it means to
> "remove an accent". The proposed patch adds a whole lot of rules that
> turn tiny little characters into full-sized characters, boldfaced
> and/or italicized and/or otherwise-fancily-printed characters into
> full-sized characters. Only a handful of the changes are actually
> adding rules that specifically *remove an accent*, but there are
> similar rules that already exist, like turning ⅐ into the
> four-character sequence " 1/7" and blocky-looking versions of each
> letter into standard versions and ㍱ into the three-character sequence
> "hPa". So my naive guess would be that we want all of these rules,
> even though you would not guess from the unaccent documentation that
> it's supposed to do stuff like this. But my knowledge of languages
> other than English is very limited, and I am not a user of unaccent
> and never have been, so I am reluctant to make grand pronouncements.
> Does anyone more knowledgeable want to opine?

I am not necessarily more knowledgeable, but I'll opine anyway.

As a German speaker, I wouldn't call the dieresis on "ü" an
accent like the French é, è or ê, even though the current
implementation of unaccent() turns it into an "u".

And while most people would agree that the caret on â is an
accent in the French language, I am not sure if it is the same
in Vietnamese.

And I cannot see how ⅐ could be considered an accent...

Perhaps if we invent a function called convert_to_ascii() or so
instead of shoving that into unaccent(), it would make more sense.

Yours,
Laurenz Albe

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2024-05-14 15:55:18 Re: BUG #18464: Replacing a SQL function silently drops the generated columns that use this function
Previous Message Muhammad Waqas 2024-05-14 15:42:46 Re: BUG #18464: Replacing a SQL function silently drops the generated columns that use this function