Re: BUG #18216: Unaccent function is unable to remove accents (diacritic signs) from Japanese character 'ド'

From: Peter Eisentraut <peter(at)eisentraut(dot)org>
To: shailesh(dot)totale(at)sailpoint(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18216: Unaccent function is unable to remove accents (diacritic signs) from Japanese character 'ド'
Date: 2023-11-29 09:13:54
Message-ID: 2c0389dc-a355-4de2-8a70-185b03a4b1e3@eisentraut.org
Lists: pgsql-bugs

On 28.11.23 08:15, PG Bug reporting form wrote:
> PostgreSQL's unaccent module does not use Unicode normalisation, but only a
> simple search-and-replace dictionary. The dictionary, unaccent.rules
> (https://github.com/postgres/postgres/blob/master/contrib/unaccent/unaccent.rules),
> does not contain these Japanese characters, so it is unable to remove
> the diacritic signs. Can someone please advise when we can expect these
> Japanese characters to be added?
>
> I also checked the latest versions of PostgreSQL; they still do not
> support these Japanese characters.
>
> https://pgpedia.info/u/unaccent.html

As the subsequent discussion shows, it's not quite clear to everybody
what the exact mandate of the unaccent extension is. Maybe we'll arrive
at some conclusion.

In the meantime, I suggest you also consider solving this with
collations. You might find that those have a more principled approach
to this problem, and they also have a lot of customization capabilities.
The documentation contains examples of accent-insensitive collations
(e.g., [0]). Maybe that will work for you, or serve as the basis for
customization.

[0]:
https://www.postgresql.org/docs/current/collation.html#COLLATION-NONDETERMINISTIC
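
For illustration, the Unicode-normalisation approach that the report
contrasts with unaccent's dictionary lookup can be sketched outside the
database. The Python snippet below is only a rough sketch and is not how
unaccent or PostgreSQL collations are implemented; the helper name
strip_marks is made up for this example. It decomposes the text (NFD),
drops nonspacing combining marks, and recomposes the result (NFC).

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Hypothetical helper: remove combining marks via Unicode normalisation."""
    # NFD splits a precomposed character into base character + combining marks,
    # e.g. U+30C9 (ド) becomes U+30C8 (ト) followed by U+3099 (voiced sound mark).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every nonspacing mark (general category "Mn").
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Recompose whatever is left into its canonical composed form.
    return unicodedata.normalize("NFC", kept)


if __name__ == "__main__":
    print(strip_marks("ド"))  # prints ト
```

Note that this strips every combining mark indiscriminately, which may or
may not be what an application wants for Japanese text; that question is
part of what the thread is discussing.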
