Re: BUG #18216: Unaccent function is unable to remove accents (diacritic signs) from Japanese character 'ド'

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Francisco Olarte <folarte(at)peoplecall(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, shailesh(dot)totale(at)sailpoint(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18216: Unaccent function is unable to remove accents (diacritic signs) from Japanese character 'ド'
Date: 2023-11-29 10:26:23
Message-ID: CAFj8pRAkwKfYTO0fgg0AkkNjf8Q5FFiM-iEr=7+g+HwvPVTu5Q@mail.gmail.com
Lists: pgsql-bugs

On Wed, 29 Nov 2023 at 10:16, Francisco Olarte <folarte(at)peoplecall(dot)com>
wrote:

> Hi Pavel.
>
> On Wed, 29 Nov 2023 at 09:45, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote:
> > On Wed, 29 Nov 2023 at 9:13, Francisco Olarte
> > <folarte(at)peoplecall(dot)com> wrote:
> ...
> >> But Ñ is a proper letter, you cannot break it. Our alphabet goes
> >> m-n-ñ-o-p-q.
> > Some users use unaccent for transformation to 7bit ASCII.
>
> Right, I've done it manually sometimes. But I did not normally just
> suppress the ~; I turned año into anno (IIRC nn was the predecessor of
> Ñ, and it is used in a similar place, as in "Anno Domini") or into agno
> (which sounds similar in French, and appears in things like "agnus dei
> qui tollit peccata mundi", although that one has a very different
> meaning).
>

Š, S, Ž and Z are different characters with different sounds. Some
languages write these sounds with two characters; see the section "Polish
Digraphs and Trigraphs" at
https://www.optilingo.com/blog/polish/everything-about-polish-language/

>
> I was trying to say that normally you can suppress tildes in Spanish
> without much problem, as in avión. Most of them just mark how to
> pronounce the word; they are useful if you do not know the word, but
> useless if you know it. Some of them are used to differentiate things
> like adverbs and pronouns, but in that case you can deduce it from the
> whole phrase. But not with n/ñ: ñoño and nono are completely different
> and unrelated words, and they even go in different "chapters" of the
> dictionary.
>
> > In the Czech language I can find more examples where removing
> > diacritics means a significant loss, and the meaning of the word has
> > to be inferred from context alone.
> ...
> That seems even more complex than French, and I've never been able to
> cope with them!
> > And for unaccent we expected this loss.
> > So my question is, can the unaccent function be used for transformation
> > to 7bit ASCII or is it wrong usage?
>
> You may need to turn chars to sequences.
>

In Czech we don't do that; probably nobody could read it. We are trained to
read Czech text with the accents simply dropped. A lot of people routinely
write that way, because they use keyboards without Czech characters, and for
Czech it is not a big problem. Maybe it is wrong for other languages, but we
do it.
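
To make the two options in this exchange concrete, here is a minimal Python
sketch. It is only an illustration and is not how the unaccent extension
works (unaccent maps characters through a rules file). It assumes that
"removing accents" means canonical decomposition (NFD) followed by dropping
combining marks, and the DIGRAPHS table is a hypothetical per-language
mapping made up for this example.

```python
import unicodedata

# Hypothetical, illustrative table: letters that a given language would
# rather expand into a sequence than reduce to their base letter.
DIGRAPHS = {"ñ": "nn", "Ñ": "NN"}


def strip_marks(text: str) -> str:
    """Decompose to NFD and drop every combining mark (the Czech habit)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def to_sequences(text: str, table: dict = DIGRAPHS) -> str:
    """Expand the letters listed in `table`, then strip the remaining marks."""
    expanded = "".join(table.get(ch, ch) for ch in text)
    return strip_marks(expanded)


print(strip_marks("ñoño"))   # nono  (collides with the unrelated word "nono")
print(to_sequences("año"))   # anno
print(strip_marks("Š Ž"))    # S Z
```

Note that this kind of stripping does not guarantee 7bit ASCII output:
letters without a canonical decomposition (for example the Polish ł) pass
through unchanged, so a real transliteration would still need an explicit
mapping for them.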

Pavel

> Francisco Olarte.
>
