Re: BUG #18216: Unaccent function is unable to remove accents (diacritic signs) from Japanese character 'ド'

From: Francisco Olarte <folarte(at)peoplecall(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, shailesh(dot)totale(at)sailpoint(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18216: Unaccent function is unable to remove accents (diacritic signs) from Japanese character 'ド'
Date: 2023-11-29 08:12:45
Message-ID: CA+bJJbw6n7Zx2XdmFEGv6dmXCFu6VpVbsfU7whsqkhwk7XCerw@mail.gmail.com
Lists: pgsql-bugs

Hi Jeff:

On Wed, 29 Nov 2023 at 03:40, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

I am not going to generally discuss this:
> But isn't it generally the case that removing accents might make you land on a different word with a different meaning?

But this one is a bad example,
> 'ano' and 'año' for example mean different things in Spanish (but unaccent removes it anyway, at least in one out of four attempts to get the non-7-bit-ASCII wedged through my terminal and into the function).

N and Ñ are different letters in Spanish. Ñ looks like an accented N, can
be typed as one, and some unaccent rules in some programs may treat the
two as equal, but Ñ is as different from N as it is from Z ( I am Spanish,
and in case you want an authoritative link see
https://www.rae.es/dpd/%C3%B1 ). It has its own pages in the dictionary
( even on paper, I just checked in case my memory failed me ).

We also used to have CH and LL as letters, but they were dropped
"recently" ( meaning this century, I'm getting old ).

As for the other "accents", á, é, í, ó, ú can generally be unaccented w/o
problem: although they may change the meaning in some corner cases, I do
not remember seeing them do that since the special examples in school.
Another matter is ü, which is used in our "special" handling of hard/soft
vowels after g, i.e., you do not pronounce the u in "reguero" ( but it
modifies how you pronounce the g, unlike in "agente" ), whereas in
"agüero" you do pronounce it.

But Ñ is a proper letter, you cannot break it. Our alphabet goes m-n-ñ-o-p-q.
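
To make the distinction concrete, here is a minimal Python sketch of an
accent stripper that treats Ñ as a letter of its own. It is only an
illustration under stated assumptions: the helper name
strip_accents_keep_enye and the PROTECTED set are hypothetical, and this
is not how PostgreSQL's unaccent works ( unaccent uses a rules file, not
Unicode decomposition ). Whether ü should also be protected is a policy
choice; this sketch strips it.

```python
import unicodedata

# Hypothetical set of characters to keep intact: in Spanish, Ñ is a
# separate letter, not an accented N.
PROTECTED = {"ñ", "Ñ"}


def strip_accents_keep_enye(text: str) -> str:
    """Remove combining marks from text, but leave ñ/Ñ untouched."""
    out = []
    # Compose first, so that "n" + U+0303 (combining tilde) becomes "ñ"
    # and is recognised as a protected letter.
    for ch in unicodedata.normalize("NFC", text):
        if ch in PROTECTED:
            out.append(ch)
            continue
        # Decompose the character and drop its combining marks.
        decomposed = unicodedata.normalize("NFD", ch)
        out.append(
            "".join(c for c in decomposed if not unicodedata.combining(c))
        )
    return "".join(out)


if __name__ == "__main__":
    for word in ("año", "ano", "agüero", "camión"):
        print(word, "->", strip_accents_keep_enye(word))
```

With this rule "año" and "ano" stay different words, while "camión" and
"agüero" lose their marks.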

Francisco Olarte.

P.S. to really sound Spanish, we would have picked "cono" for the
examples :-p

FO
