From: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, shailesh(dot)totale(at)sailpoint(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #18216: Unaccent function is unable to remove accents (diacritic signs) from Japanese character 'ド' |
Date: | 2023-11-29 02:40:27 |
Message-ID: | CAMkU=1xvF9NMPJgXTULGYw-5KqH5xduEPDqOT7gvbH2SRWJK-A@mail.gmail.com |
Lists: | pgsql-bugs |
On Tue, Nov 28, 2023 at 8:06 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> On Tue, Nov 28, 2023 at 09:58:35AM -0500, Tom Lane wrote:
> > PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> >> PostgreSQL's unaccent module does not use Unicode normalisation, but only a
> >> simple search-and-replace dictionary. The dictionary, unaccent.rules
> >> (https://github.com/postgres/postgres/blob/master/contrib/unaccent/unaccent.rules),
> >> does not contain these Japanese characters, so it is unable to remove
> >> the diacritic signs. Can someone please advise when we can expect these
> >> Japanese characters to be added?
> >
> > unaccent.rules, as distributed, is just an example. It is not meant
> > to be exhaustive or authoritative.
>
> FWIW, I'm quite fluent in Japanese and was discussing this a bit with
> people around me, and, like me, folks were somewhat troubled by the idea
> that these should be considered "accents", because removing them would
> entirely change the meaning of each Hiragana and Katakana character.
>
But isn't it generally the case that removing accents might make you land
on a different word with a different meaning?
For example, 'ano' and 'año' mean different things in Spanish (but unaccent
removes the tilde anyway, at least in the one out of four attempts where I
managed to get the non-7-bit-ASCII character wedged through my terminal and
into the function).
That doesn't mean that unaccent is required to do it, of course. But
the possibility of changing the meaning doesn't seem like a reason not to
do it.
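As a rough illustration of the two approaches discussed in this thread, here is a minimal Python sketch. `dictionary_unaccent` and `TOY_RULES` are hypothetical stand-ins for a rules file in the spirit of unaccent.rules, not its actual contents, and the per-character lookup is a simplification. `strip_marks` uses the standard `unicodedata` module to decompose text canonically (NFD) and drop nonspacing marks, which is the normalisation-based approach the bug report contrasts with unaccent.

```python
import unicodedata

# Hypothetical rule table in the spirit of unaccent.rules: a plain
# character-to-character mapping. This is NOT the real file's contents.
TOY_RULES = {
    "\u00f1": "n",  # ñ -> n
    "\u00e9": "e",  # é -> e
}


def dictionary_unaccent(text: str, rules: dict[str, str]) -> str:
    """Replace each character that has a rule; pass everything else through unchanged."""
    return "".join(rules.get(ch, ch) for ch in text)


def strip_marks(text: str) -> str:
    """Normalisation-based stripping: decompose canonically (NFD),
    drop nonspacing combining marks (category Mn), then recompose (NFC)."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept)


if __name__ == "__main__":
    for word in ["a\u00f1o", "\u30c9"]:  # "año", "ド"
        # columns: original, dictionary result, normalisation result
        print(word, dictionary_unaccent(word, TOY_RULES), strip_marks(word))
    # año ano ano
    # ド ド ト
```

Under the normalisation approach, 'ド' (U+30C9) decomposes canonically into 'ト' (U+30C8) plus the combining voiced sound mark U+3099, so the dakuten is dropped just as the tilde in 'año' is. The dictionary approach only changes characters it has a rule for, which is why a missing entry leaves 'ド' untouched.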
Cheers,
Jeff