From: | PG Bug reporting form <noreply(at)postgresql(dot)org> |
---|---|
To: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Cc: | hugh(at)whtc(dot)ca |
Subject: | BUG #15548: Unaccent does not remove combining diacritical characters |
Date: | 2018-12-12 20:00:45 |
Message-ID: | 15548-cef1b3f8de190d4f@postgresql.org |
Lists: | pgsql-bugs pgsql-hackers |
The following bug has been logged on the website:
Bug reference: 15548
Logged by: Hugh Ranalli
Email address: hugh(at)whtc(dot)ca
PostgreSQL version: 11.1
Operating system: Ubuntu 18.04
Description:
Apparently Unicode has two ways of accenting a character: as a single
precomposed code point, which represents both the base character and the
accent, or as a "combining diacritical mark"
(https://en.wikipedia.org/wiki/Combining_Diacritical_Marks) in which case
the mark applies itself to the preceding character. For example, A followed
by U+0300 displays À. However, unaccent is not removing these accents.
SELECT unaccent(U&'A\0300'); should result in 'A', but instead results in
'À'. I'm running PostgreSQL 11.1, installed from the PostgreSQL
repositories. I've read bug report #13440, and have tried with both the
installed unaccent.rules as well as a new set generated by the
generate_unaccent_rules.py distributed with the 11.1 source code:
wget http://unicode.org/Public/7.0.0/ucd/UnicodeData.txt
wget https://www.unicode.org/repos/cldr/trunk/common/transforms/Latin-ASCII.xml
python generate_unaccent_rules.py --unicode-data-file UnicodeData.txt --latin-ascii-file Latin-ASCII.xml > unaccent.rules
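To make the two representations described above concrete, here is a minimal Python sketch using only the standard unicodedata module. The helper name strip_combining_marks is hypothetical, and limiting it to the Combining Diacritical Marks block (U+0300 to U+036F) is an assumption made for illustration; it shows the expected result, not how unaccent itself works.

```python
import unicodedata

# Assumption for illustration: only the Combining Diacritical Marks block.
COMBINING_FIRST, COMBINING_LAST = 0x0300, 0x036F


def strip_combining_marks(text: str) -> str:
    """Decompose to NFD, then drop code points in U+0300..U+036F."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch for ch in decomposed
        if not (COMBINING_FIRST <= ord(ch) <= COMBINING_LAST)
    )


precomposed = "\u00c0"   # À as one precomposed code point
combining = "A\u0300"    # A followed by COMBINING GRAVE ACCENT

# Both strings display the same, but differ in length and content.
print(len(precomposed), len(combining))                         # 1 2
print(unicodedata.normalize("NFC", combining) == precomposed)   # True

# Either form reduces to the bare base letter.
print(strip_combining_marks(precomposed), strip_combining_marks(combining))  # A A
```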
I see there have been some updates to generate_unaccent_rules.py to handle
Greek and Vietnamese characters, but neither of them seems to address this
issue. I'm happy to contribute a patch to handle these cases, but first
wanted to check whether the current behaviour is intended or whether I am
misunderstanding something.
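One possible shape for such a patch, as a rough sketch only: emit one extra rule line per combining mark. This assumes the unaccent rules-file convention that a line containing a single character deletes that character, and it assumes that covering just the U+0300 to U+036F block is sufficient. The function name combining_mark_rules is hypothetical and is not taken from generate_unaccent_rules.py.

```python
# Assumption: the Combining Diacritical Marks block, U+0300 through U+036F.
COMBINING_START, COMBINING_END = 0x0300, 0x036F


def combining_mark_rules():
    """Yield one rules-file line per combining mark.

    Each line holds only the mark itself, relying on the assumed
    convention that a lone character on a line means "delete it".
    """
    for code_point in range(COMBINING_START, COMBINING_END + 1):
        yield chr(code_point)


rules = list(combining_mark_rules())
print(len(rules))  # 112 lines, one per code point in the block
```

The resulting lines could be appended to the generated unaccent.rules file; whether that is the right place for the fix is the question raised above.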
Thank you,
Hugh Ranalli