From: | Peter Eisentraut <peter(at)eisentraut(dot)org> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | Cees van Zeeland <cees(dot)van(dot)zeeland(at)freedom(dot)nl>, Michael Paquier <michael(at)paquier(dot)xyz>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #18362: unaccent rules and Old Greek text |
Date: | 2024-05-15 07:01:10 |
Message-ID: | 1bcd13b7-6e00-4de1-961e-b7669f05a2da@eisentraut.org |
Lists: | pgsql-bugs |

On 14.05.24 16:51, Robert Haas wrote:
> 2. The question of which mappings we actually ought to be adding seems
> a lot harder, because it's not altogether clear what it means to
> "remove an accent". The proposed patch adds a whole lot of rules that
> turn tiny little characters into full-sized characters, boldfaced
> and/or italicized and/or otherwise-fancily-printed characters into
> full-sized characters. Only a handful of the changes are actually
> adding rules that specifically *remove an accent*, but there are
> similar rules that already exist, like turning ⅐ into the
> four-character sequence " 1/7" and blocky-looking versions of each
> letter into standard versions and ㍱ into the three-character sequence
> "hPa". So my naive guess would be that we want all of these rules,
> even though you would not guess from the unaccent documentation that
> it's supposed to do stuff like this.
unaccent actually does both accent removal and ligature expansion.
(This is documented.) The cases you show above are ligature expansions.
You can also run generate_unaccent_rules.py with --no-ligatures and then
you get a smaller list that indeed looks more like just accent removal.
It does look as though whatever it treats as a ligature produces some
unintuitive results.
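
To make the distinction concrete, here is a rough sketch using Python's standard unicodedata module. It is not the logic of generate_unaccent_rules.py; it only assumes that Unicode canonical decomposition is a fair stand-in for "accent removal" and compatibility decomposition a fair stand-in for what ends up in the "ligature" bucket. The helper names strip_accents and expand_compat are made up for this illustration.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Accent removal (illustrative): canonically decompose (NFD),
    then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def expand_compat(text: str) -> str:
    """'Ligature'-style expansion (illustrative): apply the Unicode
    compatibility decomposition (NFKD)."""
    return unicodedata.normalize("NFKD", text)


if __name__ == "__main__":
    samples = [
        "\u00e9",      # LATIN SMALL LETTER E WITH ACUTE
        "\u1f00",      # GREEK SMALL LETTER ALPHA WITH PSILI
        "\ufb01",      # LATIN SMALL LIGATURE FI
        "\u3371",      # SQUARE HPA
        "\u2150",      # VULGAR FRACTION ONE SEVENTH
        "\U0001d400",  # MATHEMATICAL BOLD CAPITAL A
    ]
    for ch in samples:
        print(unicodedata.name(ch), repr(strip_accents(ch)), repr(expand_compat(ch)))
```

In this sketch only the first two samples are changed by strip_accents; the other four change only under expand_compat, which is roughly the category of rules Robert describes above. Note that the raw compatibility decomposition of ⅐ uses U+2044 FRACTION SLASH rather than the " 1/7" form quoted above, so the real rules evidently involve more than this sketch does.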