From: | Hugh Ranalli <hugh(at)whtc(dot)ca> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Daniel Verite <daniel(at)manitou-mail(dot)org>, thomas(dot)munro(at)enterprisedb(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #15548: Unaccent does not remove combining diacritical characters |
Date: | 2018-12-15 21:03:33 |
Message-ID: | CAAhbUMMmXnj0YSD+fr5hSqeC+D6PAG+0kXJwMMhK2DCdwQVoxQ@mail.gmail.com |
Lists: | pgsql-bugs pgsql-hackers |
On Sat, 15 Dec 2018 at 14:05, Hugh Ranalli <hugh(at)whtc(dot)ca> wrote:
> On Sat, 15 Dec 2018 at 13:44, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Hm. Something funny is going on here. When I fetch the two reference
>> files from the URLs cited in the script, and do
>>
>>   python2 generate_unaccent_rules.py --unicode-data-file UnicodeData.txt \
>>     --latin-ascii-file Latin-ASCII.xml >newrules
>>
>> I get something that's bit-for-bit the same as what's in unaccent.rules.
>> So there's clearly a platform difference between here and there.
>>
>> I'm using Python 2.6.6, which is what ships with RHEL6; have not tried
>> it on anything newer.
>>
> Well, that's embarrassing. When I looked I couldn't see anything that
> looked platform specific. I'm on Python 2.7.6, which shipped with Mint 17.
> We use other versions of 2.7 on our production platforms. I'll take another
> look, and check the URLs I am using.
>
The problem is that I downloaded the latest version of the Latin-ASCII
transliteration file (r34 rather than the r28 specified in the URL). Over 3
years ago (in r29, of course) they changed the file format (
https://unicode.org/cldr/trac/ticket/5873), and as a result
parse_cldr_latin_ascii_transliterator loads an empty rule set from the
newer file. I'd be happy to either a) support both formats, or b) support
just the newest and update the URL. Option b) is cleaner, and I can't
imagine why anyone would want to use an older rule set (then again,
struggling with Unicode always makes my head hurt; I am not an expert on
it). Thoughts?
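
Whichever option is chosen, the failure described above was silent: the parser returned no rules and the script carried on. A minimal sketch of a guard that would make this visible is below. The helper name check_rules_loaded is hypothetical and is not part of generate_unaccent_rules.py; it only assumes that the parser returns an iterable of rules.

```python
def check_rules_loaded(rules, source_path):
    """Return the parsed rules as a list, or raise if there are none.

    Hypothetical helper for illustration; not part of the actual script.
    """
    rules = list(rules)
    if not rules:
        # An empty result most likely means the input file is in a format
        # the parser does not understand, so fail instead of continuing.
        raise ValueError(
            "no transliteration rules parsed from %s; "
            "the file format may differ from what the parser expects"
            % source_path)
    return rules
```

It could wrap the existing call, for example check_rules_loaded(parse_cldr_latin_ascii_transliterator(path), path), so that a future format change stops the script with an error rather than quietly producing a different rules file.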