From: | Ramanarayana <raam(dot)soft(at)gmail(dot)com> |
---|---|
To: | Hugh Ranalli <hugh(at)whtc(dot)ca> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #15548: Unaccent does not remove combining diacritical characters |
Date: | 2019-02-11 20:57:31 |
Message-ID: | CAKm4Xs7CBuCW_XQtrVX6ThwSMiL0WK7Cj3nZx2Jymb9eJ=YdMQ@mail.gmail.com |
Lists: | pgsql-bugs pgsql-hackers |
Hi Hugh,
I tested the script in Python 2.7 and it works perfectly. The problem is in
Python 3.7 (and maybe only on Windows, as you were not getting the issue),
where I was getting the following error:
UnicodeEncodeError: 'charmap' codec can't encode character '\u0100' in position 0: character maps to <undefined>
I went through the Python script and found that the stdout encoding is set
to UTF-8 only if the Python version is <= 2.
I have made the same change for Python 3 as well. Please find the
patch attached. Let me know if it makes sense.
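For illustration, here is a minimal sketch of the kind of change described above. It is not the attached patch itself: the helper names `utf8_writer` and `encode_sample` are hypothetical, and the sketch assumes the goal is simply to have stdout encode to UTF-8 on both Python 2 and Python 3.

```python
import codecs
import io
import sys


def utf8_writer(stream):
    """Wrap a stream so that text written to it is encoded as UTF-8.

    Hypothetical helper for illustration, not taken from the patch.
    """
    if sys.version_info[0] <= 2:
        # Python 2: sys.stdout accepts bytes, so wrap it directly.
        return codecs.getwriter('utf8')(stream)
    # Python 3: text streams such as sys.stdout expose their byte stream
    # as .buffer; raw byte streams (e.g. io.BytesIO) are wrapped as-is.
    return codecs.getwriter('utf8')(getattr(stream, 'buffer', stream))


def encode_sample():
    """Write U+0100 through the wrapper and return the bytes produced."""
    raw = io.BytesIO()
    out = utf8_writer(raw)
    out.write(u'\u0100\n')
    return raw.getvalue()


if __name__ == '__main__':
    # Replacing sys.stdout once at startup lets later print() calls emit
    # UTF-8 even when the console code page (e.g. cp1252) lacks the character.
    sys.stdout = utf8_writer(sys.stdout)
    print(u'\u0100')
```

As an alternative that needs no code change, setting the environment variable PYTHONIOENCODING=utf-8 before running the script should have a similar effect on Python 3, though I have not verified that against this script.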
Regards,
Ram.
On Tue, 12 Feb 2019 at 00:50, Hugh Ranalli <hugh(at)whtc(dot)ca> wrote:
>
> On Sun, 10 Feb 2019 at 15:07, raam narayana <raam(dot)soft(at)gmail(dot)com> wrote:
>
>> Hi,
>>
>> After the latest commit in the master branch, I was trying to test the Python
>> script. Ironically, I still see that the output from the script is
>> completely different from the contents of the unaccent.rules file. Am I
>> missing anything? My testing includes the following:
>>
>> Downloaded the following files
>>
>> http://unicode.org/Public/8.0.0/ucd/UnicodeData.txt
>>
>>
>> http://unicode.org/cldr/trac/export/14746/tags/release-34/common/transforms/Latin-ASCII.xml
>>
>> Executed the below python script
>>
>> python generate_unaccent_rules.py --unicode-data-file UnicodeData.txt
>> --latin-ascii-file Latin-ASCII.xml > unaccent.rules
>>
>> I am using python 3.7.1 and running on Windows 10 Platform
>>
>> The new status of this patch is: Needs review
>>
>
> Hi Raam,
> I just ran generate_unaccent_rules.py under two environments, using the
> data files given above:
> - Python 3.4.3 on Linux Mint 17.3 (equivalent to Ubuntu 14.04)
> - Python 3.6.7 on Ubuntu 18.04
>
> In both cases, the output was identical to that generated by the program
> under Python 2.7. So yes, more information would help. Unfortunately I
> don't have a Windows Python environment readily available, but could set
> one up if I had to.
>
> Thanks,
> Hugh
>
--
Cheers
Ram 4.0
Attachment | Content-Type | Size |
---|---|---|
generate_unaccent_rules-remove-combining-diacritical-accents-03.patch | application/octet-stream | 544 bytes |