Re: BUG #15548: Unaccent does not remove combining diacritical characters

From:	Michael Paquier <michael(at)paquier(dot)xyz>
To:	Ramanarayana <raam(dot)soft(at)gmail(dot)com>
Cc:	Hugh Ranalli <hugh(at)whtc(dot)ca>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: BUG #15548: Unaccent does not remove combining diacritical characters
Date:	2019-02-12 04:18:19
Message-ID:	20190212041819.GK1475@paquier.xyz
Lists:	pgsql-bugs pgsql-hackers

On Tue, Feb 12, 2019 at 02:27:31AM +0530, Ramanarayana wrote:
> I tested the script with Python 2.7 and it works perfectly. The problem is in
> Python 3.7 (and maybe only on Windows, as you were not getting the issue),
> where I was getting the following error:
>
> UnicodeEncodeError: 'charmap' codec can't encode character '\u0100' in
> position 0: character maps to <undefined>
>
> I went through the Python script and found that the stdout encoding is set
> to utf-8 only if the Python version is <= 2.
>
> I have made the same change for Python 3 as well. Please find the
> patch for that change. Let me know if it makes sense.

Isn't that because the encoding on Windows becomes cp1252, utf16 or such?
FWIW, on Debian SID with Python 3.7, I get the correct output, and no
diffs on HEAD. Perhaps it would make sense to use open() on the
different files with encoding='utf-8' to avoid any kind of problems?
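
To illustrate the open() idea, here is a minimal Python 3 sketch, not the
actual unaccent script. The helper names (write_rules, read_rules,
can_encode) and the tab-separated format are assumptions made up for this
example. It shows why U+0100 fails under cp1252 and how passing
encoding='utf-8' explicitly removes the dependency on the platform default:

```python
def can_encode(text, encoding):
    """Return True if text can be represented in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def write_rules(path, rules):
    """Write (source, target) pairs as tab-separated lines, always as UTF-8."""
    # The encoding is given explicitly, so the output does not depend on
    # the platform default (for example cp1252 on Windows).
    with open(path, "w", encoding="utf-8") as f:
        for src, dst in rules:
            f.write(src + "\t" + dst + "\n")


def read_rules(path):
    """Read back the pairs written by write_rules, decoding as UTF-8."""
    with open(path, "r", encoding="utf-8") as f:
        return [tuple(line.rstrip("\n").split("\t")) for line in f]


if __name__ == "__main__":
    # U+0100 (LATIN CAPITAL LETTER A WITH MACRON) is the character from
    # the reported error; cp1252 has no mapping for it.
    print(can_encode("\u0100", "cp1252"))
    print(can_encode("\u0100", "utf-8"))
```

On Python 2 the built-in open() does not accept an encoding argument;
io.open() takes the same parameter there.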
--
Michael

In response to

Re: BUG #15548: Unaccent does not remove combining diacritical characters at 2019-02-11 20:57:31 from Ramanarayana

Responses

Re: BUG #15548: Unaccent does not remove combining diacritical characters at 2019-02-12 13:54:20 from Ramanarayana
