From: | Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | hugh(at)whtc(dot)ca |
Cc: | raam(dot)soft(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com |
Subject: | Re: Unaccent extension python script Issue in Windows |
Date: | 2019-03-18 06:27:55 |
Message-ID: | 20190318.152755.73288474.horiguchi.kyotaro@lab.ntt.co.jp |
Lists: | pgsql-hackers |
Hello.
At Mon, 18 Mar 2019 14:13:34 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20190318(dot)141334(dot)186469242(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> Hello.
>
> At Sun, 17 Mar 2019 20:23:05 -0400, Hugh Ranalli <hugh(at)whtc(dot)ca> wrote in <CAAhbUMNoBLu7jAbyK5MK0LXEyt03PzNQt_Apkg0z9bsAjcLV4g(at)mail(dot)gmail(dot)com>
> > Hi Ram,
> > Thanks for doing this; I've been overestimating my ability to get to things
> > over the last couple of weeks.
> >
> > I've looked at the patch and have made one minor change. I had moved all
> > the imports up to the top, to keep them in one place (and I think some had
> > originally been used only by the Python 2 code). You added them there, but
> > didn't remove them from their original positions. So I've incorporated that
> > into your patch, attached as v2. I've tested this under Python 2 and 3 on
> > Linux, not Windows.
>
> Though I'm not sure it is necessary to run the script on
> Windows, the problem is not specific to Windows; it is a general
> one that just hasn't happened to surface on non-Windows environments.
>
> On CentOS7:
> > export LANG="ja_JP.EUCJP"
> > python <..snipped..>
> ..
> > UnicodeEncodeError: 'euc_jp' codec can't encode character '\xab' in position 0: illegal multibyte sequence
>
> So this is not an issue with Windows but with python3.
>
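For illustration, the failure above can be reproduced without touching the locale. This is only a minimal sketch, not the script or the patch: write_with_encoding() is a hypothetical helper, and passing "euc_jp" explicitly stands in for what Python 3 picks up from LANG=ja_JP.EUCJP for stdout.

```python
import io

# U+00AB, the character named in the error message above.
ch = "\u00ab"

def write_with_encoding(text, encoding):
    """Hypothetical helper: write text through a text stream that uses
    the given encoding and return the bytes that were produced."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
    stream.write(text)
    stream.flush()
    return raw.getvalue()

# Assumption: "euc_jp" plays the role of the locale-derived stdout encoding.
try:
    write_with_encoding(ch, "euc_jp")
    locale_result = "ok"
except UnicodeEncodeError:
    locale_result = "UnicodeEncodeError"

# With an explicit UTF-8 stream the same character is written fine.
utf8_bytes = write_with_encoding(ch, "utf-8")

print(locale_result)
print(utf8_bytes)
```

The point is only that the output encoding has to be chosen explicitly rather than inherited from the environment.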
> The script generates identical files with both versions of
> Python with the patch on Linux and Windows 7. Python 3 on Windows
> emits CRLF as the newline, but that doesn't seem to do any harm. (I
> haven't confirmed that yet, because the build is extremely slow for
> reasons I haven't figured out.)
I confirmed that CRLF actually does no harm and unaccent works
correctly. (t_isspace() treats them as white space, so they are excluded.)
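A rough sketch of why the line ending does not matter, under stated assumptions: render() and parse() are hypothetical helpers, newline="\r\n" simulates what Python 3 does by default on Windows, and str.split() is only a Python stand-in for a reader that skips white space. It is not the unaccent parser itself.

```python
import io

def render(lines, newline):
    """Hypothetical helper: write the lines through a UTF-8 text stream.
    The newline argument controls how "\n" is translated on output."""
    raw = io.BytesIO()
    out = io.TextIOWrapper(raw, encoding="utf-8", newline=newline)
    for line in lines:
        out.write(line + "\n")
    out.flush()
    return raw.getvalue()

def parse(data):
    """Stand-in reader: split each line on white space, so a trailing
    carriage return is dropped along with tabs and spaces."""
    pairs = []
    for line in data.decode("utf-8").split("\n"):
        fields = line.split()
        if fields:
            pairs.append(tuple(fields))
    return pairs

# Two made-up sample rules in "source<TAB>target" form.
rules = ["\u00c0\tA", "\u00e9\te"]

unix_bytes = render(rules, "\n")       # LF line endings
windows_bytes = render(rules, "\r\n")  # CRLF, as Python 3 emits on Windows

print(b"\r\n" in windows_bytes)
print(parse(unix_bytes) == parse(windows_bytes))
```

If byte-identical output on every platform were wanted, passing newline="\n" when opening the output stream would suppress the translation.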
> This patch contains irrelevant changes. The minimal required
> change would be the attached. If you want to refactor the
> UnicodeData reader or rearrange the import stuff, those should be
> separate patches.
>
> It would be better to use IOBase for Python 3, especially for the stdout
> replacement, but I didn't since it *is* working.
>
> > Everything else looks correct. I apologise for not having replied to your
> > question in the original bug report. I had intended to, but as I said,
> > there's been an increase in the things I need to juggle at the moment.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center