Re: Further UTF8/MIME fixes for the commitfest app

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>
Cc: PostgreSQL WWW list <pgsql-www(at)postgresql(dot)org>
Subject: Re: Further UTF8/MIME fixes for the commitfest app
Date: 2017-04-01 16:53:39
Message-ID: CABUevEz_16+ipbRT99TaBaF7iGM+a+1FZv4PP2LikKoxwkNcxg@mail.gmail.com
Lists: pgsql-www

On Sun, Mar 19, 2017 at 2:45 PM, Magnus Hagander <magnus(at)hagander(dot)net>
wrote:

>
>
> On Tue, Mar 14, 2017 at 2:07 PM, Dagfinn Ilmari Mannsåker <
> ilmari(at)ilmari(dot)org> wrote:
>
>> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>>
>> > On Wed, Mar 1, 2017 at 5:35 PM, Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org> wrote:
>> […]
>> >> #2 MIME-decodes headers received from the mailing list archive JSON API
>> >>
>> >> I haven't been able to talk to the JSON api, so I couldn't test them
>> >> properly, but I did some stand-alone testing of the code snippets.
>> >>
>> >> Note that the MIME decoding only works properly if running under Python
>> >> 3; the Python 2 version of email.header.decode_header() has broken
>> >> detection of the end of encoded-words.
>> >
>> > Is the patch still an improvement on python2?
>>
>> No, because it'd be affected by the same problem that causes the
>> undecoded headers to be returned from the archive app.
>>
>
> OK.
>
>
>
>> > Also, based on your other email about the list archives -- if we fix this
>> > in the archives, does that make this patch unnecessary?
>>
>> Yes, this patch is unnecessary if the archive app is fixed, and
>> insufficient if the commitfest app isn't upgraded to python3.
>>
>> One possible workaround until upgrading to python3 is feasible would be
>> for the archive app to do some more munging (akin to the existing
>> _re_mailworkaround), and inject a space between an encoded-word and an
>> immediately-adjacent opening/closing paren.
>>
>
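
For illustration, a minimal sketch of what that extra munging could look
like. This is not the existing _re_mailworkaround code; the names
pad_encoded_words, _re_paren_before and _re_paren_after are made up here,
and the pattern assumes encoded-words of the usual
=?charset?encoding?text?= shape.

```python
import re

# Assumed shape of an RFC 2047 encoded-word: =?charset?encoding?text?=
_ENCODED_WORD = r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?="

# A paren immediately followed by an encoded-word.
_re_paren_before = re.compile(r"([()])(?=" + _ENCODED_WORD + r")")
# An encoded-word immediately followed by a paren.
_re_paren_after = re.compile(r"(" + _ENCODED_WORD + r")(?=[()])")


def pad_encoded_words(header):
    """Insert a space between an encoded-word and an adjacent paren."""
    header = _re_paren_before.sub(r"\1 ", header)
    header = _re_paren_after.sub(r"\1 ", header)
    return header
```

Headers without an encoded-word next to a paren pass through unchanged,
so something like this could run before the header is handed to the
decoder.
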
> Actually, if I read that one right, it would be enough to upgrade the
> *loader* part of the archives, which is a much more contained problem, as
> it pretty much only has dependencies on the standard library.
>
> Will have to run some detailed tests on that of course, to make sure it
> doesn't break anything else (like we have to reparse the 1.2 million
> messages in the archives and see if something else changes - but we have
> tools for this), but I think that's probably the best way forward from here.
>
>
I took a look at this, but it's not a lot of fun.

We currently use utidylib to clean HTML, and it only supports Python 2.

We could move to tidylib (notably without the u), which uses newer versions
of everything and exists for Python 3. But the Python 3 version is not
available until Debian stretch.

We'd also have to carefully examine how the output of tidylib differs from
that of utidylib, and should probably do that as a separate step. I guess
we'll have to start there.
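
A rough sketch of how such a comparison could be structured, assuming the
two cleaners can each be wrapped as a function from an HTML string to a
cleaned string. compare_cleaners and its arguments are hypothetical names,
not part of the existing archive tools, and no actual utidylib or tidylib
calls are shown.

```python
import difflib


def compare_cleaners(messages, clean_old, clean_new):
    """Return {message_id: diff lines} for messages whose cleaned output differs.

    messages is an iterable of (message_id, html) pairs; clean_old and
    clean_new are callables standing in for the two HTML cleaners.
    """
    changed = {}
    for msgid, html in messages:
        old = clean_old(html)
        new = clean_new(html)
        if old != new:
            changed[msgid] = list(difflib.unified_diff(
                old.splitlines(), new.splitlines(),
                fromfile="utidylib", tofile="tidylib", lineterm=""))
    return changed
```

Run over the whole archive, the size of the returned dict gives a quick
idea of how many messages would change, and the diffs show what kind of
changes they are.
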

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
