Re: No easy way to join discussion in existing thread when not subscribed

From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Amir Rohan <amir(dot)rohan(at)mail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL www <pgsql-www(at)postgresql(dot)org>, magnus(at)hagander(dot)net, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: No easy way to join discussion in existing thread when not subscribed
Date: 2015-09-29 17:44:18
Message-ID: 560ACDF2.8030308@kaltenbrunner.cc
Lists: pgsql-www

On 09/29/2015 05:51 PM, Amir Rohan wrote:
> Magnus,

Hi Amir!

>
> Please see the attached patch adding a "whole thread [as mbox]" link
> to pgarchives, by popular request upthread.

Thanks a lot for the patch. I took a quick look at it and have a
few comments.

>
> It checks hiddenstatus, but does materialize the entire raw thread
> in memory (including attachments) to form the response, which
> is unbounded in principle and can be sizable in practice.
>
> Perhaps Django can do streaming responses, so we can bound
> the memory usage and the timeout. It's been a while.

This is dangerous: the box we are running on has limited
resources (especially RAM), and we have already seen bots crawling our
archives cause issues.
We do have frontend caching for the archives, but a large thread is
likely way too big to be cached. And while the archives are protected by
HTTP Basic auth, we have seen various browser plugins (like ones doing
"intelligent prefetching" or other weird things) cause issues.
Have you done any (approximate) measurements of the additional
in-memory overhead, both in PostgreSQL (to build the response) and in
Django, compared to the size of the resulting mbox?
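A minimal sketch of the bounded-memory approach discussed above, in Python since pgarchives is a Django application. It assumes the view can fetch messages lazily, one at a time (for example by iterating a server-side cursor), and that each item is already a complete mbox entry. `stream_mbox`, `ThreadTooLarge` and the 10 MB cap are hypothetical names and values, not part of the patch.

```python
from typing import Iterable, Iterator

# Hypothetical cap for illustration; the thread does not settle on a limit.
MAX_THREAD_BYTES = 10 * 1024 * 1024


class ThreadTooLarge(Exception):
    """Raised when a thread's mbox output would exceed the byte cap."""


def stream_mbox(entries: Iterable[bytes], limit: int = MAX_THREAD_BYTES) -> Iterator[bytes]:
    """Yield mbox entries one at a time instead of joining the whole thread.

    Assumes `entries` is produced lazily (e.g. one database row at a time)
    and each item is already a complete mbox entry, so peak memory stays
    near the size of the largest single message rather than the thread.
    """
    sent = 0
    for entry in entries:
        sent += len(entry)
        if sent > limit:
            # Stop before emitting the entry that would cross the cap.
            raise ThreadTooLarge(f"mbox output exceeds {limit} bytes")
        yield entry
```

In a Django view the generator could be wrapped in `django.http.StreamingHttpResponse(..., content_type="application/mbox")`. An exception raised mid-stream cannot change a status line that has already been sent, so a real view would more likely check the thread's total size up front and refuse oversized threads before streaming starts; the in-stream check only guards against that estimate being wrong.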

>
> If you'd like changes (hard limits, strip attachments, etc.),
> do let me know.

Some other things:

* While this is a preexisting issue in the code (most of the HTTP auth
requests are handled directly in lighttpd, so I guess nobody has noticed
so far), please use "Please authenticate with user 'archives' and
password 'antispam'".
* Have you verified that the resulting mbox actually contains the
newline separator after each message? (I have not checked whether the
source data has it.)
* Are you sure that using unicode() to build the output is going to
work on all input? I don't think you can assume that the source data is
ASCII-clean and/or contains only valid Unicode code points for mapping.
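A minimal sketch of the separator and encoding points above, assuming each message's raw source is available as bytes (without a leading "From " envelope line) and that the envelope sender and date are known. The function name `mbox_entry` and the choice of the mboxrd quoting variant are illustrative assumptions, not what pgarchives does. The idea is to never decode the message, so arbitrary bytes pass through unchanged.

```python
from datetime import datetime


def mbox_entry(raw: bytes, sender: str, date: datetime) -> bytes:
    """Format one raw RFC 5322 message as an mboxrd-style entry.

    Everything stays as bytes: the message source is never decoded, so
    non-ASCII or invalid byte sequences pass through untouched instead of
    raising UnicodeDecodeError.
    """
    # Separator line: "From <envelope sender> <asctime-style date>".
    # errors="replace" keeps a non-ASCII sender from raising here.
    sep = f"From {sender} {date.ctime()}\n".encode("ascii", errors="replace")

    quoted = []
    for line in raw.split(b"\n"):
        # mboxrd quoting: any body line matching ^>*From  gains one '>',
        # so readers do not mistake it for the start of a new message.
        if line.lstrip(b">").startswith(b"From "):
            line = b">" + line
        quoted.append(line)
    body = b"\n".join(quoted)

    if not body.endswith(b"\n"):
        body += b"\n"
    # The extra b"\n" is the blank line that must precede the next "From ".
    return sep + body + b"\n"
```

Concatenating the entries in date order gives the whole-thread mbox, with a blank line guaranteed between consecutive messages regardless of how the stored source ends.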

Stefan
>
> On 09/29/2015 03:51 PM, Stephen Frost wrote:
>
>>>> I have this frustration with mutt. :) Which is why I'd like an mbox of
>>>> just the thread that I want to reply to.
>>>
>
> Here you go.
>
>>> I'd be glad to help implement whatever works, if someone points me
>>> at the code for the website.
>>
>> I assume you're referring to the above changes to the archives system,
>> not the bug tracker debate, so, the archives code is here:
>>
>> http://git.postgresql.org/gitweb/?p=pgarchives.git;a=summary
>>
>> The website code is here:
>>
>> http://git.postgresql.org/gitweb/?p=pgweb.git;a=summary
>>
>
> I'd be glad to help with the bugtracker too, if things have converged,
> I haven't followed that thread closely.
>
>
> Cheers,
> Amir
>
> p.s.
>
> The web archives already support a single-message mbox download.
> It's the "raw" view. I didn't realize and... *neither did any of you*
> ;)
>
>
>
>
