Re: mailing list archiver chewing patches

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Dave Page <dpage(at)pgadmin(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, jd <jd(at)commandprompt(dot)com>, Matteo Beccati <php(at)beccati(dot)com>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-12 18:54:27
Message-ID: 9837222c1001121054j40bc9302obc1123f5f6c02503@mail.gmail.com
Lists: pgsql-hackers pgsql-www

On Tue, Jan 12, 2010 at 18:34, Dave Page <dpage(at)pgadmin(dot)org> wrote:
> On Tue, Jan 12, 2010 at 10:24 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> "Joshua D. Drake" <jd(at)commandprompt(dot)com> writes:
>>> On Tue, 2010-01-12 at 10:24 +0530, Dave Page wrote:
>>>> So just to put this into perspective and give anyone paying attention
>>>> an idea of the pain that lies ahead should they decide to work on
>>>> this:
>>>>
>>>> - We need to import the old archives (of which there are hundreds of
>>>> thousands of messages, the first few years of which have, umm, minimal
>>>> headers).
>>>> - We need to generate thread indexes
>>>> - We need to re-generate the original URLs for backwards compatibility
>>>>
>>>> Now there's encouragement :-)
>>
>>> Or, we just leave the current infrastructure in place and use a new one
>>> for all new messages going forward. We shouldn't limit our ability to
>>> have a decent system due to decisions of the past.
>>
>> -1.  What's the point of having archives?  IMO the mailing list archives
>> are nearly as critical a piece of the project infrastructure as the CVS
>> repository.  We've already established that moving to a new SCM that
>> fails to preserve the CVS history wouldn't be acceptable.  I hardly
>> think that the bar is any lower for mailing list archives.
>>
>> Now I think we could possibly skip the requirement suggested above for
>> URL compatibility, if we just leave the old archives on-line so that
>> those URLs all still resolve.  But if we can't load all the old messages
>> into the new infrastructure, it'll basically be useless for searching
>> purposes.
>>
>> (Hmm, re-reading what you said, maybe we are suggesting the same thing,
>> but it's not clear.  Anyway my point is that Dave's first two
>> requirements are real.  Only the third might not be.)
>
> The third isn't actually that hard to do in theory. The
> message numbers are basically the zero-based position in the mbox
> file, and the rest of the URL is obvious.

The third part is trivial. The search system already does 95% of it.
I've already implemented exactly that kind of redirect on top of
the search code once, just as a PoC, and it took less than 30 minutes of
hacking. I can't seem to find the script ATM, but you get the
idea.
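
For anyone who wants something concrete to look at, here is a rough
sketch of the mapping Dave describes: the message number in an old URL
is treated as the zero-based position of the message in that month's
mbox file, and the redirect is then a simple lookup. This is not the
lost script, and it does not use the search code. The URL pattern
(/<list>/<YYYY-MM>/msg<NNNNN>.php), the /message-id/ target path and
all the function names are assumptions made up for illustration.

```python
import mailbox
import re

# Assumed shape of an old archive URL path: /<list>/<YYYY-MM>/msg<NNNNN>.php
OLD_URL = re.compile(
    r"^/(?P<list>[\w-]+)/(?P<month>\d{4}-\d{2})/msg(?P<num>\d+)\.php$"
)


def index_mbox(path):
    """Return the Message-ID of each message, in mbox file order.

    The list index is the zero-based position of the message in the file.
    Messages without a Message-ID header get an empty string.
    """
    box = mailbox.mbox(path, create=False)
    try:
        return [
            (msg.get("Message-ID") or "").strip().strip("<>")
            for msg in box
        ]
    finally:
        box.close()


def parse_old_url(url_path):
    """Split an old-style URL path into (list, month, message number)."""
    m = OLD_URL.match(url_path)
    if m is None:
        return None
    return m.group("list"), m.group("month"), int(m.group("num"))


def redirect_target(url_path, ids_by_month):
    """Map an old URL path to a new one, or None if it cannot be resolved.

    ids_by_month maps (list, month) to the list built by index_mbox().
    The "/message-id/" prefix is a hypothetical new URL scheme.
    """
    parsed = parse_old_url(url_path)
    if parsed is None:
        return None
    list_name, month, num = parsed
    ids = ids_by_month.get((list_name, month), [])
    if num >= len(ids) or not ids[num]:
        return None
    return "/message-id/" + ids[num]
```

Indexing each monthly mbox once and keeping the (list, month) table
around would make the redirect itself a constant-time lookup; the old
messages with minimal headers are the ones that would come back as
None here and need some other handling.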

Let's not focus on that part; we can easily solve that.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
