From: | Matteo Beccati <php(at)beccati(dot)com> |
---|---|
To: | Magnus Hagander <magnus(at)hagander(dot)net> |
Cc: | Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: mailing list archiver chewing patches |
Date: | 2010-01-12 20:37:50 |
Message-ID: | 4B4CDD9E.7010204@beccati.com |
Lists: | pgsql-hackers pgsql-www |
On 12/01/2010 21:04, Magnus Hagander wrote:
> On Tue, Jan 12, 2010 at 20:56, Matteo Beccati<php(at)beccati(dot)com> wrote:
>> On 12/01/2010 10:30, Magnus Hagander wrote:
>>>
>>> The problem is usually with strange looking emails with 15 different
>>> MIME types. If we can figure out the proper way to render that, the
>>> rest really is just a SMOP.
>>
>> Yeah, I was expecting some, but all the messages I've looked at seemed to be
>> working OK.
>
> Have you been looking at old or new messages? Try grabbing a couple of
> MBOX files off archives.postgresql.org from several years back, you're
> more likely to find weird MUAs then I think.
Both. pgsql-hackers and -general are subscribed and getting new emails,
and pgsql-www is just an import of the archives:
http://archives.beccati.org/pgsql-www/by/date (sorry, no paging)
(just fixed a 500 error that was caused by the fact that I've been
playing with the db a bit and a required helper table was missing)
>>> (BTW, for something to actually be used In Production (TM), we want
>>> something that uses one of our existing frameworks. So don't go
>>> overboard in code-wise implementations on something else - proof of
>>> concept on something else is always ok, of course)
>>
>> OK, that's something I didn't know, even though I expected some kind of
>> limitations. Could you please elaborate a bit more (i.e. where to find
>> info)?
>
> Well, the framework we're moving towards is built on top of django, so
> that would be a good first start.
>
> There is also whatever the commitfest thing is built on, but I'm told
> that's basically no framework.
I'm afraid that's outside of my expertise. But I can get as far as
having a proof of concept and the required queries / php code.
>> Having played with it, here's my feedback about AOX:
>>
>> pros:
>> - seemed to be working reliably;
>> - does most of the dirty job of parsing emails, splitting parts, etc
>> - highly normalized schema
>> - thread support (partial?)
>
> A killer will be if that thread support is enough. If we have to build
> that completely ourselves, it'll take a lot more work.
Looks like we need to populate a helper table with hierarchy
information, unless Abhijit has a better idea and knows how to get it
from the aox main schema.
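
To make the helper-table idea concrete, here is a minimal sketch of how hierarchy rows could be derived, assuming each message exposes its Message-ID and the Message-ID it replies to. The function name, the input shape and the row layout are all hypothetical illustrations; they are not taken from the aox schema.

```python
def build_thread_rows(messages):
    """Derive hierarchy rows from (message_id, in_reply_to) pairs.

    Hypothetical helper: `messages` is an iterable of tuples where
    in_reply_to is None for thread starters. Returns a list of
    (message_id, parent_id, root_id, depth) rows.
    """
    messages = list(messages)
    parent = {mid: irt for mid, irt in messages}
    known = set(parent)

    def resolve(mid):
        # Walk up the reply chain until the parent is unknown (or missing).
        # `seen` guards against reply loops in malformed headers.
        cur, depth, seen = mid, 0, {mid}
        while parent.get(cur) in known and parent[cur] not in seen:
            cur = parent[cur]
            seen.add(cur)
            depth += 1
        return cur, depth

    rows = []
    for mid, irt in messages:
        root, depth = resolve(mid)
        # A reply to a message we never archived is treated as a new root.
        rows.append((mid, irt if irt in known else None, root, depth))
    return rows
```

Rows like these could then be bulk-loaded into a separate table keyed by message, leaving the main schema untouched.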
>> cons:
>> - directly publishing the live email feed might not be desirable
>
> Why not?
The scenario I was thinking of was the creation of a static snapshot and
potential inconsistencies that might occur if the threads get updated
during that time.
>> - queries might end up being a bit complicated for simple tasks
>
> As long as we don't have to hit them too often, which is solvable
> with caching. And we do have a pretty good RDBMS to run the queries on
> :)
True :)
>>> I don't think you can trust the NNTP gateway now or in the past,
>>> messages are sometimes lost there. The mbox files are as complete as
>>> anything we'll ever get.
>>
>> Importing the whole pgsql-www archive with a perl script that bounces
>> messages via SMTP took about 30m. Maybe there's even a way to skip SMTP, I
>> haven't looked into it that much.
>
> Um, yes. There is an MBOX import tool.
Cool.
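
Separately from the import itself, the old MBOX files can be surveyed for unusual MIME structures before loading them. The following is a minimal sketch using Python's standard `mailbox` module; it is not the aox import tool, and the function name and the path passed to it are assumptions for illustration.

```python
import mailbox
from collections import Counter


def mime_type_counts(mbox_path):
    """Count the content types of all non-multipart parts in an mbox file."""
    counts = Counter()
    # create=False: fail instead of silently creating an empty mailbox.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            # walk() visits the message and every nested MIME part.
            for part in msg.walk():
                if not part.is_multipart():
                    counts[part.get_content_type()] += 1
    finally:
        box.close()
    return counts
```

Running it over a few archive files from several years back and sorting by count should show quickly which exotic content types a renderer would have to cope with.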
>> With all that said, I can't promise anything as it all depends on how much
>> spare time I have, but I can proceed with the evaluation if you think it's
>> useful. I have a feeling that AOX is not truly the right tool for the job,
>> but we might be able to customise it to suit our needs. Are there any other
>> requirements that weren't specified?
>
> Well, I think we want to avoid customizing it. Using a custom
> frontend, sure. But we don't want to end up customizing the
> parser/backend. That's the road to unmaintainability.
Sure. I guess my wording wasn't right... I was more thinking about
adding new tables, materialized views or whatever else might be missing
to make it fit our purpose.
Cheers
--
Matteo Beccati
Development & Consulting - http://www.beccati.com/