From: | Aidan Van Dyk <aidan(at)highrise(dot)ca> |
---|---|
To: | Matteo Beccati <php(at)beccati(dot)com> |
Cc: | Magnus Hagander <magnus(at)hagander(dot)net>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: mailing list archiver chewing patches |
Date: | 2010-01-12 20:16:47 |
Message-ID: | 20100112201647.GC18076@oak.highrise.ca |
Lists: | pgsql-hackers pgsql-www |
I'll note that the whole idea of an "email archive" interface might be a
very good "advocacy" project as well. AOX might not be a perfect fit,
but it could be a good learning experience... Really, all the PG mail
archives need is:
1) A nice normalized DB schema representing mail messages and their
relations to other messages and "recipients" (or "folders")
2) An "injector" that can parse an email message, decompose it into
the various parts/tables of the DB schema, and insert it
3) A nice set of SQL queries to return messages, parts, threads, and
folders based on $criteria (search, id, folder, etc.)
4) A web interface to view the messages/thread/parts #3 returns
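As a minimal sketch of item 2, assuming Python's standard `email` package as the parser, the injector step could look like the following. The `messages`/`parts` row layouts and their field names are hypothetical (not AOX's schema), and the actual `INSERT` into PostgreSQL is left out.

```python
from email import policy
from email.parser import BytesParser

# Hypothetical raw message; any RFC 5322 message bytes would do.
RAW = b"""\
Message-ID: <child@example.org>
In-Reply-To: <root@example.org>
From: Someone <someone@example.org>
Subject: Re: archiver
Content-Type: multipart/mixed; boundary="XX"

--XX
Content-Type: text/plain; charset=utf-8

Patch attached.
--XX
Content-Type: text/x-diff; name="fix.patch"
Content-Disposition: attachment; filename="fix.patch"

diff --git a/README b/README
--XX--
"""


def _header(msg, name):
    """Return a header as a plain str, or None when it is absent."""
    value = msg[name]
    return None if value is None else str(value)


def decompose(raw: bytes):
    """Split one message into a 'messages' row and a list of 'parts' rows
    (hypothetical table layouts)."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    message_row = {
        "message_id": _header(msg, "Message-ID"),
        "in_reply_to": _header(msg, "In-Reply-To"),
        "sender": _header(msg, "From"),
        "subject": _header(msg, "Subject"),
    }
    part_rows = []
    for partno, part in enumerate(msg.walk()):
        if part.is_multipart():
            continue  # multipart containers carry no payload of their own
        part_rows.append({
            "partno": partno,
            "content_type": part.get_content_type(),
            "filename": part.get_filename(),
            # Decoded bytes: a natural fit for a bytea column, while text
            # parts could additionally be stored as text for searching.
            "data": part.get_payload(decode=True),
        })
    return message_row, part_rows


if __name__ == "__main__":
    row, parts = decompose(RAW)
    print(row["message_id"], [p["content_type"] for p in parts])
```

Keeping the parsed parts as separate rows is what would let the schema in item 1 hold the text and bytea handling separately per MIME part.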
The largest part of this is #1, but a good schema would be a very good
candidate to show off some of PG's more powerful features in a way that
"others" could see (like the movie store sample somewhere), such as:
1) full text search
2) text vs bytea handling (thinking of all the MIME parts, and encodings,
etc.)
3) CTEs, ltree, recursion, etc, for threading/searching
4) Triggers for "materialized views" (for quick threading/folder queries)
5) expression indexes
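To make the threading point (item 3) concrete, here is a small sketch of thread reconstruction with a recursive CTE. It runs against SQLite through Python's built-in `sqlite3` module purely so the example is self-contained; PostgreSQL 8.4 and later accept the same `WITH RECURSIVE` form, though the path expression built with SQLite's `printf()` would need a PostgreSQL equivalent. The single-table schema, with `parent_id` assumed to be resolved from `In-Reply-To` by the injector, is hypothetical.

```python
import sqlite3

# Hypothetical minimal schema: parent_id is assumed to be filled in by the
# injector from the In-Reply-To header.
SCHEMA = """
CREATE TABLE messages (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES messages(id),
    subject   TEXT NOT NULL
);
"""

# Walk down from a root message; 'path' concatenates zero-padded ids so that
# ORDER BY path lists every reply directly under its parent.
THREAD_QUERY = """
WITH RECURSIVE thread(id, depth, path, subject) AS (
    SELECT id, 0, printf('%06d', id), subject
      FROM messages WHERE id = :root
    UNION ALL
    SELECT m.id, t.depth + 1, t.path || '.' || printf('%06d', m.id), m.subject
      FROM messages AS m JOIN thread AS t ON m.parent_id = t.id
)
SELECT id, depth, subject FROM thread ORDER BY path;
"""


def thread_of(conn, root_id):
    """Return (id, depth, subject) rows for the thread rooted at root_id."""
    return conn.execute(THREAD_QUERY, {"root": root_id}).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO messages (id, parent_id, subject) VALUES (?, ?, ?)",
    [
        (1, None, "mailing list archiver chewing patches"),
        (2, 1, "Re: mailing list archiver chewing patches"),
        (3, 2, "Re: mailing list archiver chewing patches"),
        (4, 1, "Re: mailing list archiver chewing patches"),
        (5, None, "Streaming replication status"),
    ],
)
for msg_id, depth, subject in thread_of(conn, 1):
    print("  " * depth + f"[{msg_id}] {subject}")
```

The six-digit padding width is an arbitrary assumption; it only needs to be wide enough that ids compare correctly as text.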
a.
* Matteo Beccati <php(at)beccati(dot)com> [100112 14:56]:
> Having played with it, here's my feedback about AOX:
>
> pros:
> - seemed to be working reliably;
> - does most of the dirty job of parsing emails, splitting parts, etc
> - highly normalized schema
> - thread support (partial?)
>
> cons:
> - directly publishing the live email feed might not be desirable
> - queries might end up being a bit complicated for simple tasks
> - might not be easy to add additional processing in the workflow
> If there isn't a fully usable thread hierarchy I was thinking more of
> ltree, mainly because I've successfully used it in the past and I haven't
> had enough time yet to look at CTEs. But if performance is comparable I
> don't see a reason why we shouldn't use them.
> With all that said, I can't promise anything as it all depends on how
> much spare time I have, but I can proceed with the evaluation if you
> think it's useful. I have a feeling that AOX is not truly the right tool
> for the job, but we might be able to customise it to suit our needs. Are
> there any other requirements that weren't specified?
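For comparison with the CTE approach, the ltree idea amounts to storing a materialized path per message at insert time. The sketch below, in plain Python with hypothetical function names, computes the dot-separated paths an ltree column could hold (numeric labels such as message ids are valid ltree labels) and answers the subtree question that ltree's `<@` (is-descendant-or-equal) operator would answer in the database.

```python
from typing import Dict, List, Optional, Tuple


def assign_paths(msgs: List[Tuple[int, Optional[int]]]) -> Dict[int, str]:
    """Give each (id, parent_id) a dot-separated path like '1.2.3'.

    Assumes parents arrive before their replies, as an injector processing
    mail in arrival order would usually see.
    """
    paths: Dict[int, str] = {}
    for msg_id, parent_id in msgs:
        if parent_id is None or parent_id not in paths:
            # Thread root, or a reply whose parent was never archived:
            # start a new tree rather than dropping the message.
            paths[msg_id] = str(msg_id)
        else:
            paths[msg_id] = paths[parent_id] + "." + str(msg_id)
    return paths


def thread_order(paths: Dict[int, str], root: int) -> List[Tuple[int, int]]:
    """Return (id, depth) for every message under root, in display order."""
    prefix = str(root)
    members = [
        (msg_id, path) for msg_id, path in paths.items()
        # Equal to the root or below it; the '.' stops '10' matching '1'.
        if path == prefix or path.startswith(prefix + ".")
    ]
    # Compare labels numerically so that '1.10' sorts after '1.2'.
    members.sort(key=lambda item: [int(label) for label in item[1].split(".")])
    return [(msg_id, path.count(".")) for msg_id, path in members]


if __name__ == "__main__":
    paths = assign_paths([(1, None), (2, 1), (3, 2), (4, 1), (10, 1)])
    print(thread_order(paths, 1))
```

The trade-off is that paths are computed once, at injection time, so reading a thread needs no recursion; the cost is that a parent arriving after its replies leaves those replies as separate roots until their paths are rewritten.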
--
Aidan Van Dyk Create like a god,
aidan(at)highrise(dot)ca command like a king,
http://www.highrise.ca/ work like a slave.