From: | Matteo Beccati <php(at)beccati(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: mailing list archiver chewing patches |
Date: | 2010-02-13 12:34:56 |
Message-ID: | 4B769C70.8060106@beccati.com |
Lists: | pgsql-hackers pgsql-www |
On 01/02/2010 17:28, Tom Lane wrote:
> Matteo Beccati<php(at)beccati(dot)com> writes:
>> My main concern is that we'd need to overcomplicate the thread detection
>> algorithm so that it better deals with delayed messages: as it currently
>> works, the replies to a missing message get linked to the
>> "grand-parent". Injecting the missing message afterwards will put it at
>> the same level as its replies. If it happens only once in a while I
>> guess we can live with it, but definitely not if it happens tens of
>> times a day.
>
> That's quite common unfortunately --- I think you're going to need to
> deal with the case. Even getting a direct feed from the mail relays
> wouldn't avoid it completely: consider cases like
>
> * A sends a message
> * B replies, cc'ing A and the list
> * B's reply to list is delayed by greylisting
> * A replies to B's reply (cc'ing list)
> * A's reply goes through immediately
> * B's reply shows up a bit later
>
> That happens pretty frequently IME.
I've improved the threading algorithm by keeping an ordered backlog of
unresolved references, i.e. when a message arrives:
1. Search for a parent message using:
   1a. In-Reply-To header. If the referenced message is not found, insert
       its Message-Id into the backlog table with position 0.
   1b. References header. For each missing referenced message, insert its
       Message-Id into the backlog table with position N.
   1c. MS Exchange Thread-Index and Thread-Topic headers.
2. Store the message along with its parent ID, if any.
3. Compare the Message-Id header with the backlog table. Update the
   parent field of any referencing message and clean up positions >= N in
   the references table.
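
To make the steps concrete, here is a minimal Python sketch. It is not
the archiver's actual code: the Archive class, its add() method and the
in-memory dict and list standing in for the database tables are all
hypothetical. It also assumes that positions count ancestors from the
nearest one outwards (In-Reply-To at 0, then References walked from
last to first), that the search stops at the first ancestor already
stored, and it leaves out the Thread-Index/Thread-Topic fallback (1c).

```python
from dataclasses import dataclass, field


@dataclass
class Archive:
    """Hypothetical in-memory stand-in for the message and backlog tables."""
    parents: dict = field(default_factory=dict)  # Message-Id -> parent Message-Id (or None)
    backlog: list = field(default_factory=list)  # (missing_id, waiting_id, position)

    def add(self, msg_id, in_reply_to=None, references=()):
        # Step 1: build the candidate parents, nearest ancestor first
        # (assumed ordering: In-Reply-To, then References from last to first).
        candidates = []
        for ref in [in_reply_to, *reversed(references)]:
            if ref and ref != msg_id and ref not in candidates:
                candidates.append(ref)

        parent = None
        for position, ref in enumerate(candidates):
            if ref in self.parents:
                parent = ref  # nearest ancestor we already have
                break
            # Ancestor not archived yet: remember who is waiting for it.
            self.backlog.append((ref, msg_id, position))

        # Step 2: store the message with the best parent found so far.
        self.parents[msg_id] = parent

        # Step 3: re-parent any message that was waiting for this one, then
        # drop its backlog entries at the same or a more distant position.
        for missing_id, waiting_id, position in list(self.backlog):
            if missing_id != msg_id:
                continue
            self.parents[waiting_id] = msg_id
            self.backlog = [
                entry for entry in self.backlog
                if not (entry[1] == waiting_id and entry[2] >= position)
            ]


# The greylisting case quoted above: B's reply "b" arrives after A's
# follow-up "a2", which is first attached to the grand-parent "a".
archive = Archive()
archive.add("a")
archive.add("a2", in_reply_to="b", references=["a", "b"])
archive.add("b", in_reply_to="a", references=["a"])
print(archive.parents["a2"])  # b
```

Backlog entries at a position lower than the resolved one are kept on
purpose, so a still closer ancestor can take over as parent if it shows
up later.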
Now I just need some time to do a final clean up and I'd be ready to
publish the code, which hopefully will be clearer than my words ;)
Cheers
--
Matteo Beccati
Development & Consulting - http://www.beccati.com/