Re: [OT] Tom's/Marc's spam filters?

From: "Marc G(dot) Fournier" <scrappy(at)postgresql(dot)org>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: "Marc G(dot) Fournier" <scrappy(at)postgresql(dot)org>, Will Trillich <will(at)serensoft(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: [OT] Tom's/Marc's spam filters?
Date: 2004-04-24 12:47:31
Message-ID: 20040424094703.A42925@ganymede.hub.org
Lists: pgsql-general

On Fri, 23 Apr 2004, Joe Conway wrote:

> Marc G. Fournier wrote:
> > On Mon, 19 Apr 2004, Joe Conway wrote:
> >>Marc G. Fournier wrote:
> >>>Huh? I just use Spamassassin myself, with Razor/Pyzor/DCC and Bayes all
> >>>enabled ...
> >>
> >>I use exactly the same setup. But recently I've noticed that the
> >>spammers are getting smarter -- I think 20% of it is slipping by the
> >>filters. I'm going to need something better.
> >
> > do you force learn those spam that get through the cracks? I get about 20
> > or 30 messages that slip through the cracks, which I process through with
> > sa-learn nightly ...
>
> Sorry to drag this OT thread on even longer, but it seems to be a topic
> many are interested in ;-)
>
> I wanted to report back that after just 2 days of forced (supervised)
> learning, the bayesian filter is now nailing about 99% of all spam.
> *Many, many, thanks* for the suggestion.
>
> But I wonder why the autolearn feature is so conservative? At this point
> I'm getting lots of stuff like this:
>
> X-Spam-Status: Yes, hits=5.8 required=2.5 tests=BAYES_99,HTML_FONT_BIG,
> HTML_MESSAGE autolearn=no version=2.63
> X-Spam-Report:
> * 0.1 HTML_MESSAGE BODY: HTML included in message
> * 0.3 HTML_FONT_BIG BODY: HTML has a big font
> * 5.4 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
> * [score: 1.0000]
>
> Notice that, even though I get a hit on BAYES_99, I still get
> autolearn=no. Ah well, I guess I should be asking that question of the
> SpamAssassin guys. Also notice that this sucker would have gotten
> through with a score of only 0.4 had it not been for the bayesian filter.

BAYES_99 means that it's already been found in the Bayes filter, so why
would it autolearn it once more? :)
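
For anyone who wants to check the arithmetic in the quoted report, here is
a minimal sketch in Python. It is not SpamAssassin's own code: the helper
names (parse_report, total) are made up for illustration, and the report
lines and the required=2.5 threshold are copied from the quoted headers
with the "> " quoting stripped. It sums the per-rule scores with and
without the BAYES_* rule and compares each total to the threshold.

```python
import re

# Report lines copied from the quoted message (quote markers stripped).
REPORT = """\
* 0.1 HTML_MESSAGE BODY: HTML included in message
* 0.3 HTML_FONT_BIG BODY: HTML has a big font
* 5.4 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
* [score: 1.0000]
"""

REQUIRED = 2.5  # "required=2.5" in the quoted X-Spam-Status header

# Matches "* <score> <RULE_NAME> ..."; continuation lines such as
# "* [score: 1.0000]" do not match and are skipped.
RULE_LINE = re.compile(r"^\*\s+(-?\d+(?:\.\d+)?)\s+([A-Z][A-Z0-9_]*)\b")


def parse_report(text):
    """Return {rule_name: score} for every rule line in the report (hypothetical helper)."""
    hits = {}
    for line in text.splitlines():
        match = RULE_LINE.match(line.strip())
        if match:
            hits[match.group(2)] = float(match.group(1))
    return hits


def total(hits, skip_prefix=None):
    """Sum the scores, optionally leaving out rules whose name starts with skip_prefix."""
    kept = (
        score
        for name, score in hits.items()
        if not (skip_prefix and name.startswith(skip_prefix))
    )
    return round(sum(kept), 2)


hits = parse_report(REPORT)
with_bayes = total(hits)
without_bayes = total(hits, skip_prefix="BAYES_")

print("with Bayes:   ", with_bayes, "spam" if with_bayes >= REQUIRED else "not spam")
print("without Bayes:", without_bayes, "spam" if without_bayes >= REQUIRED else "not spam")
```

The two totals match the hits=5.8 in the header and the 0.4 Joe mentions,
so without the Bayes hit the message would have stayed under the 2.5
threshold.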

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy(at)hub(dot)org Yahoo!: yscrappy ICQ: 7615664
