Spam filter + evolution

Dennis Dryden

25 Jun 2004

7:06 p.m.

I'm getting about 98% spam at the moment and I was wondering if there is a quick and easy way to set up some spam filtering software (like SpamAssassin). Will it involve setting up sendmail or anything scary like that?

Dennis

Wayne Stallwood

25 Jun

7:33 p.m.

New subject: [Alug] Spam filter + evolution

On Friday 25 June 2004 18:04, Dennis Dryden wrote:

...

I'm getting about 98% spam at the moment and I was wondering if there is a quick and easy way to set up some spam filtering software (like SpamAssassin). Will it involve setting up sendmail or anything scary like that?

Not an Evolution user myself, but here is how I do it in KMail.

I have SpamAssassin installed. Then I have two rules for incoming mail in KMail:

First: if size is less than or equal to 250000, pipe through spamc.

Then: if X-Spam-Flag contains YES, move to folder spam and mark as read.

If Evolution cannot filter on X-Spam-Flag, you can set up SpamAssassin to put *****SPAM***** in the subject line and filter on that instead.

The spam folder is set up to purge messages older than 2 weeks, which gives me a chance to check it once in a while for any false positives (very rare).

The size limit is important as spamc can barf on large messages.
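A minimal sketch of the two rules above, written in Python rather than as KMail filter settings. The `scan` callable stands in for piping the message through spamc, and `fake_scan` and the folder names are hypothetical placeholders so the example runs without SpamAssassin installed.

```python
from email import message_from_bytes
from typing import Callable

MAX_SCAN_BYTES = 250_000  # size limit from the first rule above


def route_message(raw: bytes, scan: Callable[[bytes], bytes]) -> str:
    """Return the name of the folder a message should be filed in.

    `scan` takes the raw message and returns it with any X-Spam-* headers
    added, which is what piping through spamc is expected to do.
    """
    # Rule 1: only messages at or under the size limit go to the scanner.
    if len(raw) <= MAX_SCAN_BYTES:
        raw = scan(raw)

    # Rule 2: anything the scanner flagged is filed in the spam folder.
    msg = message_from_bytes(raw)
    flag = msg.get("X-Spam-Flag", "")
    return "spam" if "YES" in flag.upper() else "inbox"


def fake_scan(raw: bytes) -> bytes:
    """Hypothetical stand-in scanner: flags any message mentioning 'viagra'."""
    if b"viagra" in raw.lower():
        return b"X-Spam-Flag: YES\r\n" + raw
    return raw
```

In real use, `scan` might wrap something like `subprocess.run(["spamc"], input=raw, capture_output=True).stdout`. That assumes a running spamd and is not exercised here.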

You will also have to read up a bit on how to use sa-learn to fine-tune SpamAssassin. Essentially, you manually move any spam that was not detected automatically into the spam folder, then tell sa-learn to look at that folder. Personally, I have cron do this bit on my machine, so all I have to do is remember to move any undetected spam I see rather than deleting it.
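As a rough sketch of what such a cron job could run, the Python below only builds and prints the sa-learn command lines. The folder paths are hypothetical, and it assumes the folders are directories holding one message per file (an mbox file would need sa-learn's `--mbox` option as well). Training on a ham folder too is an addition here, not something described above. It needs Python 3.9 or later for the `list[...]` annotations.

```python
import shlex
from pathlib import Path


def sa_learn_commands(spam_dir: Path, ham_dir: Path) -> list[list[str]]:
    """Build the sa-learn invocations for one training run."""
    return [
        ["sa-learn", "--spam", str(spam_dir)],  # learn these as spam
        ["sa-learn", "--ham", str(ham_dir)],    # learn these as wanted mail
    ]


if __name__ == "__main__":
    # Hypothetical folder locations; adjust to wherever your client keeps mail.
    home = Path.home()
    for cmd in sa_learn_commands(home / "Mail" / "spam", home / "Mail" / "ham"):
        # Print instead of executing, so the sketch is safe to run anywhere.
        print(shlex.join(cmd))
```

To actually train, each command list could be passed to `subprocess.run(cmd, check=True)` from a script that cron calls nightly.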

Of course I am still wasting bandwidth downloading the junk, but at least I haven't got to sieve through it every time I want to check my mail.

Graham

7:37 p.m.

On Friday 25 June 2004 19:04, Dennis Dryden wrote:

...

I'm getting about 98% spam at the moment and I was wondering if there is a quick and easy way to set up some spam filtering software (like SpamAssassin). Will it involve setting up sendmail or anything scary like that?

Dennis

If you're an 'ordinary' home user the simplest approach is to spam-bin anything not coming from someone in your address book. Quite easy to do if you're using KMail - details on request.
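The address-book rule can be sketched in a few lines of Python. This is an illustration only, not how KMail implements it: the folder names and the `address_book` set are hypothetical, and it trusts the From header, which a sender can forge. It needs Python 3.9 or later for the `set[str]` annotation.

```python
from email import message_from_string
from email.utils import parseaddr


def folder_for(raw_message: str, address_book: set[str]) -> str:
    """File mail from known senders in the inbox, everything else aside."""
    msg = message_from_string(raw_message)
    # parseaddr splits "Name <user@host>" into (name, address).
    _name, addr = parseaddr(msg.get("From", ""))
    # Compare case-insensitively; address_book is expected to be lowercase.
    return "inbox" if addr.lower() in address_book else "unknown-senders"
```

For example, with `address_book = {"alice@example.org"}`, a message from `Alice <Alice@Example.org>` is filed in the inbox and anything else goes to the folder for later inspection.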

I find Bayesian filters such as SpamAssassin and Bogofilter remove only about 80% of spam (I'm guessing - haven't checked lately), which still leaves a lot. If they can do better than that they're liable to tag wanted mail as spam from time to time, which can be inconvenient because it means wading through the stuff from time to time. Nonetheless it's worth adding one to do some of the grunt work. As with the above, you can add this to KMail on a client machine, or if you run your own mailserver you can do it there.

-- GT

adam＠thebowery.co.uk

8:10 p.m.

On Fri, Jun 25, 2004 at 07:36:09PM +0100, Graham wrote:

...

On Friday 25 June 2004 19:04, Dennis Dryden wrote:

...
I'm getting about 98% spam at the moment and I was wondering if there is a quick and easy way to set up some spam filtering software (like SpamAssassin). Will it involve setting up sendmail or anything scary like that?

What distro are you using, and what mail server is installed by default on it? Also, how do you get your email? Do you download it via POP3, IMAP, or some other method? To answer the question effectively it would be handy to know these things. That said, you have a few options: the first is to integrate it with your mail delivery agent; the second is to run it from a procmail (or similar) configuration when you collect your email; and the final option is to have your mail client run the program when it collects the email (I don't know if your software has this option).

...

If you're an 'ordinary' home user the simplest approach is to spam-bin anything not coming from someone in your address book. Quite easy to do if you're using KMail - details on request.

That sounds a bit crazy to me. Quite often I get email from people who are not in my address book...

...

I find Bayesian filters such as SpamAssassin and Bogofilter remove only about 80% of spam (I'm guessing - haven't checked lately), which still leaves a lot. If they can do better than that they're liable to tag wanted mail as spam from time to time, which can be inconvenient because it means wading through the stuff from time to time. Nonetheless it's worth adding one to do some of the grunt work. As with the above, you can add this to KMail on a client machine, or if you run your own mailserver you can do it there.

I find that with a well-trained SpamAssassin I get 0 false positives and perhaps 1 spam a day gets through. I just checked my spam folder and it works out that I get on average 219 spams a day, so you are looking at a false positive rate of 0% and a failure rate of less than 0.5% (this has been the case throughout the more than 2 years I have been running SpamAssassin).

What I tend to do is bin anything with a SpamAssassin score over 10 straight into the spam folder, as I have /never/ had a false positive that high (I could probably lower that value too). Anything with a score between 5 and 10 gets put into another folder that gets checked once a day.

I would say that the best (and probably only) way to install SpamAssassin is to spend a month collecting and saving *all* of your spam and ham email, then get SpamAssassin running and immediately train it on that spam/ham lexicon. (My spam lexicon now goes back two years, so I can effectively retrain it, or any other Bayesian anti-spam tool I wish to try; the current archive is about 100 megs with bzip compression.) When I originally ran it without doing this first, I will admit that it did capture quite a bit of ham (mainly newsletters and the like, as they contained lots of spammy features). The other thing you can of course do, if you get problems, is whitelist things so that they never get eaten.
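The two score thresholds described above could be sketched like this in Python. It assumes the score is read from an X-Spam-Status header of the form `Yes, score=12.3 required=5.0 ...` (older SpamAssassin releases wrote `hits=` instead of `score=`). The folder names are hypothetical.

```python
import re
from typing import Optional

# Accept both the newer "score=" and the older "hits=" spelling.
_SCORE_RE = re.compile(r"\b(?:score|hits)=(-?\d+(?:\.\d+)?)")

SPAM_THRESHOLD = 10.0    # above this: straight into the spam folder
REVIEW_THRESHOLD = 5.0   # from here up to SPAM_THRESHOLD: daily-check folder


def parse_score(status_header: str) -> Optional[float]:
    """Extract the numeric score from an X-Spam-Status value, if present."""
    match = _SCORE_RE.search(status_header)
    return float(match.group(1)) if match else None


def folder_for_score(score: Optional[float]) -> str:
    """Map a score to a folder using the two thresholds."""
    if score is None:
        return "inbox"  # unscanned mail is left alone
    if score > SPAM_THRESHOLD:
        return "spam"
    if score >= REVIEW_THRESHOLD:
        return "probably-spam"
    return "inbox"
```

Keeping the middle band in its own folder means only borderline mail needs a daily look, while the high-scoring bulk never has to be read.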

adam

-- jabberid = quinophex@jabber.earth.li AFFS || http://www.affs.org.uk/ || Not a filesystem

Dennis Dryden

9:28 p.m.

On Fri, 2004-06-25 at 20:09, adam@thebowery.co.uk wrote:

...

What distro are you using, and what mail server is installed by default on it? Also, how do you get your email? Do you download it via POP3, IMAP, or some other method? To answer the question effectively it would be handy to know these things. That said, you have a few options: the first is to integrate it with your mail delivery agent; the second is to run it from a procmail (or similar) configuration when you collect your email; and the final option is to have your mail client run the program when it collects the email (I don't know if your software has this option).

Sorry, I forgot to say: I'm using Debian unstable and collecting the email from my ISP's POP3 server. Evolution doesn't seem to support filtering through procmail (I'm sure I remember seeing it, but maybe that was in Balsa...).

...

...
If you're an 'ordinary' home user the simplest approach is to spam-bin anything not coming from someone in your address book. Quite easy to do if you're using KMail - details on request.

That sounds a bit crazy to me. Quite often I get email from people who are not in my address book...

I don't really use my address book so...

...

...
I find Bayesian filters such as SpamAssassin and Bogofilter remove only about 80% of spam ...

80% seems great to me :)

...

I find that with a well-trained SpamAssassin I get 0 false positives and perhaps 1 spam a day gets through. I just checked my spam folder and it works out that I get on average 219 spams a day, so you are looking at a false positive rate of 0% and a failure rate of less than 0.5% (this has been the case throughout the more than 2 years I have been running SpamAssassin).

What I tend to do is bin anything with a SpamAssassin score over 10 straight into the spam folder, as I have /never/ had a false positive that high (I could probably lower that value too). Anything with a score between 5 and 10 gets put into another folder that gets checked once a day.

I would say that the best (and probably only) way to install SpamAssassin is to spend a month collecting and saving *all* of your spam and ham email, then get SpamAssassin running and immediately train it on that spam/ham lexicon. (My spam lexicon now goes back two years, so I can effectively retrain it, or any other Bayesian anti-spam tool I wish to try; the current archive is about 100 megs with bzip compression.) When I originally ran it without doing this first, I will admit that it did capture quite a bit of ham (mainly newsletters and the like, as they contained lots of spammy features). The other thing you can of course do, if you get problems, is whitelist things so that they never get eaten.

I'll start collecting spam then ;) Thanks for the help and pointers.

Dennis

adam＠thebowery.co.uk

26 Jun

12:24 p.m.

On Fri, Jun 25, 2004 at 09:27:19PM +0100, Dennis Dryden wrote:

...

...
Sorry, I forgot to say: I'm using Debian unstable and collecting the email from my ISP's POP3 server. Evolution doesn't seem to support filtering through procmail (I'm sure I remember seeing it, but maybe that was in Balsa...).

One option would be to use a different program to collect your email, such as getmail or fetchmail, and have it pass the mail on to Exim. You then have Exim do the spam checking for you (you would probably want to integrate anti-virus into this too) and deliver the mail to an IMAP server (Dovecot springs to mind), and point Evolution at the IMAP server. Then, if you want to change mail client in the future, you don't have to worry about changing mailbox formats.

Although the above may sound a bit daunting, there are plenty of how-tos online that will tell you how to achieve this (the first result I found on Google was http://www.win.tue.nl/~martijna/Debianstuff/; I don't vouch for the quality of the information there). The other thing I would suggest, if you go for this approach, is to keep copies of your email on the POP server or take backups of it via a separate mechanism, just in case you break something and lose your email :)

Adam

-- jabberid = quinophex@jabber.earth.li AFFS || http://www.affs.org.uk/ || Not a filesystem

Graham

25 Jun

11:15 p.m.

On Friday 25 June 2004 20:09, adam@thebowery.co.uk wrote:

...

On Fri, Jun 25, 2004 at 07:36:09PM +0100, Graham wrote:

...
If you're an 'ordinary' home user the simplest approach is to spam-bin anything not coming from someone in your address book. Quite easy to do if you're using KMail - details on request.

That sounds a bit crazy to me. Quite often I get email from people who are not in my address book...

Me too, but they're rarely messages that need to be read NOW. I don't throw away the spam, just redirect it to a folder for later inspection. Maybe I'm not training the filter properly, but I can't achieve a high enough accuracy to avoid the need to inspect the ones that creep through, which amounts to 20-30 a day. Better to have them in a separate folder and go in daily to pick out anything that's genuine. What goes into my inbox is now always genuine, except for the occasional one that spoofs my address.

Anyway, it'd be no good for a business but it works for me. And certainly for my wife, an Outlook Express user with about 20 contacts, who gets 1 genuine email a week hidden in 200 Brazilian spams (go figure; she's as English as fish'n'chips and doesn't speak a word of Portuguese). And that's just the ones that get through the Bayesian filter.

By the way, Dennis asked for a quick and easy solution.

-- GT

adam＠thebowery.co.uk

11:48 p.m.

On Fri, Jun 25, 2004 at 11:14:31PM +0100, Graham wrote:

...

away the spam, just redirect it to a folder for later inspection. Maybe I'm not training the filter properly but I can't achieve a high enough accuracy to avoid the need to inspect the ones that creep through, which amounts to 20-30 a day. Better to have them in a separate folder and go in daily to

Then something is quite wrong somewhere, or you get a huge amount of spam. I am saying that this is quite easy with minimal training. Recently I started from scratch and things were a bit topsy-turvy for a week before I got around to retraining the database (feeding a few hundred megs of email into SpamAssassin holds things up for a while), which is why I suggested he save mail for a month.

...

By the way, Dennis asked for a quick and easy solution.

What do you mean by this? I don't see how my solution isn't quick and easy; your solution would be unacceptable in my case, to the point of being obtrusive.

Adam (PS if anyone uses those stupid challenge/response systems *you suck* I just approved a mail to such a system as I got the challenge/response "spam" due to a faked header)

-- jabberid = quinophex@jabber.earth.li AFFS || http://www.affs.org.uk/ || Not a filesystem

Wayne Stallwood

26 Jun

1:29 a.m.

On Friday 25 June 2004 22:14, Graham wrote:

...

Maybe I'm not training the filter properly but I can't achieve a high enough accuracy to avoid the need to inspect the ones that creep through, which amounts to 20-30 a day.

I have found that you need a really generous quantity of mail in both the spam and ham folders; training with sa-learn by pointing it at both then does the trick.

For some funny reason I have heard that it helps to have more ham than spam when doing this. Also it is good advice to use real spam you have received and not training samples, or someone else's trained filters.

Another thing I do in the war against spam is use a different name within my MX domain to subscribe to each service. So, for example, when I registered with eBay I used ebay@mydomain.com, for the ALUG list it is aluglist@mydomain.com, and so on.

This has two benefits: firstly, when I do get spam I can tell where they harvested my address from; and secondly, when the spam level gets too high for a particular address, I can configure the servers at plusnet to dump that address.
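Both benefits can be illustrated with a small Python sketch. The alias table, the retired alias and the action names are all hypothetical, and in practice the dropping would be done on the mail server rather than in a script. It needs Python 3.9 or later for the `tuple[...]` annotation.

```python
from email.utils import parseaddr

# Hypothetical alias table: local part -> service that was given the address.
ALIASES = {"ebay": "eBay", "aluglist": "ALUG mailing list"}
# Hypothetical aliases that attracted too much spam and are now dropped.
RETIRED = {"oldforum"}


def classify_recipient(to_header: str) -> tuple[str, str]:
    """Return (action, source) for mail addressed to one of the aliases."""
    _name, addr = parseaddr(to_header)
    local = addr.partition("@")[0].lower()
    if local in RETIRED:
        return ("drop", local)
    # The alias that received the mail shows who had the address.
    return ("deliver", ALIASES.get(local, "unknown"))
```

So spam arriving at `ebay@mydomain.com` points at whoever obtained the address given to eBay, and moving `ebay` into `RETIRED` would dump that address.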

Also, if I have to put my email address up on a website as a mailto: contact, I tend to use some JavaScript to mask it from harvesters.

I still pick up my older addresses and they are full of spam; my current address only picks up about 10 junk mails a week and SpamAssassin usually catches all of them.

adam＠thebowery.co.uk

10:03 a.m.

On Sat, Jun 26, 2004 at 01:35:30AM +0000, Wayne Stallwood wrote:

...

Another thing I do in the war against spam is use a different name within my MX domain to subscribe to each service. So, for example, when I registered with eBay I used ebay@mydomain.com, for the ALUG list it is aluglist@mydomain.com, and so on.

This has two benefits, firstly when I do get spam I can tell where they harvested my address from, and secondly when the spam level gets too high for

Quite a few spammers now use things like ebay@domain as part of dictionary attacks and so on, so the system isn't infallible. What you /really/ want to do is have amazon1999@ for eBay and ebay2002@ for Amazon, etc., just to obfuscate it a bit more :)
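One variation on this idea, swapped in here for illustration rather than taken from the suggestion above, is to generate a random local part and keep a private record of which service received it. A dictionary attack then cannot guess the address from the service name. The `registry` dict and the `u` prefix are hypothetical choices. The sketch needs Python 3.9 or later for the `dict[...]` annotation.

```python
import secrets


def new_alias(service: str, registry: dict[str, str]) -> str:
    """Create an unguessable local part and record which service got it."""
    while True:
        # "u" followed by 8 random hex digits.
        local = "u" + secrets.token_hex(4)
        if local not in registry:  # retry in the unlikely event of a clash
            break
    registry[local] = service
    return local
```

Looking up the local part in `registry` later recovers the service name, so the harvesting source can still be traced.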

Adam

-- jabberid = quinophex@jabber.earth.li AFFS || http://www.affs.org.uk/ || Not a filesystem

Tim Green

2:03 p.m.

On Sat, Jun 26, 2004 at 10:02:03AM +0100, adam@thebowery.co.uk wrote:

...

On Sat, Jun 26, 2004 at 01:35:30AM +0000, Wayne Stallwood wrote:

...
Another thing I do in the war against spam is use a different name within my MX domain to subscribe to each service. So, for example, when I registered with eBay I used ebay@mydomain.com, for the ALUG list it is aluglist@mydomain.com, and so on.

This has two benefits, firstly when I do get spam I can tell where they harvested my address from, and secondly when the spam level gets too high for

Quite a few spammers now use things like ebay@domain as part of dictionary attacks and so on, so the system isn't infallible. What you /really/ want to do is have amazon1999@ for eBay and ebay2002@ for Amazon, etc., just to obfuscate it a bit more :)

I was receiving a steady stream of spam to sales@ and info@ until I put a bounce on them. The bounce on my Usenet email address politely informs senders to try again with a different username for me.

I don't even trust you guys with my usual email address, and a good thing too since I found all our email addresses plastered all over the ALUG archive website.

Tim.

Dennis Dryden

27 Jun

4:04 a.m.

I've got it set up and working with Evolution on its own. I had about 800 spams and 300 hams that I used to teach it, and out of 30 emails only 2 spams got through, so I'm happy. It took ages to receive my email though. I think this was due to having it check online spam databases (DCC, Razor, Pyzor), as CPU usage was low most of the time while downloading the mail.

This is the main link I used to set it up, if anyone wants to know how: http://krath.dk/linux/evolution_spamfilter/

Thanks for your help everyone, Dennis

main@lists.alug.org.uk
