[ALUG] SpamAssassin and gocr/jocr

26 Jul 2006


Has anyone here experimented with defeating image-based spam using
something like gocr (jocr.sf.net)?
I've seen one attempt (http://wiki.apache.org/spamassassin/OcrPlugin)
but I've not tried it (I'm not convinced by the method it uses, since it
uses a fixed word list).
My thought was that it ought to be simple to automatically OCR any image
attachment and add some headers to the email containing the image text,
then have SA check those headers for content (does SA check headers?).
That way any words appearing in the image that have previously been seen
in spam as plain text will get caught.
It also occurred to me that OCR accuracy might not be too important.
After all, if the same image spam is seen multiple times and marked as
spam, then the OCR misinterpretations of the words will be common to
each scan even if the words found are wrong.
However, I have no real idea how to go about implementing this.
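As a possible starting point, here is a rough Python sketch of the idea
above: walk the MIME parts of a message, OCR each image attachment, and
add the recognised text as an extra header before the mail reaches SA.
The header name X-Image-Text and both function names are made up for
illustration. The gocr call assumes the gocr binary is on the PATH and
can read whatever image format it is handed (it may need the image
converted to PNM first). Nothing here is tested against real spam.

```python
import email
import subprocess
import tempfile
from email import policy


def gocr_text(image_bytes):
    """Run gocr on raw image bytes and return whatever text it prints.

    Assumption: gocr is installed and understands this image format.
    """
    with tempfile.NamedTemporaryFile(suffix=".img") as tmp:
        tmp.write(image_bytes)
        tmp.flush()
        result = subprocess.run(
            ["gocr", "-i", tmp.name],
            capture_output=True, text=True, check=False,
        )
    return result.stdout


def add_image_text_headers(raw_message, ocr=gocr_text, header="X-Image-Text"):
    """Return the message bytes with one extra header per OCR'd image.

    `header` is a hypothetical name; `ocr` is any callable taking image
    bytes and returning text, so the OCR engine can be swapped out.
    """
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    for part in msg.walk():
        if part.get_content_maintype() != "image":
            continue
        payload = part.get_payload(decode=True)  # undo base64 etc.
        if not payload:
            continue
        # Collapse newlines and runs of spaces so the text fits a header.
        words = " ".join(ocr(payload).split())
        if words:
            msg[header] = words
    return msg.as_bytes()
```

Something like this could sit in a procmail or MTA filter step ahead of
SA. Whether SA's rules or Bayes training then make good use of the added
header is the open question from above, and would need checking against
the SA documentation.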
-- 
Mark Rogers
More Solutions Ltd :: 0845 45 89 555
