Re: [ALUG] SpamAssasin and gocr/jocr

31 Jul 2006


Mark Rogers wrote:
...
As I mentioned elsewhere, OCR accuracy isn't as important as OCR
repeatability.
FWIW I just ran some quick tests using gocr and ImageMagick:
    convert image001.gif pnm:- | gocr -
... where image001.gif is a sample spam advert.
When comparing the results from a wide sample[*] of images, despite the
OCR results being relatively poor in themselves, there was very little
variation between successive tests. So if I teach SpamAssassin that the
first is spam it should work out that the next is spam too.
[*] OK it was two images, it's all I had to hand. But they were
different; one had a completely different background colour, for a start.
...
CPU overhead is by far the biggest issue, but this is probably OK at the
client end (not sure it would scale well to an ISP implementation).
On the low-spec Win2K test PC I had to hand (so I'm pretty sure this is
worst case) I could process between 2 and 3 images per second.
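To repeat that measurement on other hardware, a minimal timing sketch in
Python might look like the following. The helper name is hypothetical and
the `process` argument is whatever function does the per-image work (for
example one that shells out to convert and gocr); this is not the script
used for the figure above.

```python
import time


def images_per_second(process, paths):
    """Call process(path) for every path and return the throughput."""
    start = time.perf_counter()
    for path in paths:
        process(path)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse clocks.
    return len(paths) / elapsed if elapsed > 0 else float("inf")
```

Running it over a few dozen saved spam images should smooth out process
start-up noise better than timing a single file.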
-- 
Mark Rogers
More Solutions Ltd :: 0845 45 89 555
