MJ Ray wrote:
ObTopic: I'd be surprised if OCR'ing spam images worked well. And what would happen when it saw one of the worms with an image that tries to exploit an error in a common graphics library?
I'm not convinced it would be so bad. From the spam I've received recently, each message carries a single image rather than a composite of several, although I appreciate that composites would be the spammers' next step.
Assume the OCR makes mistakes reading the text and produces output that looks like garbage to a human. So what? The next time it sees the same image it will produce the same garbage, and if the filter learnt from the first message it will recognise the second. This is no different from the spammers' earlier trick of mixing numbers, letters and misspellings. I'm sure I'm not alone in having seen one or two image spams repeated ad nauseam recently.
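To make the "same garbage, same tokens" argument concrete, here is a toy sketch. It assumes a simple naive-Bayes-style token scorer; this is an illustration only, not how SpamAssassin's Bayes module actually works, and the OCR string is invented. The point is that a deterministic OCR error becomes a stable, learnable token.

```python
from collections import Counter
import math


class TokenFilter:
    """Toy token-based spam scorer (a simplified naive Bayes, for illustration only)."""

    def __init__(self):
        self.spam = Counter()  # per-token counts of spam messages containing it
        self.ham = Counter()   # per-token counts of ham messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def learn(self, text, is_spam):
        tokens = set(text.split())
        if is_spam:
            self.spam.update(tokens)
            self.n_spam += 1
        else:
            self.ham.update(tokens)
            self.n_ham += 1

    def spam_probability(self, text):
        # Sum per-token log-likelihood ratios with add-one smoothing,
        # then convert back to a probability assuming equal priors.
        log_odds = 0.0
        for tok in set(text.split()):
            p_spam = (self.spam[tok] + 1) / (self.n_spam + 2)
            p_ham = (self.ham[tok] + 1) / (self.n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


# Hypothetical OCR output: garbled, but identical every time the same image is scanned.
ocr_garbage = "Vl4gra ch3ap pr!ce 0rder n0w"
ham_text = "meeting notes attached for monday"

f = TokenFilter()
f.learn(ocr_garbage, is_spam=True)
f.learn(ham_text, is_spam=False)

print(round(f.spam_probability(ocr_garbage), 3))  # repeat of the learnt garbage
print(round(f.spam_probability(ham_text), 3))
```

With one spam and one ham trained, each of the five garbled tokens doubles the odds, so the repeated image scores 32/33 (about 0.97) even though none of its tokens is a real word.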
As to whether a worm can exploit an error in a graphics library: well, what if it exploits an error in SpamAssassin, or in exim/qmail, or in something else? Any component that parses hostile input carries that risk; it isn't specific to OCR.
There are also some non-spam benefits: I could search for text within images in archived emails. It wouldn't help me much, but a lot of our clients are insurance brokers who seem to email lots of scanned documents around as .tiff or .jpeg files (or, more usually, as a bitmap embedded in a .doc, but that's another issue entirely...).