Oh, remember that xpdf (or even pdf2ps and ps2ascii in some cases) can recover the original text for you to edit.
In summary: if such tools work for a particular file, fine. But, in my direct experience, there are so many catastrophic exceptions to this that I don't consider them serious options to rely on.
A PS to the above. If you need to extract the text from a PDF/PS file for editing or re-use, a very good way is to convert the pages of the PS file to a bitmap and run it through an OCR program. [If you have a PDF, first save it to PS using acroread -toPostScript.] FAX tiff format is excellent for the bitmap, and ImageMagick's 'convert' is good for the conversion. OCRshop is a good OCR program, but even the OCR you get on the floppy with your scanner usually deals well with FAX tiff. [And, if you have a fax-to-email service as I do, you can get your tiff file by running printed sheets through your fax machine.]
In my experience, you get over 99% of characters correctly recognised, in the right order, and spaced as they should be. And no nonsense from programs which get confused by the way layout is incorporated in the PS/PDF file.
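For anyone who wants to script the first two steps, here is a rough sketch in Python. It only builds the command lines and runs them if you ask it to. The function names, the 300 dpi default and the output file names are my own assumptions, as is the expectation that acroread writes the .ps file next to the input PDF. The OCR step is left out because it depends entirely on which OCR program you have.

```python
import shlex
import subprocess
from pathlib import Path


def build_commands(pdf_path, dpi=300):
    """Return argument lists for PDF -> PS -> fax-compressed TIFF.

    Hypothetical helper: the dpi default and output names are assumptions.
    """
    pdf = Path(pdf_path)
    ps = pdf.with_suffix(".ps")      # assumed: acroread writes this next to the PDF
    tiff = pdf.with_suffix(".tiff")
    return [
        # Step 1: save the PDF as PostScript.
        ["acroread", "-toPostScript", str(pdf)],
        # Step 2: rasterise to a black-and-white, fax-compressed TIFF.
        ["convert", "-density", str(dpi), "-monochrome", str(ps),
         "-compress", "Fax", str(tiff)],
    ]


def run_commands(commands):
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands rather than running them, so they can be checked first.
    for cmd in build_commands("report.pdf"):
        print(shlex.join(cmd))
```

Calling run_commands(build_commands("report.pdf")) should leave a report.tiff that you can then hand to whatever OCR program you use.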
Cheers again, Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) Ted.Harding@nessie.mcc.ac.uk
Fax-to-email: +44 (0)870 167 1972
Date: 07-Feb-02                                       Time: 17:59:46
------------------------------ XFMail ------------------------------