Hi Folks,
We are all getting more and more email with the primary body in HTML, as well as the corresponding attachment.
If your MUA allows message editing, it's easy to delete the HTML attachment. However, removing HTML tags from the body requires detailed editing. If I want to keep a copy of a message for future reference, I want to be able to edit it back to good old clean plain text. Manual editing can be very tedious for long messages.
Is there some program one can "pipe" the message into so as to strip out all the HTML cr^H^Hstuff?
I suppose a simple 'awk' script which struck out everything from "<" to the next ">" inclusive would at least achieve that, but it could also inadvertently achieve more than is wanted.
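To make the "more than is wanted" worry concrete, here is a minimal sketch of the same idea. It is written in Python rather than awk purely for illustration, and the name `strip_tags_naive` and the sample string are made up. It deletes every run from "<" to the next ">", whether or not that run is really a tag.

```python
import re

# Naive rule: delete everything from "<" up to and including the next ">".
TAG_RE = re.compile(r"<[^>]*>")


def strip_tags_naive(text):
    """Remove every '<...>' run, whether or not it is really a tag."""
    return TAG_RE.sub("", text)


sample = "<p>Use it when a < b and c > d, e.g. 1 &lt; 2.</p>"
print(strip_tags_naive(sample))
# prints: Use it when a  d, e.g. 1 &lt; 2.
```

The real tags are gone, but so is the literal "< b and c >" in the text, and the entity `&lt;` is left undecoded. Both are reasons to prefer a tool that actually parses the HTML.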
The thought has occurred to me to open the HTML in a browser, and mark&paste to the editor window, but that also is more trouble than I'm happy to go to (though better than nothing).
With thanks for ideas and suggestions, Best wishes to all, Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) Ted.Harding@nessie.mcc.ac.uk
Fax-to-email: +44 (0)870 167 1972
Date: 09-Sep-04 Time: 12:19:13
------------------------------ XFMail ------------------------------
On Thu, 09 Sep 2004 12:28:56 +0100 (BST), Ted Harding ted.harding@nessie.mcc.ac.uk wrote:
> Is there some program one can "pipe" the message into so as to strip out all the HTML cr^H^Hstuff?
I know lynx has an option to non-interactively get just the text from html.
There must be perl modules that can do it too.
Hope this helps, Tim.
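The same kind of thing can be done with a scripting language's standard library. As a rough sketch only, here is a Python version built on the standard `html.parser` module. The class name `TextExtractor`, the function `html_to_text`, and the choice of which tags start a new line are all assumptions made for this example, not anything a particular mail tool does.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, skipping the bodies of <script> and <style>."""

    SKIP = {"script", "style"}
    # Tags treated as line breaks; an arbitrary choice for this sketch.
    BREAK = {"p", "br", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; in the text.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BREAK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
        elif tag in self.BREAK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(line for line in lines if line)
```

To use it as a pipe, one could read the HTML from standard input and print `html_to_text(...)` of it. Links, tables and character-set handling are all ignored here, which is where lynx and friends earn their keep.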
On Thu, Sep 09, 2004 at 01:14:22PM +0100, Tim Green wrote:
> On Thu, 09 Sep 2004 12:28:56 +0100 (BST), Ted Harding ted.harding@nessie.mcc.ac.uk wrote:
>> Is there some program one can "pipe" the message into so as to strip out all the HTML cr^H^Hstuff?
> I know lynx has an option to non-interactively get just the text from html.
That's what I use with mutt, I simply never see the HTML at all, ever, and I reply in plain text.
I even have it set up to turn word documents into plain text as well.
On 2004-09-09 13:20:31 +0100 Chris Green chris@areti.co.uk wrote:
> I even have it set up to turn word documents into plain text as well.
You may also like sxw2text from http://mjr.towers.org.uk/software.html#other if people have started sending you OO writer files yet.
On Thu, Sep 09, 2004 at 12:28:56PM +0100, Ted Harding wrote:
> Hi Folks,
> We are all getting more and more email with the primary body in HTML, as well as the corresponding attachment.
Who what now? Who's sending HTML messages without a plain text counterpart? You should reply to them and fix them, preferably by attaching a virus to the e-mail: if they're using something dumb enough not to send multipart/alternative with a plain text part, it's bound to just execute the virus...
Personally, I've got mutt set up to fire up w3m when I get an HTML-only e-mail. I would not like to edit mail sent to me, as that kinda defeats the purpose of me keeping it. Train the monkeys, don't work around the problems they cause :)
Just my 2p...
(Oh, and btw, I'm getting less and less HTML e-mail, because I tend to make it very clear that I will not read HTML e-mail as quickly as plain text e-mail. I want clarity, not messy fonts et al. Ergo, if people want me to read something, they send it in plain text or, if it is necessary for it to be in HTML, they send a URL to the information in a plain text e-mail, with a summary of what I'm supposed to be reading that URL for. Right - erm - random rant over...)
I suppose the answer to your original question, though, is something like html2text and some procmail/maildrop/exim foo to automagically mess with the message (keep the original somewhere, though).
Cheers,
On Thu, Sep 09, 2004 at 12:28:56PM +0100, Ted Harding wrote:
> I suppose a simple 'awk' script which struck out everything from "<" to the next ">" inclusive would at least achieve that, but it could also inadvertently achieve more than is wanted.
> The thought has occurred to me to open the HTML in a browser, and mark&paste to the editor window, but that also is more trouble than I'm happy to go to (though better than nothing).
How about piping it into 'lynx --dump'? I think that's probably your easiest option.
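For anyone who wants to script that pipe rather than type it, here is a rough Python sketch. The exact command lines in `CANDIDATES` are assumptions (check the lynx and w3m manuals for your installed versions), and the helper names `first_available` and `render_with` are invented for this example.

```python
import shutil
import subprocess

# Assumed invocations for reading HTML on stdin and dumping plain text;
# verify the flags against your own lynx/w3m documentation.
CANDIDATES = [
    ["lynx", "-dump", "-stdin", "-force_html"],
    ["w3m", "-dump", "-T", "text/html"],
]


def first_available(candidates):
    """Return the first command whose program is found on PATH, else None."""
    for command in candidates:
        if shutil.which(command[0]):
            return command
    return None


def render_with(command, html):
    """Feed html to command on stdin and return what it writes to stdout."""
    result = subprocess.run(
        command,
        input=html,
        capture_output=True,
        text=True,   # str in, str out, using the locale's encoding
        check=True,  # raise CalledProcessError if the renderer fails
    )
    return result.stdout
```

With a renderer installed, something like `render_with(first_available(CANDIDATES), html)` should give back the text version. Nothing here is specific to mail, so the HTML part still has to be extracted from the message first.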
> We are all getting more and more email with the primary body in HTML, as well as the corresponding attachment.
> Is there some program one can "pipe" the message into so as to strip out all the HTML cr^H^Hstuff?
How about demime? I use it on a mailing list to strip out all HTML and attachments, and it seems to work quite well. You can just pipe messages through it.
See scifi.squawk.com/demime.html, although the site seems to be down at the minute. You can also search for demime on Google.
Andy Beverley
On 9 Sep, 2004, at 13:27, Andrew Beverley wrote:
>> We are all getting more and more email with the primary body in HTML, as well as the corresponding attachment.
>> Is there some program one can "pipe" the message into so as to strip out all the HTML cr^H^Hstuff?
> How about demime? I use it on a mailing list to strip out all html and attachments and it seems to work quite well. You can just pipe messages through it and it will strip out html, attachments et al.
> See scifi.squawk.com/demime.html, although the site seems to be down at the minute. You can also search for demime on Google.
Or, more simply, reject the message and tell the originator to remove the crap.
John
Easy: use Pine. It copes with HTML email, provided it has <HTML> and </HTML> tags. I wonder how many email clients that confused :-)
It is annoying though to get HTML email when you're using a reader that doesn't cope.
That's why whenever I use Outlook, I've got it configured to send in plain text.
On Thu, 9 Sep 2004 15:23:30 +0100 (BST), Chris Glover chris@glovercc.clara.co.uk was rumoured to have said:
> Easy, use Pine, it copes with HTML email, provided it has a <HTML> and </HTML> tag.. wonder how many email clients that confused :-)
> It is annoying though to get HTML email when you're using a reader that doesn't cope.
I'd just like to take the opportunity to plug my favourite uber-powerful yet little-known MUA ;)
Wanderlust handles this well: it can be set to display the text/plain alternative if present, or render the html with w3m (optionally without fetching images to avoid `web bugs' in spam).
It also plays well with other Emacs packages (mailcrypt, supercite, bbdb) and, like most elisp packages, is extremely customisable and well documented. The only caveat is that said customisation can easily take a couple of weekends ;)
Their web site is http://www.gohome.org/wl/ ; cvs version has generic code to interface with spam filters. Debian users will want to apt-get install wl-beta.
[snip]
rgds, /-sb.
On Thu, 09 Sep 2004 19:34:19 +0100, Stelios Bounanos sb@dial.pipex.com wrote:
> render the html with w3m (optionally without fetching images to avoid `web bugs' in spam).
Eh? I thought w3m was a text only browser, and the graphical version was a flight of fantasy that was never maintained.
When I'm not using Gmail (which also doesn't fetch web images until an extra button is pressed) I use mutt which uses w3m in just text mode.
Tim.
On Thu, 9 Sep 2004 21:19:49 +0100, Tim Green timothy.j.green@gmail.com was rumoured to have said:
> On Thu, 09 Sep 2004 19:34:19 +0100, Stelios Bounanos sb@dial.pipex.com wrote:
>> render the html with w3m (optionally without fetching images to avoid `web bugs' in spam).
> Eh? I thought w3m was a text only browser, and the graphical version was a flight of fantasy that was never maintained.
Hmm, I distinctly remember having got w3m to display images in a framebuffer console and an xterm. But that was a long time ago and it would not work when I tried it again on a sparc running linux some months later.
FWIW, Debian still has a package that claims it can do it: w3m-img - inline image extension support utilities for w3m
Anyway, this must be easier to do in an Emacs buffer because the emacs lisp interface (w3m-el) has worked for me since Emacs 21 came out, and also supports gifs now that the LZW patent has expired. Here's a screenshot: http://privatewww.essex.ac.uk/~sbouna/emacs-w3m.png
> When I'm not using Gmail (which also doesn't fetch web images until an extra button is pressed) I use mutt which uses w3m in just text mode.
I don't know how w3m-el does the trick, but I suspect that it doesn't depend on w3m to do anything with the images. If mutt works in the usual way (i.e. runs w3m on a file using popen(3) and reads its output from the pipe) then it won't show images even if w3m could...
> Tim.
rgds, /-sb.
On 2004-09-09 23:35:27 +0100 Stelios Bounanos sb@dial.pipex.com wrote:
> FWIW, Debian still has a package that claims it can do it: w3m-img - inline image extension support utilities for w3m
The copyright file for that leads to a homepage at http://w3m.sf.net/ and it looks maintained.
Chris Glover chris@glovercc.clara.co.uk writes:
> Easy, use Pine, it copes with HTML email, provided it has a <HTML> and </HTML> tag.. wonder how many email clients that confused :-)
Pine should look at the Content-Type header field (and probably does). The <html> and </html> tags are in any case optional.
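As an illustration of deciding by header rather than by sniffing for tags, here is a small sketch using Python's standard `email` package. The function name `readable_body` is made up, and preferring text/plain over text/html is just one reasonable policy, not a description of what Pine does.

```python
from email import message_from_string, policy


def readable_body(raw_message):
    """Return (content_type, text) for the best readable part of a message.

    The part is chosen from the Content-Type headers alone: text/plain
    if there is one, otherwise text/html. Returns (None, "") if neither
    is present.
    """
    msg = message_from_string(raw_message, policy=policy.default)
    part = msg.get_body(preferencelist=("plain", "html"))
    if part is None:
        return None, ""
    return part.get_content_type(), part.get_content()
```

For a multipart/alternative message this should hand back the plain text part, so no stripping is needed at all. For an HTML-only message it reports text/html even when the body has no <html> tag, and that body can then be passed to whatever converter you prefer.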