On 16-Feb-10 08:52:16, Chris G wrote:
On Tue, Feb 16, 2010 at 12:45:54AM -0000, Ted Harding wrote:
On 15-Feb-10 23:03:22, Chris G wrote:
On Mon, Feb 15, 2010 at 10:37:16PM -0000, Ted Harding wrote:
On 15-Feb-10 22:08:22, Chris G wrote:
I have an E-Mail (well lots actually) which has a text part with headers as follows:-
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
My system is all utf-8 now so just outputting the E-Mail via 'more' or opening it in vi doesn't decode/show the accented characters correctly.
Mutt reads the above headers and converts the accented characters and shows them correctly.
I want to do some processing on this file and then display the text parts but at the moment I can't get anything to do what mutt appears to be able to do with no effort! E.g. I have tried:-
iconv -f iso-8859-1 <filename>
and it doesn't change anything at all. Nor does viewing the file in Firefox with the charset set to iso-8859-1 work.
What am I missing? It must be something blatantly obvious.
Quite possibly ...
According to 'man iconv' you should need *both* -f and -t:
SYNOPSIS
    iconv -f encoding -t encoding inputfile
Also, "iconv --list" lists the charset names in CAPITALS. So maybe try
iconv -f ISO-8859-1 -t UTF-8 <filename>
It also lists ISO-8859-1, ISO8859-1 and ISO_8859-1 which are presumably equivalent [... ??].
Shooting not-quite-in-the-dark (illumination from 'man'), since I've never used this!
It *shouldn't* need the "-t UTF-8", my "man iconv" says:-
--to-code, -t encoding
    Convert characters to encoding. If not specified the encoding corresponding to the current locale is used.
... and my locale is most definitely UTF-8.
But, if I do:-
iconv -f ISO-8859-1 -t UTF-8
/home/chris/Mail/boating/buyOurBoat/fredMolina/cur/1259766690.21706_5 5.c hris:2,RS | more
It still doesn't work.
-- Chris Green
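One possible explanation, which nobody in this thread confirms: the headers quoted in the original message say the body is quoted-printable. In that transfer encoding an accented character is stored as an ASCII escape such as =E9, so a charset conversion on its own would find only ASCII bytes and leave the file unchanged. A mail reader undoes the transfer encoding first and only then applies the charset. The Python sketch below does the same two steps with the standard library's email package; the sample message and the function name text_parts are made up for illustration.

```python
from email import message_from_bytes


def text_parts(raw: bytes) -> list:
    """Return every text/plain part of a raw message as decoded text."""
    msg = message_from_bytes(raw)
    parts = []
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        # decode=True undoes the Content-Transfer-Encoding
        # (quoted-printable or base64) and returns raw bytes.
        payload = part.get_payload(decode=True)
        # Fall back to us-ascii if the part declares no charset.
        charset = part.get_content_charset() or "us-ascii"
        parts.append(payload.decode(charset, errors="replace"))
    return parts


# Hypothetical sample message, not taken from the thread.
sample = (
    b'Content-Type: text/plain; charset="iso-8859-1"\n'
    b"Content-Transfer-Encoding: quoted-printable\n"
    b"\n"
    b"On m'a racont=E9 que =E7a devrait =EAtre\n"
)

# The raw file is pure ASCII, so a Latin-1 to UTF-8 conversion is a no-op.
assert sample.decode("iso-8859-1").encode("utf-8") == sample

print(text_parts(sample)[0], end="")
```

If the file really is quoted-printable, this would also account for iconv exiting without any complaint: it had nothing to convert.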
Hmmm ... I just tried it. I have an email file (MH folder, so it's a stand-alone file) in iso-8859-1 charset, and it's in French so it has accents in the top half ( > 0x7F) of the encoding. Its name is "400".
I copied it over to a machine where the locale is
$ echo $LANG
en_GB.UTF-8
When I do 'less 400' the accented characters show up as hex codes like (excerpt):
On m'a racont<E9> que, au pub pas loin de chez moi, ils ont mis sur le menu du jour un plat "Boeuf bourguignon"; et un couple fran<E7>ais a ral<E9>, disant que <E7>a devrait <EA>tre "Boeuf Bourguignonne" (selon ce qu'on m'a racont<E9>, qui risque <E9>videmment l'impr<E9>cision de la parole relay<E9>e).
(If I just do 'cat 400' then each of those hex codes shows as a "?" on a black background, and it also munges the subsequent character or two).
However, when I do
iconv -f ISO-8859-1 -t UTF-8 400 | less
I see:
On m'a raconté que, au pub pas loin de chez moi, ils ont mis sur le menu du jour un plat "Boeuf bourguignon"; et un couple français a ralé, disant que ça devrait être "Boeuf Bourguignonne" (selon ce qu'on m'a raconté, qui risque évidemment l'imprécision de la parole relayée).
(which you should see correctly, if your mail-reader acts properly for iso-8859-1 since that is the charset for this email). And it also works fine for "| cat" instead of "|less" (see comment above).
So, for me, it does work! And, incidentally, it does also work without the "-t UTF-8" option (though my own 'man iconv' makes no mention of what happens if you leave it out).
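For readers who want to see what that conversion does at the byte level, here is a small Python sketch of the same ISO-8859-1 to UTF-8 step. It is an illustration rather than a description of how iconv is implemented, and the sample bytes and the function name latin1_to_utf8 are assumptions.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Reinterpret ISO-8859-1 bytes as text and re-encode them as UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")


# Hypothetical sample: "raconté" with é stored as the single byte 0xE9.
raw = b"racont\xe9"

# On its own, 0xE9 is not a valid UTF-8 sequence, which is consistent
# with a UTF-8 pager showing it as <E9> instead of an accented letter.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

converted = latin1_to_utf8(raw)
print(valid_utf8)   # False
print(converted)    # b'racont\xc3\xa9'
```

The single byte 0xE9 becomes the two-byte sequence 0xC3 0xA9, which a UTF-8 terminal renders as é.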
On the machine to which I copied it, and on which I ran iconv, I have Debian Etch (regularly upgraded).
That's really weird then. I run xubuntu 9.10 which, as you know, uses most of the same packages that Debian uses so should have a pretty similar iconv. (... and yes, I do see all your accented characters in the iconv'ed file)
Do you have ISO-8859-1 locale files installed? I was wondering if iconv needs a locale installed in order to be able to convert files to/from that locale. If I run "locale -a" it shows that I only have UTF-8 locales installed (plus C and POSIX).
Still it seems odd that iconv doesn't complain at all.
-- Chris Green
Well, not being quite sure of everything implied by "have ISO-8859-1 locale files installed", I did the following:
locate 8859-1 | grep locale
with results:
/usr/share/X11/locale/iso8859-1
/usr/share/X11/locale/iso8859-1/Compose
/usr/share/X11/locale/iso8859-1/XI18N_OBJS
/usr/share/X11/locale/iso8859-1/XLC_LOCALE
(along with similar output for iso8859-10,11,13,14,15).
By the way: If I switch from X into a console terminal (Ctrl-Alt-F1) and do as above with that file, then "less 400" produces the same result as in the xterm before (with "<hexcode>"'s where the accented characters are), but "cat 400" produces output in which all the accented characters are simply missing (no "?" and the like as in the xterm).
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) Ted.Harding@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 16-Feb-10 Time: 11:02:27
------------------------------ XFMail ------------------------------