Charset conversion confusion - mutt works, nothing else does


Chris G

15 Feb 2010

10:08 p.m.

I have an E-Mail (well lots actually) which has a text part with headers as follows:-

Content-Type: text/plain;
        charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

My system is all utf-8 now so just outputting the E-Mail via 'more' or opening it in vi doesn't decode/show the accented characters correctly.

Mutt reads the above headers and converts the accented characters and shows them correctly.

I want to do some processing on this file and then display the text parts but at the moment I can't get anything to do what mutt appears to be able to do with no effort! E.g. I have tried:-

iconv -f iso-8859-1 <filename>

and it doesn't change anything at all. Nor does viewing the file in Firefox with the charset set to iso-8859-1 work.

What am I missing? It must be something blatantly obvious.

-- Chris Green
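
For comparison, a minimal sketch of the two steps a MIME-aware reader has to perform on such a part: first undo the Content-Transfer-Encoding, then interpret the resulting bytes in the declared charset. It uses Python's standard email package, which is an assumption of this sketch and not something used in the thread. The sample message is made up; only its two headers come from the post.

```python
import email

# Hypothetical sample message: the Content-* headers match the ones quoted
# in the post, the body text is invented for illustration.
RAW = (
    b"Subject: sample\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/plain;\r\n\tcharset="iso-8859-1"\r\n'
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"On m'a racont=E9 que =E7a devrait =EAtre simple.\r\n"
)


def text_parts(raw_bytes):
    """Yield each text/plain part of a message as a decoded str."""
    msg = email.message_from_bytes(raw_bytes)
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        # Step 1: undo the Content-Transfer-Encoding (quoted-printable here).
        payload = part.get_payload(decode=True)
        # Step 2: interpret the resulting bytes in the declared charset.
        charset = part.get_content_charset() or "us-ascii"
        yield payload.decode(charset, errors="replace")


for text in text_parts(RAW):
    print(text, end="")
```

The resulting str can then be written out in whatever encoding the terminal expects, UTF-8 in this case.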


Chris G

15 Feb

10:21 p.m.

Weirdly, if I use the iconv function in PHP I get exactly what I want. I have:-

print $t;
$c = iconv('ISO_8859-1', 'utf-8', $t);
print $c;

and the first print shows the unconverted characters, the second shows correctly accented characters. Why on earth the command line iconv doesn't work the same I really don't understand. Still it doesn't matter much as I actually want to do it in PHP.

(Oh and I have tried a specific "-t utf-8" in the command line and it makes no difference)
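
One explanation that would fit these symptoms, though nobody in the thread has confirmed it: if the file on disk is still quoted-printable encoded, every accented character is stored as an ASCII escape such as =E9, and ASCII bytes are identical in ISO-8859-1 and UTF-8, so a charset conversion has nothing to change. The PHP variable would then presumably already hold decoded 8-bit bytes. A minimal Python sketch of that idea, using a made-up sample line and a hypothetical helper standing in for the iconv call:

```python
import quopri

# Made-up sample: one line as it would sit in a quoted-printable mail file.
qp_line = b"On m'a racont=E9 que =E7a devrait =EAtre simple."


def latin1_to_utf8(data: bytes) -> bytes:
    """Rough stand-in for: iconv -f ISO-8859-1 -t UTF-8"""
    return data.decode("iso-8859-1").encode("utf-8")


# Every byte is below 0x80, so the conversion returns the input unchanged.
unchanged = latin1_to_utf8(qp_line)

# Undoing quoted-printable first yields real ISO-8859-1 bytes (0xE9 etc.) ...
raw_latin1 = quopri.decodestring(qp_line)
# ... and only then does the charset conversion have something to do.
utf8 = latin1_to_utf8(raw_latin1)

print(unchanged == qp_line)
print(raw_latin1)
print(utf8.decode("utf-8"))
```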

-- Chris Green

Ted.Harding＠manchester.ac.uk

10:37 p.m.


On 15-Feb-10 22:08:22, Chris G wrote:

I want to do some processing on this file and then display the text parts but at the moment I can't get anything to do what mutt appears to be able to do with no effort! E.g. I have tried:-
iconv -f iso-8859-1 <filename>
and it doesn't change anything at all. Nor does viewing the file in Firefox with the charset set to iso-8859-1 work.

What am I missing? It must be something blatantly obvious.

Quite possibly ...

According to 'man iconv' you should need *both* -f and -t:

SYNOPSIS
       iconv -f encoding -t encoding inputfile

Also, "iconv --list" lists the charset names in CAPITALS. So maybe try

iconv -f ISO-8859-1 -t UTF-8 <filename>

It also lists ISO-8859-1, ISO8859-1 and ISO_8859-1 which are presumably equivalent [... ??].
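
As an analogy only (this is Python's codec registry, not iconv's alias table, so it proves nothing about iconv itself), a short sketch showing how a charset registry typically maps several spellings onto one codec:

```python
import codecs

# The three spellings reported by "iconv --list", plus the lower-case form
# used in the mail header.
names = ["ISO-8859-1", "ISO8859-1", "ISO_8859-1", "iso-8859-1"]

# codecs.lookup() normalises the name and returns the codec registered for it.
canonical = {name: codecs.lookup(name).name for name in names}
for name, canon in canonical.items():
    print(f"{name:12} -> {canon}")

all_same = len(set(canonical.values())) == 1
print("all the same codec:", all_same)
```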

Shooting not-quite-in-the-dark (illumination from 'man'), since I've never used this!

Hoping this helps, Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) Ted.Harding@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 15-Feb-10  Time: 22:37:12
------------------------------ XFMail ------------------------------

Chris G

11:03 p.m.


On Mon, Feb 15, 2010 at 10:37:16PM -0000, Ted Harding wrote:

According to 'man iconv' you should need *both* -f and -t:

It *shouldn't* need the "-t UTF-8", my "man iconv" says:-

   --to-code, -t encoding
                 Convert characters to encoding. If not specified
                 the encoding corresponding to the current locale
                 is used.

... and my locale is most definitely UTF-8.

But, if I do:-

iconv -f ISO-8859-1 -t UTF-8 /home/chris/Mail/boating/buyOurBoat/fredMolina/cur/1259766690.21706_55.chris:2,RS | more

It still doesn't work.
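
A quick check that might narrow this down, sketched in Python: count how many bytes in the data are above 0x7F and how many quoted-printable style escapes it contains. If the first count is zero, a conversion from ISO-8859-1 to UTF-8 has nothing to change. The function name is hypothetical and the two inputs are made-up samples, not taken from the message in question.

```python
import re


def describe_bytes(data: bytes) -> dict:
    """Count 8-bit bytes and quoted-printable style escapes in data."""
    return {
        "high_bytes": sum(1 for b in data if b > 0x7F),
        "qp_escapes": len(re.findall(rb"=[0-9A-F]{2}", data)),
    }


# Made-up samples; for a real message, pass open(path, "rb").read() instead.
encoded = describe_bytes(b"racont=E9")    # accent stored as an ASCII escape
eight_bit = describe_bytes(b"racont\xe9") # accent stored as a raw Latin-1 byte

print(encoded)
print(eight_bit)
```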

-- Chris Green

Ted.Harding＠manchester.ac.uk

16 Feb

12:45 a.m.


On 15-Feb-10 23:03:22, Chris G wrote:

It still doesn't work.

Hmmm ... I just tried it. I have an email file (MH folder, so it's a stand-alone file) in iso-8859-1 charset, and it's in French so it has accents in the top half ( > 0x7F) of the encoding. Its name is "400".

I copied it over to a machine where the locale is

$ echo $LANG
en_GB.UTF-8

When I do 'less 400' the accented characters show up as hex codes like (excerpt):

On m'a racont<E9> que, au pub pas loin de chez moi, ils ont mis sur le menu du jour un plat "Boeuf bourguignon"; et un couple fran<E7>ais a ral<E9>, disant que <E7>a devrait <EA>tre "Boeuf Bourguignonne" (selon ce qu'on m'a racont<E9>, qui risque <E9>videmment l'impr<E9>cision de la parole relay<E9>e).

(If I just do 'cat 400' then each of those hex codes shows as a "?" on a black background, and it also munges the subsequent character or two).

However, when I do

iconv -f ISO-8859-1 -t UTF-8 400 | less

I see:

On m'a raconté que, au pub pas loin de chez moi, ils ont mis sur le menu du jour un plat "Boeuf bourguignon"; et un couple français a ralé, disant que ça devrait être "Boeuf Bourguignonne" (selon ce qu'on m'a raconté, qui risque évidemment l'imprécision de la parole relayée).

(which you should see correctly, if your mail-reader acts properly for iso-8859-1 since that is the charset for this email). And it also works fine for "| cat" instead of "|less" (see comment above).

So, for me, it does work! And, incidentally, it does also work without the "-t UTF-8" option (though my own 'man iconv' makes no mention of what happens if you leave it out).
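
The display behaviour described here is consistent with how UTF-8 treats a lone 0xE9 byte: it is the lead byte of a three-byte sequence, so a UTF-8 terminal expects two continuation bytes after it, which may be why the following character or two get mangled. A minimal Python sketch using a fragment of the French excerpt (the variable names are illustrative only):

```python
# A fragment of the excerpt, stored as ISO-8859-1 bytes: b'racont\xe9 que'
raw = "racont\u00e9 que".encode("iso-8859-1")

# 0xE9 is 0b11101001, which in UTF-8 announces a three-byte sequence,
# so a strict UTF-8 decoder rejects it when a space follows instead.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# A lenient decoder substitutes U+FFFD, much as a terminal shows "?".
shown = raw.decode("utf-8", errors="replace")

# Converting from ISO-8859-1 re-encodes the character as two UTF-8 bytes.
converted = raw.decode("iso-8859-1").encode("utf-8")

print(valid_utf8)
print(shown)
print(converted)
```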

On the machine to which I copied it, and on which I ran iconv, I have Debian Etch (regularly upgraded).

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) Ted.Harding@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 16-Feb-10  Time: 00:45:51
------------------------------ XFMail ------------------------------

Chris G

8:52 a.m.


On Tue, Feb 16, 2010 at 12:45:54AM -0000, Ted Harding wrote:

So, for me, it does work! And, incidentally, it does also work without the "-t UTF-8" option (though my own 'man iconv' makes no mention of what happens if you leave it out).

On the machine to which I copied it, and on which I ran iconv, I have Debian Etch (regularly upgraded).

That's really weird then. I run xubuntu 9.10 which, as you know, uses most of the same packages that Debian uses so should have a pretty similar iconv. (... and yes, I do see all your accented characters in the iconv'ed file)

Do you have ISO-8859-1 locale files installed? I was wondering if iconv needs a locale installed in order to be able to convert files to/from that locale. If I run "locale -a" it shows that I only have UTF-8 locales installed (plus C and POSIX).

Still it seems odd that iconv doesn't complain at all.

-- Chris Green

Ted.Harding＠manchester.ac.uk

11:02 a.m.


On 16-Feb-10 08:52:16, Chris G wrote:

Do you have ISO-8859-1 locale files installed? I was wondering if iconv needs a locale installed in order to be able to convert files to/from that locale. If I run "locale -a" it shows that I only have UTF-8 locales installed (plus C and POSIX).


Well, not being quite sure of everything implied by "have ISO-8859-1 locale files installed", I did the following:

locate 8859-1 | grep locale

with results:

/usr/share/X11/locale/iso8859-1
/usr/share/X11/locale/iso8859-1/Compose
/usr/share/X11/locale/iso8859-1/XI18N_OBJS
/usr/share/X11/locale/iso8859-1/XLC_LOCALE

(along with similar output for iso8859-10,11,13,14,15).

By the way: If I switch from X into a console terminal (Ctrl-Alt-F1) and do as above with that file, then "less 400" produces the same result as in the xterm before (with "<hexcode>"'s where the accented characters are), but "cat 400" produces output in which all the accented characters are simply missing (no "?" and the like as in the xterm).

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) Ted.Harding@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 16-Feb-10  Time: 11:02:27
------------------------------ XFMail ------------------------------

Chris G

11:37 a.m.


On Tue, Feb 16, 2010 at 11:02:30AM -0000, Ted Harding wrote:

Well, not being quite sure of everything implied by "have ISO-8859-1 locale files installed", I did the following:

I did say 'If I run "locale -a"'; the command "locale -a" will show you which locales are installed.


locate 8859-1 | grep locale

with results:

/usr/share/X11/locale/iso8859-1
/usr/share/X11/locale/iso8859-1/Compose
/usr/share/X11/locale/iso8859-1/XI18N_OBJS
/usr/share/X11/locale/iso8859-1/XLC_LOCALE

(along with similar output for iso8859-10,11,13,14,15).

Yes, I have those files too but "locale -a" doesn't show any iso8859 locales as being available. Ah, but "locale -m" *does* show that the iso8859-1 charmaps are available.


By the way: If I switch from X into a console terminal (Ctrl-Alt-F1) and do as above with that file, then "less 400" produces the same result as in the xterm before (with "<hexcode>"'s where the accented characters are), but "cat 400" produces output in which all the accented characters are simply missing (no "?" and the like as in the xterm).

I'm talking about terminal windows (xfce4-terminal to be exact) running in X. I've just tried a couple of other terminal types (a real xterm and a gnome-terminal), no change.

-- Chris Green