When I view text files in Firefox any pound signs (as in UK currency, not #) are shown as a ? and quite a few subsequent characters are lost.
Pound signs in HTML are fine.
Does anyone have any idea why this might be? Is it a browser issue or a web server issue (this is text files being served by apache on my home Linux box)?
On Fri, Mar 23, 2007 at 03:21:21PM +0000, Eur Ing Chris Green wrote:
This is really weird!
It depends how many pound signs there are, what follows them, etc.
I have set up some test files on my server. I'd be interested to know whether others see the same effects (I think they will, as I believe they're caused by the Apache server, not the browser).
http://home.isbd.net/chris/xxx.txt Renders as I would expect, even though it has a string of pound signs. Page info says it's windows-1253 encoding.
http://home.isbd.net/chris/testPounds1.txt Renders OK (only one pound sign) but in a slightly larger size font than the one above. Says it's windows-1252 encoding.
http://home.isbd.net/chris/testPounds2.txt Shows ? for the pound sign and loses several characters after the pound sign, uses an even bigger font! Page info says it's gb18030 encoding.
I suppose apache is trying to guess the encoding by some sort of heuristic and getting things somewhat wrong. Does anyone have any idea where/how apache does this?
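For reference, the three encodings reported above can be compared directly. In ISO-8859-1 the pound sign is the single byte 0xA3, and what that byte turns into depends entirely on which encoding the reader applies. The following is a minimal Python sketch, assuming the test files are stored as ISO-8859-1; the sample text and the helper name decode_as are made up for illustration, and it says nothing about which program is doing the guessing.

```python
# Illustrative only: how the same bytes read under several encodings.
# Assumption: the file on disk is ISO-8859-1, so the pound sign is byte 0xA3.
SAMPLE = "a number after £2\n".encode("iso-8859-1")


def decode_as(data: bytes, encoding: str) -> str:
    """Decode data, substituting U+FFFD for bytes the codec rejects."""
    return data.decode(encoding, errors="replace")


if __name__ == "__main__":
    # cp1252 and cp1253 are Python's names for windows-1252 and windows-1253.
    for name in ("iso-8859-1", "cp1252", "cp1253", "utf-8", "gb18030"):
        print(f"{name:>10}: {decode_as(SAMPLE, name)!r}")
```

The three single-byte encodings all map 0xA3 to a pound sign, which fits the two files that rendered correctly. The multibyte encodings treat 0xA3 as part of a longer sequence, so the pound sign cannot survive and neighbouring characters may be consumed along with it.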
On 23-Mar-07 16:04:44, Eur Ing Chris Green wrote:
I see all three pages apparently perfectly (see below as a check against what they might be supposed to be). My Firefox doesn't seem to have a "Page info", but the default encoding is "Western ISO-8859-1".
In all three cases,
"View" --> "Character Encoding" says "Western ISO-8859-1".
"View" --> "Page Source" simply shows the text, with no HTML to mess with the encoding.
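With no HTML in the file to declare an encoding, the only explicit declaration a plain-text file can carry is the charset parameter of the HTTP Content-Type header. If that is absent, the browser falls back to its default or to its own guess. A small Python sketch of reading that parameter follows; the function name declared_charset and the header strings in the usage lines are hypothetical examples, not values captured from the server discussed here.

```python
from email.message import Message
from typing import Optional


def declared_charset(content_type: str) -> Optional[str]:
    """Return the lowercased charset named in a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


if __name__ == "__main__":
    # Hypothetical header values, for illustration only.
    print(declared_charset("text/plain; charset=ISO-8859-1"))
    print(declared_charset("text/plain"))
```

A result of None would mean the server declared nothing, which leaves the choice to the browser. On the Apache side, the AddDefaultCharset directive is one way to have a charset sent for text responses.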
What I see on each of the pages:
xxx.txt:
========
This is xxx.txt.
Here are some ££££££££££££££ signs.
Here are some accented characters éèçö
Here is a link:-
http://www.google.co.uk/

testPounds1.txt:
================
Here is a pound sign with a number after £2
... and here is hash sign with a number after #9

testPounds2.txt:
================
Here is a pound sign with a number after £2
... and here is another pound sign with a number after £9
Was that what they're supposed to look like?
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) ted.harding@nessie.mcc.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 23-Mar-07                                       Time: 18:11:39
------------------------------ XFMail ------------------------------
On Fri, Mar 23, 2007 at 06:11:42PM -0000, Ted Harding wrote:
Thanks for looking! Yes, you're seeing them as they should be.
Now that I'm at home I'm seeing them correctly too, so my original surmise that it was an Apache server issue is wrong; the problem is with my Firefox at work. Well, that puts me somewhere on the way to finding the problem.
I suspect it may be something to do with the FC6 installation being UTF-8 by default. I have set it to ISO-8859-1 in a few places but I think there must be some more places to fix.
Thanks again for the help.
On Fri, Mar 23, 2007 at 06:25:35PM +0000, Eur Ing Chris Green wrote:
This is arse about tit; UTF-8 is decidedly the way forward and you'd be much better spending your time converting your usage of iso-8859-1 to UTF-8 than vice versa.
J.
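For anyone taking the conversion route, re-encoding existing ISO-8859-1 text as UTF-8 is mechanical, because every byte value is a valid ISO-8859-1 character. A minimal Python sketch is given here; the function names latin1_to_utf8 and convert_file are illustrative, and it assumes the input really is ISO-8859-1 (running it on data that is already UTF-8 would double-encode it).

```python
from pathlib import Path


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8."""
    # Decoding as ISO-8859-1 cannot fail: all 256 byte values are defined.
    return data.decode("iso-8859-1").encode("utf-8")


def convert_file(src: Path, dst: Path) -> None:
    """Read src as ISO-8859-1 bytes and write the UTF-8 equivalent to dst."""
    dst.write_bytes(latin1_to_utf8(src.read_bytes()))


if __name__ == "__main__":
    # The pound sign grows from one byte (A3) to two (C2 A3).
    print(latin1_to_utf8(b"\xa32"))
```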
On Fri, Mar 23, 2007 at 06:33:57PM +0000, Jonathan McDowell wrote:
Yes, I sort of realise this but, given my work environment, trying to use UTF-8 on my Linux box in a sea of ISO-8859-1 (and worse) systems isn't viable at the moment.
Reading a UTF-8 FAQ I find things like the following:-
In UTF-8 mode, terminal emulators such as xterm or the Linux console driver transform every keystroke into the corresponding UTF-8 sequence and send it to the stdin of the foreground process. Similarly, any output of a process on stdout is sent to the terminal emulator, where it is processed with a UTF-8 decoder and then displayed using a 16-bit font.
With a mix of PCs, legacy Unix systems, Solaris 2.6, Solaris 2.8 and some Linux boxes on the same network, displaying (via X and/or ssh) on each other's screens, it's going to be a long time before we are able to work with multibyte characters. As it is, using ISO-8859-1 is fairly reliable throughout.
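The mismatch described above can be sketched in a few lines of Python: the same pound sign is one byte in ISO-8859-1 and two in UTF-8, so output produced under one assumption and displayed under the other comes out wrong in either direction. The helper name shown_on is hypothetical and only models the decoding step of a terminal, not any real terminal emulator.

```python
POUND = "£"

utf8_bytes = POUND.encode("utf-8")          # two bytes: C2 A3
latin1_bytes = POUND.encode("iso-8859-1")   # one byte:  A3


def shown_on(terminal_encoding: str, data: bytes) -> str:
    """Model what a terminal using terminal_encoding would display for data."""
    return data.decode(terminal_encoding, errors="replace")


if __name__ == "__main__":
    # UTF-8 output on an ISO-8859-1 display: an extra character appears.
    print(repr(shown_on("iso-8859-1", utf8_bytes)))
    # ISO-8859-1 output on a UTF-8 display: the byte is rejected.
    print(repr(shown_on("utf-8", latin1_bytes)))
```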