I have a web page which has a couple of special characters which are ISO-8859-1 encoded, 'View Source' shows that the header is as follows:-
<head>
<title>Chris Info</title>
<link rel="stylesheet" href="http://home.isbd.net/css/infowiki.css" type="text/css">
<link rel="alternate" type="application/xml" title="RSS" href="/blog/rss2.xml">
<link rel="top" href="/" title="pyblosxom.sourceforge.net">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
... but when it starts up, Firefox always sets the character encoding to "Unicode (UTF-8)" by default, and to see the characters correctly I have to manually switch it to "Western (ISO-8859-1)".
Is Firefox subtly trying to push me in the 'right' direction or, more realistically, what am I doing wrong? I have tried setting things to default to ISO-8859-1 in Firefox but I don't seem to be getting anywhere.
Chris G cl@isbd.net wrote:
I have a web page which has a couple of special characters which are ISO-8859-1 encoded, 'View Source' shows that the header is as follows:- [...]
Check the HTTP header (curl -I or wget --head) of the URL. The HTTP header and XML opening both trump the http-equiv meta, as described in http://www.w3.org/International/O-charset#declaring with links to how to correct things.
But also, you should be using UTF-8 for most things by now. ;-)
Hope that helps,
On Tue, Oct 09, 2007 at 03:05:20PM +0100, MJ Ray wrote:
Chris G cl@isbd.net wrote:
I have a web page which has a couple of special characters which are ISO-8859-1 encoded, 'View Source' shows that the header is as follows:- [...]
Check the HTTP header (curl -I or wget --head) of the URL. The HTTP header and XML opening both trump the http-equiv meta, as described in http://www.w3.org/International/O-charset#declaring with links to how to correct things.
OK, thanks, that *explains* it then:-
HTTP/1.1 200 OK
Date: Tue, 09 Oct 2007 16:04:22 GMT
Server: Apache/2.2.6 (Fedora)
Content-Length: 2939
Connection: close
Content-Type: text/html; charset=UTF-8
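The precedence described in this exchange (a charset in the HTTP Content-Type header wins over the one in the http-equiv meta element) can be sketched in a few lines of Python. This is only an illustration, not how Firefox actually implements it: the function names are made up for this example, and the two input strings are the values quoted in this thread.

```python
from email.message import Message


def charset_from_content_type(value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


def effective_charset(http_content_type, meta_content_type):
    """Hypothetical helper: the HTTP header charset wins, the meta value is a fallback."""
    return (charset_from_content_type(http_content_type)
            or charset_from_content_type(meta_content_type))


if __name__ == "__main__":
    # Values taken from the HTTP response and the <meta> element shown in the thread.
    print(effective_charset("text/html; charset=UTF-8",
                            "text/html; charset=iso-8859-1"))
```

With the header above the meta declaration is never consulted, which matches the behaviour seen in the browser. If the server sent a bare "text/html" with no charset, the meta value would be used instead.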
However, how do I get a file which has some ISO-8859-1 encodings in it to be transmitted correctly by the web server?
But also, you should be using UTF-8 for most things by now. ;-)
Yes, I know, that's why I asked if Firefox was dropping broad hints in my direction. :-) However, since I'm using an editor which doesn't yet know about UTF-8, that's a little difficult for the moment. The next major version *will* know about UTF-8, but that doesn't help me now. Note that these files are plain text files (well, reStructuredText files) and the HTML is generated on the fly. I need to be able to see the characters correctly displayed when viewed as text as well as when viewed via the browser.
Chris G cl@isbd.net wrote:
On Tue, Oct 09, 2007 at 03:05:20PM +0100, MJ Ray wrote:
Check the HTTP header (curl -I or wget --head) of the URL. The HTTP header and XML opening both trump the http-equiv meta, as described in http://www.w3.org/International/O-charset#declaring with links to how to correct things.
OK, thanks, that *explains* it then:-
[...]
However, how do I get a file which has some ISO-8859-1 encodings in it to be transmitted correctly by the web server?
Follow the links from that URL to how to correct things. In this case, you want "Setting the HTTP charset parameter" which includes:
"Apache. This can be done via the AddCharset (Apache 1.3.10 and later) or AddType directives, for directories or individual resources (files). With AddDefaultCharset (Apache 1.3.12 and later), it is possible to set the default 'charset' for a whole server. For more information, see the article on Setting 'charset' information in .htaccess."
Hope that helps,
On Tue, Oct 09, 2007 at 05:39:30PM +0100, MJ Ray wrote:
Chris G cl@isbd.net wrote:
On Tue, Oct 09, 2007 at 03:05:20PM +0100, MJ Ray wrote:
Check the HTTP header (curl -I or wget --head) of the URL. The HTTP header and XML opening both trump the http-equiv meta, as described in http://www.w3.org/International/O-charset#declaring with links to how to correct things.
OK, thanks, that *explains* it then:-
[...]
However, how do I get a file which has some ISO-8859-1 encodings in it to be transmitted correctly by the web server?
Follow the links from that URL to how to correct things. In this case, you want "Setting the HTTP charset parameter" which includes:
"Apache. This can be done via the AddCharset (Apache 1.3.10 and later) or AddType directives, for directories or individual resources (files). With AddDefaultCharset (Apache 1.3.12 and later), it is possible to set the default 'charset' for a whole server. For more information, see the article on Setting 'charset' information in .htaccess."
Excellent, thank you very much!
On Tue, Oct 09, 2007 at 06:40:47PM +0100, Chris G wrote:
On Tue, Oct 09, 2007 at 05:39:30PM +0100, MJ Ray wrote:
Chris G cl@isbd.net wrote:
On Tue, Oct 09, 2007 at 03:05:20PM +0100, MJ Ray wrote:
Check the HTTP header (curl -I or wget --head) of the URL. The HTTP header and XML opening both trump the http-equiv meta, as described in http://www.w3.org/International/O-charset#declaring with links to how to correct things.
OK, thanks, that *explains* it then:-
[...]
However, how do I get a file which has some ISO-8859-1 encodings in it to be transmitted correctly by the web server?
Follow the links from that URL to how to correct things. In this case, you want "Setting the HTTP charset parameter" which includes:
"Apache. This can be done via the AddCharset (Apache 1.3.10 and later) or AddType directives, for directories or individual resources (files). With AddDefaultCharset (Apache 1.3.12 and later), it is possible to set the default 'charset' for a whole server. For more information, see the article on Setting 'charset' information in .htaccess."
Excellent, thank you very much!
... and I now sort of have it working, but it has raised another question: how do you specify an Apache directive for a dynamic file?
The example in question is the reStructuredText file:-
/home/chris/webdev/info/computer/internet/history/micromedia.rst
This gets displayed in the browser using the URL:-
http://home.isbd.net/cgi-bin/pyblosxom.cgi/wiki/computer/internet/history/mi...
I can get the GBP signs to display correctly if I remove the "AddDefaultCharset UTF-8" from my global httpd.conf file so that it reverts to the default of ISO-8859-1. However I'd really prefer to keep the default Charset as UTF-8 and change it to ISO-8859-1 for just the /home/chris/webdev/info hierarchy.
But how do I do it? Adding a .htaccess file with "AddCharset ISO-8859-1 .rst .txtl" to the directory /home/chris/webdev/info doesn't seem to do anything, but that doesn't surprise me really because I don't think my browser ever sees that .rst file. I also tried:-
<Directory /var/www/cgi-bin/pyblosxom.cgi>
    AddCharset ISO-8859-1 .rst .txtl
    AddDefaultCharset ISO-8859-1
</Directory>
at the bottom of the global httpd.conf file but that didn't work either. (... and yes, I did restart apache)
So, how do I just get that directory hierarchy to be ISO-8859-1? I think part of my problem is that the files are not displayed directly but munged by a CGI script.
On 09/10/2007, Chris G cl@isbd.net wrote:
So, how do I just get that directory hierarchy to be ISO-8859-1? I think part of my problem is that the files are not displayed directly but munged by a CGI script.
If it's a CGI script, just modify the CGI script to set the appropriate HTTP header.
Greg
On Tue, Oct 09, 2007 at 11:47:34PM +0100, Greg Thomas wrote:
On 09/10/2007, Chris G cl@isbd.net wrote:
So, how do I just get that directory hierarchy to be ISO-8859-1? I think part of my problem is that the files are not displayed directly but munged by a CGI script.
If it's a CGI script, just modify the CGI script to set the appropriate HTTP header.
... isn't this where we came in? :-)
The CGI (or HTML, or whatever) sets the header in the dynamically generated HTML to iso-8859-1 (I just checked and it does), but the apache server overrides that header with the line:-
AddDefaultCharset UTF-8
in the httpd.conf file. What I want to do is to change that UTF-8 back to ISO-8859-1 for just the hierarchy of files used by my pyblosxom CGI script.
On 10/10/2007, Chris G cl@isbd.net wrote:
On Tue, Oct 09, 2007 at 11:47:34PM +0100, Greg Thomas wrote:
On 09/10/2007, Chris G cl@isbd.net wrote:
So, how do I just get that directory hierarchy to be ISO-8859-1? I think part of my problem is that the files are not displayed directly but munged by a CGI script.
If it's a CGI script, just modify the CGI script to set the appropriate HTTP header.
... isn't this where we came in? :-)
The CGI (or HTML, or whatever) sets the header in the dynamically generated HTML to iso-8859-1 (I just checked and it does), but the apache server overrides that header with the line:-
AddDefaultCharset UTF-8
in the httpd.conf file. What I want to do is to change that UTF-8 back to ISO-8859-1 for just the hierarchy of files used by my pyblosxom CGI script.
What you showed originally was the HTML setting the charset, with the line
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
This is embedded in the HTML, and is overridden by any charset given in the Content-Type of the HTTP headers, which are a completely different thing. The default charset for the HTTP Content-Type is set by Apache; here it is UTF-8. However, if you get your CGI script to send an HTTP header (not an HTML header) that sets the content type and charset explicitly, Apache shouldn't need to apply a default.
What is the CGI script written in? It probably has a line something like
print "Content-Type: text/html\n";
change that to
print "Content-Type: text/html; charset=iso-8859-1\n";
(if you are using something like CGI.pm, it may be something like "print cgi->header" - in which case you'll need to set up the header).
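Since the script later turns out to be Python, here is a minimal sketch of the same idea in that language. It is not pyblosxom's actual code (the thread does not show how pyblosxom emits its headers); build_cgi_response is a hypothetical helper and the page content is invented for illustration. The point is only that the charset goes into the HTTP Content-Type line, and that the body bytes are encoded to match it.

```python
import sys


def build_cgi_response(body, charset="iso-8859-1"):
    """Return raw CGI output: a Content-Type header with an explicit charset,
    a blank line, then the body encoded in that same charset."""
    header = "Content-Type: text/html; charset={}\r\n\r\n".format(charset)
    return header.encode("ascii") + body.encode(charset)


if __name__ == "__main__":
    # Illustrative page containing a pound sign (U+00A3).
    page = "<html><body><p>Price: \u00a3 5</p></body></html>"
    # Write bytes so the interpreter's own stdout encoding does not interfere.
    sys.stdout.buffer.write(build_cgi_response(page))
```

Because the header already names a charset, AddDefaultCharset should have nothing to add, and the global UTF-8 default can stay in place for everything else.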
Greg
On Wed, Oct 10, 2007 at 09:11:37AM +0100, Greg Thomas wrote:
On 10/10/2007, Chris G cl@isbd.net wrote:
On Tue, Oct 09, 2007 at 11:47:34PM +0100, Greg Thomas wrote:
On 09/10/2007, Chris G cl@isbd.net wrote:
So, how do I just get that directory hierarchy to be ISO-8859-1? I think part of my problem is that the files are not displayed directly but munged by a CGI script.
If it's a CGI script, just modify the CGI script to set the appropriate HTTP header.
... isn't this where we came in? :-)
The CGI (or HTML, or whatever) sets the header in the dynamically generated HTML to iso-8859-1 (I just checked and it does), but the apache server overrides that header with the line:-
AddDefaultCharset UTF-8
in the httpd.conf file. What I want to do is to change that UTF-8 back to ISO-8859-1 for just the hierarchy of files used by my pyblosxom CGI script.
What you showed originally was the HTML setting the charset, with the line
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
This is embedded in the HTML, and overwritten by any content-type presented in the HTTP headers, which is completely different. The default HTTP content-type is set by Apache - UTF-8. However, if you get your CGI script to send an HTTP header (not an HTML header) that sets the content type explicitly, Apache shouldn't have the need to apply a default.
What is the CGI script written in? It probably has a line something like
print "Content-Type: text/html\n";
change that to
print "Content-Type: text/html; charset=iso-8859-1\n";
There's a line in the config file:-
py["locale"] = "en_US.iso-8859-1"
... and there's also:-
py["blog_encoding"] = "iso-8859-1"
... but the output from the browser is still obstinately defaulted to UTF-8.
(if you are using something like CGI.pm, it may be something like "print cgi->header" - in which case you'll need to set up the header).
It's all in python.
On 10/10/2007, Chris G cl@isbd.net wrote:
On Wed, Oct 10, 2007 at 09:11:37AM +0100, Greg Thomas wrote:
What you showed originally was the HTML setting the charset, with the line
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
This is embedded in the HTML, and overwritten by any content-type presented in the HTTP headers, which is completely different.
What is the CGI script written in? It probably has a line something like
print "Content-Type: text/html\n";
change that to
print "Content-Type: text/html; charset=iso-8859-1\n";
There's a line in the config file:-
py["locale"] = "en_US.iso-8859-1"
... and there's also:-
py["blog_encoding"] = "iso-8859-1"
... but the output from the browser is still obstinately defaulted to UTF-8.
Is there a way you can examine the headers that the CGI script is presenting without Apache getting in the way? With Perl you could just run the script from the command line, and it would send the output to stdout. Check and see if there is a Content-Type header specifying the right charset. I suspect not. Googling turns up http://www.python.org/doc/current/lib/cgi-intro.html which suggests that the header is output explicitly ('print "Content-Type: text/html"').
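One way to do that check from Python is sketched below. It assumes the CGI script can be run directly from the command line; a real script such as pyblosxom.cgi may also expect CGI environment variables to be set, which is not covered here. The function name cgi_charset is made up for this example.

```python
import subprocess
import sys


def cgi_charset(raw_output):
    """Return the lowercased charset from the Content-Type header of raw CGI
    output (header lines, blank line, body), or None if none is declared."""
    # The header block ends at the first blank line.
    head = raw_output.replace(b"\r\n", b"\n").split(b"\n\n", 1)[0]
    for line in head.decode("ascii", "replace").split("\n"):
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "content-type":
            # Parameters follow the media type, separated by semicolons.
            for param in value.split(";")[1:]:
                key, _, val = param.partition("=")
                if key.strip().lower() == "charset":
                    return val.strip().strip('"').lower()
    return None


if __name__ == "__main__":
    # Usage: python check_charset.py <command that runs the CGI script>
    if len(sys.argv) > 1:
        result = subprocess.run(sys.argv[1:], stdout=subprocess.PIPE)
        print(cgi_charset(result.stdout))
```

If this prints None, the script is sending a bare Content-Type and Apache is free to append its default charset.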