On Wed, Nov 03, 2004 at 03:10:22PM +0000, Matt Parker wrote:
On 3/11/2004, "MJ Ray" mjr@dsl.pipex.com wrote:
This is a typical slippery slope... I start off from a general issue that affects most users, through a specific application, to a specific programming question. Are you holding tight? Is there a C doctor in the house?
Some of you may have "enjoyed" the change of character set from ISO-8859-1 (an 8-bit character code, so 256 possible characters) to utf-8 (a large character set which is converted into 8-bit codes, pairs of 8-bit codes and so on). Basically, 8859-1 only lets you display western European text, while utf-8 lets you have southern or eastern European languages, or Greek or Cyrillic or whatever, all at once without doing anything unusual with character sets. FAQ at http://www.cl.cam.ac.uk/~mgk25/unicode.html
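To make the "8-bit codes, pairs of 8-bit codes and so on" part concrete, here is a rough sketch in C of how a single Unicode code point turns into one to four bytes of utf-8. The helper name utf8_encode is made up for illustration and is not from any library; the bit layout is the standard one.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8.
 * Writes 1 to 4 bytes into out and returns how many were written,
 * or 0 if cp is a surrogate (U+D800..U+DFFF) or above U+10FFFF.
 * utf8_encode is a hypothetical helper name, not a library function. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* plain ASCII: one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* e.g. Latin-1 accents: two bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* rest of the BMP: three bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* everything else: four bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

So an "e acute" (U+00E9), which is the single byte 0xE9 in ISO-8859-1, comes out as the two bytes 0xC3 0xA9, while plain ASCII text is byte-for-byte unchanged.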
The SQLite database seems to use UTF-16 as a basic datatype. I was having a browse after it was suggested that I try writing a Scheme interface to it. When reading http://www.sqlite.org/capi3.html, the following caught my eye: "There is no agreement on what the C datatype for a UTF-16 string should be."
Is there really such disagreement on this basic datatype? The FAQ makes it look clear-cut on wchar_t. What types do C programmers really use?
For my part, Scheme's character and string datatypes seem to cope with Unicode in theory, but the implementation details (such as what character set) are still being thrashed out.
One word - Java...
Two words: SUCKS ARSE
Its language syntax is bitty, and if you want a portable, interpreted language (which is basically all Java is, after all) use python, perl, scheme, *ANYTHING* but Java... If you'd read Mark's mail you may have noticed that he mentioned that scheme appears to deal with it perfectly fine. From what I could tell, he was merely asking what the current bazillions of C coders use to represent utf-8.
Unicode capability and built in character conversions for everything from ISO-8859-1, through UTF-8 and UTF-16 to things like BIG-5 for Chinese.
Python has all of these too, is a nicer language, and works for more people. Until there is a fully open source, working JRE, Java is still right at the back of my list of languages.
It's one of the things that was designed in from the beginning. There's a reason that "use the right tool for the job" became such a popular saying...
Yes. And the right tool is *very* *rarely* Java. It's just that Java developers appear not to understand that Java is not the be-all and end-all of the programming world.
Back to the issue: Mark, I honestly don't know, but I wouldn't be at all surprised if it was treated as different things by different people, possibly even defining a new type for it. What do gnu.org suggest?
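For what it's worth, one likely reason for the disagreement is that the width of wchar_t is implementation-defined: it is 16 bits on Windows but 32 bits with glibc on Linux, so it only fits UTF-16 on some platforms. Below is a rough sketch of the "define a new type" route, assuming a fixed 16-bit unit from <stdint.h>. The names utf16_unit and utf16_encode are invented for illustration; C11 later added char16_t in <uchar.h> for much the same purpose.

```c
#include <stddef.h>
#include <stdint.h>

/* A fixed-width 16-bit code unit, independent of sizeof(wchar_t).
 * utf16_unit is a hypothetical name; this is one choice among several. */
typedef uint16_t utf16_unit;

/* Encode one Unicode code point as UTF-16.
 * Writes 1 or 2 code units into out and returns how many were written,
 * or 0 if cp is a surrogate (U+D800..U+DFFF) or above U+10FFFF.
 * utf16_encode is a hypothetical helper name, not a library function. */
size_t utf16_encode(uint32_t cp, utf16_unit out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                          /* lone surrogates are invalid */
    if (cp < 0x10000) {                    /* BMP: one unit, value unchanged */
        out[0] = (utf16_unit)cp;
        return 1;
    }
    if (cp <= 0x10FFFF) {                  /* beyond the BMP: surrogate pair */
        cp -= 0x10000;                     /* leaves a 20-bit value */
        out[0] = (utf16_unit)(0xD800 | (cp >> 10));    /* high 10 bits */
        out[1] = (utf16_unit)(0xDC00 | (cp & 0x3FF));  /* low 10 bits */
        return 2;
    }
    return 0;
}
```

Note that a buffer of these units still has a byte order, so anything written to disk or passed between machines needs to agree on big-endian or little-endian as well as on the type.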
Thanks,