This is a typical slippery slope... I start off from a general issue that affects most users, through a specific application, to a specific programming question. Are you holding tight? Is there a C doctor in the house?
Some of you may have "enjoyed" the change of character set from ISO-8859-1 (an 8-bit character code, so 256 possible characters) to UTF-8 (an encoding of the large Unicode character set into single 8-bit codes, pairs of 8-bit codes and so on). Basically, 8859-1 only lets you display western European text, while UTF-8 lets you have southern or eastern European languages, or Greek or Cyrillic or whatever, all at once without doing anything unusual with character sets. FAQ at http://www.cl.cam.ac.uk/~mgk25/unicode.html
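As a rough illustration of "8-bit codes, pairs of 8-bit codes and so on", here is a minimal sketch of how one Unicode code point becomes one to four bytes in UTF-8. The function name utf8_encode is made up for this example and is not from any library mentioned in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * Writes 1 to 4 bytes into out and returns the number of bytes
 * written, or 0 if cp is not a valid Unicode scalar value. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* ASCII: one byte, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* three bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* four bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* beyond the Unicode range */
}
```

With this sketch, "A" (U+0041) stays a single byte, "é" (U+00E9) takes two bytes and the euro sign (U+20AC) takes three, which is why plain ASCII text is already valid UTF-8.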
The SQLite database seems to use UTF-16 as a basic datatype. I was having a browse after it was suggested that I try writing a Scheme interface to it. When reading http://www.sqlite.org/capi3.html, the following caught my eye: "There is no agreement on what the C datatype for a UTF-16 string should be."
Is there really such disagreement on this basic datatype? The FAQ makes it look clearcut on wchar_t. What types do C programmers really use?
For my part, Scheme's character and string datatypes seem to cope with unicode in theory, but the implementation details (such as what character set) are still being thrashed out.
On Wed, 03 Nov 2004 14:58:51 +0000, MJ Ray mjr@dsl.pipex.com wrote:
The SQLite database seems to use UTF-16 as a basic datatype. I was having a browse after it was suggested that I try writing a Scheme interface to it. When reading http://www.sqlite.org/capi3.html, the following caught my eye: "There is no agreement on what the C datatype for a UTF-16 string should be."
Is there really such disagreement on this basic datatype? The FAQ makes it look clearcut on wchar_t. What types do C programmers really use?
Visual Studio 6 favours wchar_t with a comment to the effect of "use unsigned short on Macintosh compilers".
I'll go wash my mouth out now ;-)
Tim.
On 3/11/2004, "MJ Ray" mjr@dsl.pipex.com wrote:
This is a typical slippery slope... I start off from a general issue that affects most users, through a specific application, to a specific programming question. Are you holding tight? Is there a C doctor in the house?
Some of you may have "enjoyed" the change of character set from ISO-8859-1 (an 8-bit character code, so 256 possible characters) to UTF-8 (an encoding of the large Unicode character set into single 8-bit codes, pairs of 8-bit codes and so on). Basically, 8859-1 only lets you display western European text, while UTF-8 lets you have southern or eastern European languages, or Greek or Cyrillic or whatever, all at once without doing anything unusual with character sets. FAQ at http://www.cl.cam.ac.uk/~mgk25/unicode.html
The SQLite database seems to use UTF-16 as a basic datatype. I was having a browse after it was suggested that I try writing a Scheme interface to it. When reading http://www.sqlite.org/capi3.html, the following caught my eye: "There is no agreement on what the C datatype for a UTF-16 string should be."
Is there really such disagreement on this basic datatype? The FAQ makes it look clearcut on wchar_t. What types do C programmers really use?
For my part, Scheme's character and string datatypes seem to cope with unicode in theory, but the implementation details (such as what character set) are still being thrashed out.
One word - Java...
Unicode capability and built-in character conversions for everything from ISO-8859-1, through UTF-8 and UTF-16, to things like BIG-5 for Chinese.
It's one of the things that was designed in from the beginning. There's a reason that "use the right tool for the job" became such a popular saying...
Matt
On Wed, Nov 03, 2004 at 03:10:22PM +0000, Matt Parker wrote:
On 3/11/2004, "MJ Ray" mjr@dsl.pipex.com wrote:
This is a typical slippery slope... I start off from a general issue that affects most users, through a specific application, to a specific programming question. Are you holding tight? Is there a C doctor in the house?
Some of you may have "enjoyed" the change of character set from ISO-8859-1 (an 8-bit character code, so 256 possible characters) to UTF-8 (an encoding of the large Unicode character set into single 8-bit codes, pairs of 8-bit codes and so on). Basically, 8859-1 only lets you display western European text, while UTF-8 lets you have southern or eastern European languages, or Greek or Cyrillic or whatever, all at once without doing anything unusual with character sets. FAQ at http://www.cl.cam.ac.uk/~mgk25/unicode.html
The SQLite database seems to use UTF-16 as a basic datatype. I was having a browse after it was suggested that I try writing a Scheme interface to it. When reading http://www.sqlite.org/capi3.html, the following caught my eye: "There is no agreement on what the C datatype for a UTF-16 string should be."
Is there really such disagreement on this basic datatype? The FAQ makes it look clearcut on wchar_t. What types do C programmers really use?
For my part, Scheme's character and string datatypes seem to cope with unicode in theory, but the implementation details (such as what character set) are still being thrashed out.
One word - Java...
Two words: SUCKS ARSE
Its language syntax is bitty, and if you want a portable, interpreted language (which is basically all Java is, after all) use Python, Perl, Scheme, *ANYTHING* but Java... If you'd read Mark's mail you may have noticed that he mentioned that Scheme appears to deal with it perfectly fine. From what I could tell, he was merely asking what the current bazillions of C coders use to represent utf-8.
Unicode capability and built-in character conversions for everything from ISO-8859-1, through UTF-8 and UTF-16, to things like BIG-5 for Chinese.
Python has all of these too, is a nicer language, and works for more people. Until there is a fully open source, working JRE, Java is still right at the back of my list of languages.
It's one of the things that was designed in from the beginning. There's a reason that "use the right tool for the job" became such a popular saying...
Yes. And the right tool is *very* *rarely* Java. It's just Java developers appear not to understand that Java is not the be all and end all of the programming world.
Back to the issue: Mark, I honestly don't know, but I wouldn't be at all surprised if it was treated as different things by different people, possibly even defining a new type for it. What do gnu.org suggest?
Thanks,
Hey Brett
In all fairness, why would anyone in their right mind want to waste 130M of disk space?
Regards, Paul
(Never one to miss a fight :-/ )
On Wednesday 03 November 2004 15:25, Brett Parker wrote:
One word - Java...
Two words: SUCKS ARSE
On Wed, Nov 03, 2004 at 03:24:41PM +0000, Paul wrote:
Hey Brett
In all fairness, why would anyone in their right mind want to waste 130M of disk space?
Or the resources needed to run it... (by the way... top posting is bad, loses context, very confusing... Fair enough, in that case it was followable, but people scanning the list archives might not follow it as easily :)
I'm not even sure that I can use the sun jdk on here, running pure64 with the minimum amount of 32 libraries really does tend to show what is and isn't 64 bit ready :) Apparently OpenOffice.org isn't... Ubuntu clunks it with some 32 bit libraries, but as I'm not running ubuntu, that doesn't really help :)
Blip!
Its language syntax is bitty, and if you want a portable, interpreted language (which is basically all Java is, after all)
As soon as you mention the word "interpreted" you are showing your ignorance. Java is NOT an interpreted language.
And saying you'd use ANYTHING other than Java in, presumably, all cases demonstrates your inability to choose the right tool objectively.
I have used many languages in my time, including Java, and some I liked and some I hated. For the record, Perl is something I never got on with, for example, but that's another flame-war. But the point is I used them all when the situation required the best language for the problem.
In the case of Java, it wins hands down on anything to do with the Internet that is more involved than a few scripted pages, and it wins hands down in terms of character encodings and language conversion. It has many drawbacks, including some of its API, though version 5.0 (or 1.5 whatever you want to call it) has improved the language significantly IMO.
Matt
On Wed, Nov 03, 2004 at 03:33:53PM +0000, Matt Parker wrote:
In the case of Java, it wins hands down on anything to do with the Internet that is more involved than a few scripted pages, and it wins hands down in terms of character encodings and language conversion. It has many drawbacks, including some of its API, though version 5.0 (or 1.5 whatever you want to call it) has improved the language significantly IMO.
I like the language, but the lack of a decent implementation holds it back. A fully working Free implementation would be a good start.
J.
On 3/11/2004, "Jonathan McDowell" noodles@earth.li wrote:
On Wed, Nov 03, 2004 at 03:33:53PM +0000, Matt Parker wrote:
In the case of Java, it wins hands down on anything to do with the Internet that is more involved than a few scripted pages, and it wins hands down in terms of character encodings and language conversion. It has many drawbacks, including some of its API, though version 5.0 (or 1.5 whatever you want to call it) has improved the language significantly IMO.
I like the language, but the lack of a decent implementation holds it back. A fully working Free implementation would be a good start.
J.
Free never really bothered me. I don't use Linux for ideological reasons. I use it because it's better than anything else out there.
BTW, the source is available for both the IBM and Sun JVMs and accompanying APIs, just that the source isn't published under a Free license. In the case of Sun it's licensed under the "Sun Community Source License" - http://wwws.sun.com/software/communitysource/j2se/java2/index.html
JRockit from BEA is also a pretty good JVM.
Matt
On 2004-11-03 15:48:50 +0000 Matt Parker matt@mpcontracting.co.uk wrote:
Free never really bothered me. I don't use Linux for ideological reasons. I use it because it's better than anything else out there.
The desire for freedom isn't just ideologically motivated. Free software also lets anyone redistribute, study, adapt and distribute it. Mere "open source" licences like the Sun "Community" Source License don't give you those freedoms.
Many are tired of unresponsive vendors. With tightly-controlled things like Sun Java, you don't get enough freedom to be really creative and productive at the same time. I think desire for freedom is increasing as people realise there is a place for creativity in computing.
Hi,
On 3 Nov 2004, at 15:33, Matt Parker wrote:
And saying you'd use ANYTHING other than Java in, presumably, all cases demonstrates your inability to choose the right tool objectively.
Tsk, Matt, don't feed the troll ;-)
In the case of Java, it wins hands down on anything to do with the Internet that is more involved than a few scripted pages, and it wins hands down in terms of character encodings and language conversion.
I wouldn't say "hands down", but it certainly excels at being a modern-day language with sensible support for encodings. Anyone know how C# stacks up in this respect?
On 3 Nov 2004, at 15:37, Jonathan McDowell wrote:
I like the language, but the lack of a decent implementation holds it back. A fully working Free implementation would be a good start.
Define 'decent' ;-)
I've certainly never had major problems with blackdown on Debian or the OS X implementations of the JDK. Free would be good but is not essential for most people. Do you have other criteria in mind?
A.
On Wed, Nov 03, 2004 at 04:03:13PM +0000, Andrew Savory wrote:
Hi,
On 3 Nov 2004, at 15:33, Matt Parker wrote:
And saying you'd use ANYTHING other than Java in, presumably, all cases demonstrates your inability to choose the right tool objectively.
Tsk, Matt, don't feed the troll ;-)
It's my bridge, pay the toll, dammit!
In the case of Java, it wins hands down on anything to do with the Internet that is more involved than a few scripted pages, and it wins hands down in terms of character encodings and language conversion.
I wouldn't say "hands down", but it certainly excels at being a modern-day language with sensible support for encodings. Anyone know how C# stacks up in this respect?
There are places where it works, I'll give it that, cocoon is a fine example of something written in Java that does actually work.
C# seems to be coming along in leaps and bounds as part of the Mono project, and now that there are lots of Gnome developers writing things with it, I can see only good things happening in the future. My original concerns with the language were that Microsoft were going to screw it up, as with most things they touch (IMO), but that appears, so far, to not be happening. I keep meaning to get round to playing with it further, but time constraints have meant otherwise.
Thanks,
On Wed, Nov 03, 2004 at 04:03:13PM +0000, Andrew Savory wrote:
On 3 Nov 2004, at 15:37, Jonathan McDowell wrote:
I like the language, but the lack of a decent implementation holds it back. A fully working Free implementation would be a good start.
Define 'decent' ;-)
I've certainly never had major problems with blackdown on Debian or the OS X implementations of the JDK. Free would be good but is not essential for most people. Do you have other criteria in mind?
To be fair, my main recent experiences with Java are using it with Firefox, where it seems to happily decide to fall over or use lots of memory too often for comfort.
It's about 4 years since I did any real programming in it, but at that stage the Windows, Solaris and Linux JDKs were all pretty good at consuming system resources.
Free is mainly about having it easily distributable; it's a real PITA to get it all working under Debian IME. However, I also believe that if it was Free then people would be able to sort out the incompatibilities that inevitably show up.
As I originally said, I do think the language itself is ok. I'd much rather use it over, say, C++. I just don't tend to due to the perceived overhead. (I'm primarily Perl/C depending on circumstances.)
J.
On Wed, Nov 03, 2004 at 03:33:53PM +0000, Matt Parker wrote:
Its language syntax is bitty, and if you want a portable, interpreted language (which is basically all Java is, after all)
As soon as you mention the word "interpreted" you are showing your ignorance. Java is NOT an interpreted language.
Right, you need a runtime for it to be of any use whatsoever, right? In this case, the "interpreter" is a virtual machine, executing byte code... Or has Java changed significantly in the last 3 minutes?
And saying you'd use ANYTHING other than Java in, presumably, all cases demonstrates your inability to choose the right tool objectively.
There was a time when I thought Java was worthwhile, since then I've grown the fuck up and pick languages that work, without having to munge serious amounts of crap.
I have used many languages in my time, including Java, and some I liked and some I hated. For the record, Perl is something I never got on with, for example, but that's another flame-war. But the point is I used them all when the situation required the best language for the problem.
I have used Java, I have written backend server systems in Java for a russian lexical database, I have used it a fair amount. I wouldn't say that it's been the right tool for the job in any of the situations I've used it or seen it used though.
In the case of Java, it wins hands down on anything to do with the Internet that is more involved than a few scripted pages, and it wins hands down in terms of character encodings and language conversion. It has many drawbacks, including some of its API, though version 5.0 (or 1.5 whatever you want to call it) has improved the language significantly IMO.
Exactly how do you reason this? If you're thinking "Classes, OO" etc, then sorry, but Perl and Python both do this. Python stores strings as utf-8 by default and can encode and decode between character sets. It has the most structured API for use on the Internet *right now*, and, best of all, its runtime isn't *STUPIDLY* large.
When gcj has finally got all the API of the bloaty mess that is Java, and we can finally use Java on things that Sun haven't bothered with, then I might, just might, give it another shot. Without a Free implementation, a programming language is restrictive, and, in many cases, useless.
Now, exactly *what* are you arguing for in the case of Java? That it's always the right tool in the box? Well, sorry, but you are very mistaken. In my experience Java's disadvantages outweigh what few advantages it has, especially from a system administration point of view.
On 3/11/2004, "Brett Parker" iDunno@sommitrealweird.co.uk wrote:
On Wed, Nov 03, 2004 at 03:33:53PM +0000, Matt Parker wrote:
Its language syntax is bitty, and if you want a portable, interpreted language (which is basically all Java is, after all)
As soon as you mention the word "interpreted" you are showing your ignorance. Java is NOT an interpreted language.
Right, you need a runtime for it to be of any use whatsoever, right? In this case, the "interpreter" is a virtual machine, executing byte code... Or has Java changed significantly in the last 3 minutes?
Like I said, you really need to read up on the subject before spouting off inflammatory, but false, statements. This is how it works:-
1) You compile your source code into byte-code.
2) The JVM loads your byte-code at start up.
3) The "Just-In-Time" Compiler compiles your byte-code to native code on the fly and optimises it on each iteration.
4) On tight loops you therefore can end up with more optimised native code than something like GCC can produce.
The only thing is that this process happens every time you restart the JVM, but then if you're running it on a server (where I believe Java should only really live - it's not useful for desktop apps really) you don't care about that because you never restart it.
And saying you'd use ANYTHING other than Java in, presumably, all cases demonstrates your inability to choose the right tool objectively.
There was a time when I thought Java was worthwhile, since then I've grown the fuck up and pick languages that work, without having to munge serious amounts of crap.
Come on, you need to be more intelligent in your argument. You're coming over like a troll.
I have used many languages in my time, including Java, and some I liked and some I hated. For the record, Perl is something I never got on with, for example, but that's another flame-war. But the point is I used them all when the situation required the best language for the problem.
I have used Java, I have written backend server systems in Java for a russian lexical database, I have used it a fair amount. I wouldn't say that it's been the right tool for the job in any of the situations I've used it or seen it used though.
You're obviously not using it correctly. If you look on a site like JobServe you'll see that Java for server-side processing is one of the most common languages (if not the most common). All those people can't be wrong.
In the case of Java, it wins hands down on anything to do with the Internet that is more involved than a few scripted pages, and it wins hands down in terms of character encodings and language conversion. It has many drawbacks, including some of its API, though version 5.0 (or 1.5 whatever you want to call it) has improved the language significantly IMO.
Exactly how do you reason this? If you're thinking "Classes, OO" etc, then sorry, but Perl and Python both do this. Python stores strings as utf-8 by default and can encode and decode between character sets. It has the most structured API for use on the Internet *right now*, and, best of all, its runtime isn't *STUPIDLY* large.
When gcj has finally got all the API of the bloaty mess that is Java, and we can finally use Java on things that Sun haven't bothered with, then I might, just might, give it another shot. Without a Free implementation, a programming language is restrictive, and, in many cases, useless.
Now, exactly *what* are you arguing for in the case of Java? That it's always the right tool in the box? Well, sorry, but you are very mistaken. In my experience Java's disadvantages outweigh what few advantages it has, especially from a system administration point of view.
Java's API has all kinds of standard things to make cross language/country code much easier IMO, such as pre-defined currency, date, time, character set, time-zone, etc etc conversions that seamlessly interact.
As for Sun's supported platforms - there are many JVM implementations out there, a couple of note being IBM's and BEA's.
I don't see why you're getting so angry about this anyway. I only suggested that if MJR is having a problem with Unicode character encoding that he give Java a look. You'd think the way you're going on that I said something horrible about his parentage.
Matt
On Wed, Nov 03, 2004 at 04:09:12PM +0000, Matt Parker wrote:
On 3/11/2004, "Brett Parker" iDunno@sommitrealweird.co.uk wrote:
Like I said, you really need to read up on the subject before spouting off inflammatory, but false, statements. This is how it works:-
1) You compile your source code into byte-code.
2) The JVM loads your byte-code at start up.
3) The "Just-In-Time" Compiler compiles your byte-code to native code on the fly and optimises it on each iteration.
4) On tight loops you therefore can end up with more optimised native code than something like GCC can produce.
The only thing is that this process happens every time you restart the JVM, but then if you're running it on a server (where I believe Java should only really live - it's not useful for desktop apps really) you don't care about that because you never restart it.
Sorry, never restart a Java servlet engine? Never restart a Java app? I find this a fascinating concept, and one that I've never actually seen work.
You're obviously not using it correctly. If you look on a site like JobServe you'll see that Java for server-side processing is one of the most common languages (if not the most common). All those people can't be wrong.
You've seen the number of sites that use PHP, right? Are you saying that all those people are also not wrong? If you're just going on numbers, there are a hell of a lot of Java developers and PHP developers about; what's used depends on who's available, not necessarily on whether it is right for the job. Please don't use the argument that "it's got lots of users, therefore it's good"; it isn't an argument that sticks.
Thanks,
On Wed, Nov 03, 2004 at 04:09:12PM +0000, Matt Parker wrote:
The only thing is that this process happens every time you restart the JVM, but then if you're running it on a server (where I believe Java should only really live - it's not useful for desktop apps really) you don't care about that because you never restart it.
In many cases java code runs in the JVM of a browser, this is supposed to be one of the places where it excels isn't it? Not server java really.
that it's been the right tool for the job in any of the situations I've used it or seen it used though.
You're obviously not using it correctly. If you look on a site like JobServe you'll see that Java for server-side processing is one of the most common languages (if not the most common). All those people can't be wrong.
Like Sun readers no doubt! :-)
"All those people can't be wrong" is *not* a good argument, it's an argument for universal use of MS software for a start.
Java's API has all kinds of standard things to make cross language/country code much easier IMO, such as pre-defined currency, date, time, character set, time-zone, etc etc conversions that seamlessly interact.
Yes, this I will agree with in general having had to do this sort of stuff too many times. It's not perfectly implemented in java but it's quite good.
On Wed, Nov 03, 2004 at 03:50:06PM +0000, Chris Green wrote:
On Wed, Nov 03, 2004 at 04:09:12PM +0000, Matt Parker wrote:
The only thing is that this process happens every time you restart the JVM, but then if you're running it on a server (where I believe Java should only really live - it's not useful for desktop apps really) you don't care about that because you never restart it.
In many cases java code runs in the JVM of a browser, this is supposed to be one of the places where it excels isn't it? Not server java really.
Applets bug me - if only because for certain things they only work in $particular_version of $particular_distributor of a JRE. Server side stuff does at least work, most of the time.
Thanks,
On 3/11/2004, "Chris Green" chris@areti.co.uk wrote:
On Wed, Nov 03, 2004 at 04:09:12PM +0000, Matt Parker wrote:
The only thing is that this process happens every time you restart the JVM, but then if you're running it on a server (where I believe Java should only really live - it's not useful for desktop apps really) you don't care about that because you never restart it.
In many cases java code runs in the JVM of a browser, this is supposed to be one of the places where it excels isn't it? Not server java really.
Back in 1998 maybe... The server-side is Java's speciality and has been for a long time, certainly at enterprise level with things like WebSphere and WebLogic. Oracle has a massive Java API now too.
Matt
On 2004-11-03 16:09:12 +0000 Matt Parker matt@mpcontracting.co.uk wrote:
On 3/11/2004, "Brett Parker" iDunno@sommitrealweird.co.uk wrote:
On Wed, Nov 03, 2004 at 03:33:53PM +0000, Matt Parker wrote:
As soon as you mention the word "interpreted" you are showing your ignorance. Java is NOT an interpreted language.
Debating whether it is an interpreted/ compiled/ VM/ RE/ JIT/ PC language is so 1980s.
[...] Java for server-side processing is one of the (if not the most) common languages. All those people can't be wrong.
Yes, they can all be wrong, for whatever reason. I think Microsoft and VHS both show that. I like them being wrong if their failure won't harm me.
[...]
Java's API has all kinds of standard things to make cross language/country code much easier IMO, such as pre-defined currency, date, time, character set, time-zone, etc etc conversions that seamlessly interact.
I'm not convinced conversions should be in the core language, though, as you either promote some character sets over others, or require lots of work from all standards-compliant implementations. I think Java dealt with this by producing multiple variants (Mobile Edition), but that pretty much killed Java's oft-hyped cross-platform. Maybe this is another reason for the slow development of competition among Java implementations.
To me, it's even more impressive that the Scheme language standards accommodate unicode. The characters section looks unchanged since the Revised^4 Report on Scheme was published in November 1991. I guess it shouldn't be surprising, because multiple character sets have been around for a long long time and the report authors are wise men, but I can think of some languages which don't like character set work much.
Far-sighted and fun to hack with. I guess Scheme is my preferred "right tool for the job" of most programming.
I don't see why you're getting so angry about this anyway. I only suggested that if MJR is having a problem with Unicode character encoding that he give Java a look. [...]
Posting Java advocacy was fairly off-topic, in my opinion. Java is not a practical solution for connecting a C database library to a Scheme implementation. Your post told me nothing about the usual C data types for UTF-16. You also failed to trim the quoted post.
I had a horrible time trying to work with a system running on top of the Blackdown Java a few years ago. I have the scars. Now, I don't think you knew that, so I don't hold it against you. I'm pretty sure Brett knew, so maybe he was being over-protective of me? I think he has his own Java bad experiences too. He's a nice guy, but he does stomp in with heavy boots sometimes.
Posting Java advocacy was fairly off-topic, in my opinion. Java is not a practical solution for connecting a C database library to a Scheme implementation. Your post told me nothing about the usual C data types for UTF-16. You also failed to trim the quoted post.
I realised that I'd read your original post wrongly the first time. I thought you were looking for an alternative to Scheme to solve your problem on first read so my post was not intentionally offtopic.
I had a horrible time trying to work with a system running on top of the Blackdown Java a few years ago. I have the scars. Now, I don't think you knew that, so I don't hold it against you. I'm pretty sure Brett knew, so maybe he was being over-protective of me? I think he has his own Java bad experiences too.
Fair enough, I've had bad experiences with Perl. Horses for courses...
He's a nice guy, but he does stomp in with heavy boots sometimes.
He's immature, but I'm sure he'll grow out of it. His posts remind me of the infamous Jamie Baillie (Google for him if you're unaware). Plus I can never resist a good troll baiting ;-)
Matt
On 2004-11-03 15:25:44 +0000 Brett Parker iDunno@sommitrealweird.co.uk wrote:
Two words: SUCKS [...]
Brett, you've discussed your anatomy twice on the mailing list so far today. Maybe postpone and redraft the next one? There might be sensitive city gents reading.
I've seen encouraging emails about free Java systems at http://cscience.org/pipermail/gobo-l/2004-October/005268.html and http://cscience.org/pipermail/gobo-l/2004-October/005272.html so maybe soon you'll be able to use Java again. I'm sure you'll be happy about that. ;-)
Back to the issue: Mark, I honestly don't know, but I wouldn't be at all surprised if it was treated as different things by different people, possibly even defining a new type for it. What do gnu.org suggest?
I don't know. I looked at the GNU coding standards http://www.gnu.org/prep/standards_toc.html and the GNUstep coding standards http://www.gnustep.org/resources/documentation/Developer/CodingStandards/cod... without figuring it out. Am I blind or not?
MJ Ray mjr@dsl.pipex.com writes:
This is a typical slippery slope... I start off from a general issue that affects most users, through a specific application, to a specific programming question. Are you holding tight? Is there a C doctor in the house?
Some of you may have "enjoyed" the change of character set from ISO-8859-1 (an 8-bit character code, so 256 possible characters) to UTF-8 (an encoding of the large Unicode character set into single 8-bit codes, pairs of 8-bit codes and so on). Basically, 8859-1 only lets you display western European text, while UTF-8 lets you have southern or eastern European languages, or Greek or Cyrillic or whatever, all at once without doing anything unusual with character sets. FAQ at http://www.cl.cam.ac.uk/~mgk25/unicode.html
The SQLite database seems to use UTF-16 as a basic datatype.
Not really, the ...16 functions are more complex than the UTF-8 versions and in some cases are wrappers around them.
I was having a browse after it was suggested that I try writing a Scheme interface to it. When reading http://www.sqlite.org/capi3.html, the following caught my eye: "There is no agreement on what the C datatype for a UTF-16 string should be."
Is there really such disagreement on this basic datatype? The FAQ makes it look clearcut on wchar_t. What types do C programmers really use?
In practice I've found using UTF-8 internally, and converting to the current locale's encoding (or whatever other interfaces require) at the boundaries, to be the most convenient approach. Most of your intuitions about string handling survive, you don't have to worry about shift states, extracting the actual character code (if you need it) is easy and efficient, etc.
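To make "extracting the character code is easy" concrete, here is a minimal sketch of decoding one UTF-8 sequence into its code point. The name utf8_decode and its signature are invented for this example. It rejects truncated, overlong and surrogate sequences, but it is a sketch and not a hardened validator.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode the UTF-8 sequence starting at s, with at
 * most n bytes available. Stores the code point in *cp and returns the
 * number of bytes consumed, or 0 if the input is truncated or malformed. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    /* Smallest code point that legitimately needs each sequence length;
     * anything below it is an overlong encoding. */
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t len, i;
    uint32_t c;

    if (n == 0)
        return 0;
    if (s[0] < 0x80)                { len = 1; c = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else
        return 0;                   /* stray continuation or invalid lead byte */

    if (n < len)
        return 0;                   /* sequence cut short */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;               /* not a continuation byte */
        c = (c << 6) | (uint32_t)(s[i] & 0x3F);
    }
    if (c < min_cp[len] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;                   /* overlong, out of range, or surrogate */
    *cp = c;
    return len;
}
```

The lead byte alone says how long the sequence is, and there is no state carried over from earlier characters, which is the "no shift states" property mentioned above.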
wchar_t is platform dependent and locale-dependent, though I've no idea if anyone is mad enough to make it actually differ from locale to locale. Linux uses UTF-32; AIUI Windows uses UTF-16.
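One way to see what a given platform does is to ask the compiler. The following is only a sketch; the two function names are made up for illustration. WCHAR_MAX and CHAR_BIT are standard C macros, and `__STDC_ISO_10646__` is the standard macro an implementation defines when wchar_t values are ISO 10646 (Unicode) code points.

```c
#include <limits.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: width of this platform's wchar_t in bits. */
unsigned wchar_bits(void)
{
    return (unsigned)(sizeof(wchar_t) * CHAR_BIT);
}

/* Hypothetical helper: 1 if a single wchar_t can hold every Unicode
 * code point (up to U+10FFFF), 0 if it is too narrow. */
int wchar_holds_any_code_point(void)
{
    return WCHAR_MAX >= 0x10FFFF;
}

/* Hypothetical helper: 1 if the implementation promises that wchar_t
 * values are ISO 10646 code points, 0 if it makes no such promise. */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}
```

On a system with a 16-bit wchar_t the second function returns 0, so code that assumes "one wchar_t is one character" is not portable.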
The encoding of multibyte strings is similarly platform and locale dependent and does vary between locales in real life; this makes it extremely inconvenient to actually do anything interesting with them in a correct fashion. You have to remember shift states, you can't safely use strchr() or anything else that uses the same assumptions, etc.
I've not personally tried to use UTF-16, but the combination of being both variable length and non-byte-oriented sounds very inconvenient.
As for sqlite, sqlite 3 has UTF-8 and UTF-16 versions of functions. I can't imagine why you'd want the UTF-16 versions unless you were committed to Windows.
On Wed, 03 Nov 2004 15:59:29 +0000, Richard Kettlewell rjk@terraraq.org.uk wrote:
wchar_t is platform dependent and locale-dependent, though I've no idea if anyone is mad enough to make it actually differ from locale to locale. Linux uses UTF-32; AIUI Windows uses UTF-16.
I've not personally tried to use UTF-16, but the combination of being both variable length and non-byte-oriented sounds very inconvenient.
Is UTF-16 variable length? I thought it was fixed at 16 bits per character. You might be thinking of MultiByte, which definitely does not use a fixed number of bits per character.
Tim.
On Wed, 2004-11-03 at 17:00 +0000, Tim Green wrote:
Is UTF-16 variable length?
Yes. See http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf Section 3.9
-- Martijn
Tim Green timothy.j.green@gmail.com writes:
Richard Kettlewell rjk@terraraq.org.uk wrote:
wchar_t is platform dependent and locale-dependent, though I've no idea if anyone is mad enough to make it actually differ from locale to locale. Linux uses UTF-32; AIUI Windows uses UTF-16.
I've not personally tried to use UTF-16, but the combination of being both variable length and non-byte-oriented sounds very inconvenient.
Is UTF-16 variable length? I thought it was fixed at 16 bits per character. You might be thinking of MultiByte, which definitely does not use a fixed number of bits per character.
Yes, UTF-16 is variable length.
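A minimal sketch of why: code points above U+FFFF are stored as a pair of 16-bit units (a surrogate pair), so a decoder has to look at one or two units per character. The function name utf16_decode is invented for this example, and the choice of uint16_t for a code unit is an assumption made here, since, as the SQLite page quoted earlier says, there is no agreed C type for UTF-16.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one code point from UTF-16 code units in
 * native byte order. Stores the code point in *cp and returns the number
 * of 16-bit units consumed (1 or 2), or 0 on a lone or truncated
 * surrogate. */
size_t utf16_decode(const uint16_t *s, size_t n, uint32_t *cp)
{
    if (n == 0)
        return 0;
    if (s[0] < 0xD800 || s[0] > 0xDFFF) {
        *cp = s[0];                 /* ordinary character: one unit */
        return 1;
    }
    /* A high surrogate (D800..DBFF) must be followed by a low one
     * (DC00..DFFF); together they carry 20 bits above U+10000. */
    if (s[0] <= 0xDBFF && n >= 2 && s[1] >= 0xDC00 && s[1] <= 0xDFFF) {
        *cp = 0x10000
            + (((uint32_t)(s[0] - 0xD800) << 10)
               | (uint32_t)(s[1] - 0xDC00));
        return 2;
    }
    return 0;                       /* unpaired surrogate */
}
```

For example, the units 0xD83D 0xDE00 decode to the single code point U+1F600, while the euro sign U+20AC is the single unit 0x20AC.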