[ALUG] unicode, sqlite and C

3 Nov 2004

This is a typical slippery slope: I start from a general issue that 
affects most users, move through a specific application, and end at a 
specific programming question. Are you holding tight? Is there a C 
doctor in the house?

Some of you may have "enjoyed" the change of character set from 
ISO-8859-1 (an 8-bit character code, so 256 possible characters) to 
UTF-8 (an encoding of the much larger Unicode character set, in which 
each character becomes one 8-bit code, a pair of 8-bit codes, and so 
on up to four). Basically, 8859-1 only lets you display western 
European text, while UTF-8 lets you have southern or eastern European 
languages, or Greek or Cyrillic or whatever, all at once without doing 
anything unusual with character sets. FAQ at 
http://www.cl.cam.ac.uk/~mgk25/unicode.html
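
To make the "one code, pairs of codes and so on" part concrete, here 
is a minimal sketch in C of how a single Unicode code point turns into 
one to four UTF-8 bytes. The function name utf8_encode is made up for 
illustration and is not from any library mentioned here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
   Writes up to 4 bytes into out and returns how many were written,
   or 0 if cp is not a valid code point (surrogate or > U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                      /* ASCII: one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* two bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* three bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* four bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

So 'A' (U+0041) stays a single byte, e-acute (U+00E9), which was one 
byte in 8859-1, becomes two, and the euro sign (U+20AC) becomes three.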

The SQLite database seems to use UTF-16 as a basic datatype. I was 
having a browse after it was suggested that I try writing a Scheme 
interface to it. When reading http://www.sqlite.org/capi3.html, the 
following caught my eye: "There is no agreement on what the C datatype 
for a UTF-16 string should be."

Is there really such disagreement on this basic datatype? The Unicode 
FAQ linked above makes it look clear-cut in favour of wchar_t. What 
types do C programmers really use?
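
One likely source of the disagreement is that the width of wchar_t is 
implementation-defined: it is commonly 32 bits on GNU/Linux and 16 
bits on Windows, so it cannot be relied on to hold exactly one UTF-16 
code unit. As I read the capi3 page, SQLite sidesteps the question by 
passing UTF-16 strings as void*. The sketch below assumes a 
fixed-width uint16_t from <stdint.h> as the code unit type instead; 
the function name utf16_encode is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-16.
   Uses uint16_t as the code unit type (an assumption, not a standard
   "UTF-16 string" type). Writes 1 or 2 units into out and returns the
   count, or 0 if cp is a surrogate or above U+10FFFF. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                         /* reserved for surrogates */
    if (cp < 0x10000) {                   /* fits in one 16-bit unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    if (cp <= 0x10FFFF) {                 /* needs a surrogate pair */
        cp -= 0x10000;
        out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
        out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
        return 2;
    }
    return 0;
}
```

Printing sizeof(wchar_t) on a couple of different platforms is a quick 
way to see why a library author might not want to commit to it.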

For my part, Scheme's character and string datatypes seem to cope with 
Unicode in theory, but the implementation details (such as which 
character set to use) are still being thrashed out.

-- 
MJR/slef    My Opinion Only and not of any group I know
  Creative copyleft computing - http://www.ttllp.co.uk/
  Unsolicited attachments to the pipex address deleted
Will HLF fund tree-killings? http://www.thewalks.co.uk/

MJ Ray