Re: [ALUG] unicode, sqlite and C

3 Nov 2004


      On 3/11/2004, "MJ Ray" mjr@dsl.pipex.com wrote:
...
This is a typical slippery slope... I start off from a general issue
that affects most users, through a specific application, to a specific
programming question. Are you holding tight? Is there a C doctor in
the house?
Some of you may have "enjoyed" the change of character set from
ISO-8859-1 (an 8-bit character code, so 256 possible characters) to
UTF-8 (a large character set which is converted into 8-bit codes,
pairs of 8-bit codes and so on). Basically, 8859-1 only lets you
display western European text, while UTF-8 lets you have southern or
eastern European languages, or Greek or Cyrillic or whatever, all at
once without doing anything unusual with character sets. FAQ at
http://www.cl.cam.ac.uk/~mgk25/unicode.html
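To make the "8-bit codes, pairs of 8-bit codes and so on" part concrete, here is a minimal sketch of how one Unicode code point becomes one to four UTF-8 bytes. The function name utf8_encode is made up for illustration and does not come from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * Writes 1 to 4 bytes into out (which must have room for 4) and
 * returns the number of bytes written, or 0 if cp is a surrogate
 * or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);

    if (cp < 0x80) {                      /* ASCII: a single byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* e.g. Latin-1 accents, Greek, Cyrillic */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {   /* surrogates are not characters */
        return 0;
    }
    if (cp < 0x10000) {                   /* rest of the Basic Multilingual Plane */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* supplementary planes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* outside the Unicode range */
}
```

With this sketch, 'A' (U+0041) stays a single byte, while e-acute (U+00E9), which is the single byte 0xE9 in ISO-8859-1, becomes the pair 0xC3 0xA9.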
The SQLite database seems to use UTF-16 as a basic datatype. I was
having a browse after it was suggested that I try writing a Scheme
interface to it. When reading http://www.sqlite.org/capi3.html, the
following caught my eye: "There is no agreement on what the C datatype
for a UTF-16 string should be."
Is there really such disagreement on this basic datatype? The FAQ
makes it look clearcut on wchar_t. What types do C programmers really
use?
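One reason the question is not clear-cut: the width of wchar_t is implementation-defined (16 bits on Windows, commonly 32 bits with glibc on Linux), so it is not a portable UTF-16 code unit. The sketch below is one possible approach, not what SQLite itself does: it assumes a C99 compiler and uses uint16_t as the code unit. The names wchar_bits and utf16_encode are invented for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: report how many bits this platform's wchar_t has.
 * The answer differs between platforms, which is the heart of the problem. */
size_t wchar_bits(void)
{
    return sizeof(wchar_t) * CHAR_BIT;
}

/* Hypothetical helper: encode one Unicode code point as UTF-16 code units.
 * Writes 1 or 2 units into out (room for 2 required) and returns the
 * count, or 0 if cp is a surrogate or lies above U+10FFFF. */
size_t utf16_encode(uint32_t cp, uint16_t *out)
{
    assert(out != NULL);

    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return 0;                          /* lone surrogates are invalid */
    }
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;             /* BMP: one 16-bit unit */
        return 1;
    }
    if (cp <= 0x10FFFF) {
        uint32_t v = cp - 0x10000;         /* 20 bits split across a pair */
        out[0] = (uint16_t)(0xD800 + (v >> 10));    /* high surrogate */
        out[1] = (uint16_t)(0xDC00 + (v & 0x3FF));  /* low surrogate */
        return 2;
    }
    return 0;                              /* outside the Unicode range */
}
```

A buffer of uint16_t built this way can be handed to any interface that accepts an untyped pointer to UTF-16 data, provided the byte order matches what that interface expects.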
For my part, Scheme's character and string datatypes seem to cope with
unicode in theory, but the implementation details (such as what
character set) are still being thrashed out.
One word - Java...
Unicode capability and built-in character conversions for everything from
ISO-8859-1, through UTF-8 and UTF-16 to things like Big5 for Chinese.
It's one of the things that was designed in from the beginning. There's
a reason that "use the right tool for the job" became such a popular
saying...
Matt
