[Openmcl-devel] Plans for Unicode support within OpenMCL?

Takehiko Abe keke at gol.com
Wed Mar 22 02:38:31 UTC 2006


James Anderson wrote:

> yes, i had understood apple's CFString to be utf-16 encoded unicode.

CFString is an opaque type, so you cannot say it is UTF-16 encoded.
(I think it can have multiple internal representations.)
Anyway, I don't use it unless I have to.

> which means that character position and byte position in such a string
> do not agree. and things like subseq and char in lisp cannot treat
> such unicode strings as sequences with elements of uniform size.
>
> all known problems.

I think the problem is what we mean by 'element'. Even if
you use UTF-32 or direct code points, you still have base character +
combining character sequences and other cuties such as Hangul
conjoining jamo. So you may still not be able to treat Unicode
strings as sequences of fixed-size elements.
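
To illustrate the point, here is a rough sketch in Python (not Lisp, and not anything OpenMCL provides) using the standard unicodedata module. The helper name grapheme_like_clusters is made up for this example, and it only checks the canonical combining class, so it is a simplification rather than the full grapheme cluster rules of UAX #29. Note that conjoining jamo have combining class 0, so this helper would not group them; they only collapse under NFC normalization.

```python
import unicodedata


def grapheme_like_clusters(text):
    """Group each base code point with the combining marks that follow it.

    Hypothetical helper: it looks only at the canonical combining class,
    so it is NOT a complete grapheme cluster segmenter.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            # Non-zero combining class: attach to the previous base character.
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# 'e' followed by COMBINING ACUTE ACCENT: one visible character, two code points.
decomposed = "e\u0301"

# Three conjoining jamo (choseong, jungseong, jongseong) forming one syllable.
jamo = "\u1112\u1161\u11ab"

# Even with one element per code point, the lengths are 2 and 3.
lengths = (len(decomposed), len(jamo))

# NFC composes the jamo sequence into a single precomposed syllable.
composed_jamo = unicodedata.normalize("NFC", jamo)

clusters = grapheme_like_clusters("e\u0301x")
```

So a code-point-indexed char or subseq can still cut between a base character and its accent, whatever the encoding underneath.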
 
> i take your response to mean that you have been writing apps under
> those conditions and found a way to cope.

As for surrogate values, we know the range (U+D800 to U+DFFF). And the
Unicode Consortium publishes the Unicode Character Database, which
defines the properties of 'characters'. You can look a given 'character'
up and know what you need to do with it.
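
A minimal sketch of both ideas, again in Python for illustration only. The surrogate ranges (high U+D800 to U+DBFF, low U+DC00 to U+DFFF) come from the Unicode standard; the function names decode_utf16_units and describe are hypothetical, and passing unpaired surrogates through unchanged is just one possible policy. The property lookup uses Python's unicodedata module, which is built from the Unicode Character Database.

```python
import unicodedata

HIGH_START, HIGH_END = 0xD800, 0xDBFF   # high (leading) surrogates
LOW_START, LOW_END = 0xDC00, 0xDFFF     # low (trailing) surrogates


def decode_utf16_units(units):
    """Turn a list of 16-bit code units into a list of code points."""
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        has_next = i + 1 < len(units)
        if (HIGH_START <= unit <= HIGH_END and has_next
                and LOW_START <= units[i + 1] <= LOW_END):
            # Valid surrogate pair: combine into one supplementary code point.
            low = units[i + 1]
            code_points.append(
                0x10000 + ((unit - HIGH_START) << 10) + (low - LOW_START))
            i += 2
        else:
            # BMP code unit, or an unpaired surrogate passed through as-is
            # (an assumption; a stricter decoder might signal an error).
            code_points.append(unit)
            i += 1
    return code_points


def describe(code_point):
    """Look up a few character database properties for one code point."""
    ch = chr(code_point)
    return (unicodedata.name(ch, "<unnamed>"),
            unicodedata.category(ch),
            unicodedata.combining(ch))


# 'A' followed by the surrogate pair for U+1D11E.
points = decode_utf16_units([0x0041, 0xD834, 0xDD1E])
accent = describe(0x0301)
clef = describe(points[1])
```

Three code units become two code points here, and the database lookup tells you that U+0301 is a nonspacing mark with a non-zero combining class, which is the kind of information you need before deciding how to treat a given 'character'.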




