[Openmcl-devel] Plans for Unicode support within OpenMCL?
Takehiko Abe
keke at gol.com
Wed Mar 22 07:03:32 PST 2006
Tom Emerson wrote:
> It is important not to get mixed up between code point values and the
> various encoding forms that are available. When viewed this way,
> surrogate pairs become an encoding issue. The view of characters that
> the programmer sees is of whole code points: they should *never* see
> a character value in the range #xD800 -- #xDFFF
But you cannot enforce that. It is still possible to see them.
> because these cannot
> appear in a valid Unicode stream.
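To make the disagreement concrete, here is a minimal sketch, in Python rather than Lisp purely for illustration. The helper names `is_surrogate` and `can_encode` are hypothetical. It shows both halves of the argument: a lone surrogate cannot be written to a valid UTF-8 stream, yet nothing stops one from existing as a character value in memory.

```python
def is_surrogate(code_point):
    """True if code_point lies in the surrogate range #xD800 - #xDFFF."""
    return 0xD800 <= code_point <= 0xDFFF


def can_encode(text):
    """True if text can be written out as a valid UTF-8 stream."""
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        # Python's UTF-8 codec rejects surrogate code points.
        return False


# Python happily creates a one-character string holding a lone surrogate,
# so the programmer can see such a value even though no valid stream has it.
lone = chr(0xD800)
```

So whether surrogates are "never seen" depends on whether the implementation checks at character-creation time or only at encoding time.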
>
> Unicode's character model is well
> documented: there should be no confusion there.
>
> With regards to combining characters: these are a necessary part of
> the standard, and you have to deal with them. UAX #29 provides
> algorithms for doing appropriate text boundary detection, including
> 'glyph' boundaries. However, most people don't need to deal with
> that. I've worked in Unicode for almost 8 years and combining
> characters rarely show up in practice in my experience.
If they are rare, I think it is reasonable to assume that it is up
to each user how to deal with them. I think the same is true for
surrogate pairs in UTF-16. They are rare too.
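For readers who have not met either case, a small sketch (again Python, only as an illustration; `utf16_units` and `to_surrogate_pair` are hypothetical helper names) of what a combining sequence and a surrogate pair look like to a program:

```python
import unicodedata


def utf16_units(text):
    """Number of 16-bit code units needed to store text in UTF-16."""
    return len(text.encode("utf-16-be")) // 2


def to_surrogate_pair(code_point):
    """Split a code point above #xFFFF into its UTF-16 high/low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


# 'e' followed by COMBINING ACUTE ACCENT: two code points, one visible glyph.
decomposed = "e\u0301"
# NFC normalization folds the sequence into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

# U+1D11E (MUSICAL SYMBOL G CLEF) lies above #xFFFF, so UTF-16 needs a pair.
clef = chr(0x1D11E)
```

Here `decomposed` has length 2 and `composed` has length 1, while `clef` is one code point that takes two UTF-16 code units. In both cases one "character" as the user perceives it spans more than one storage unit, which is why I group them together.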
>
> For internal representation: multiple internal representations make a
> lot of sense. You can use ASCII for characters under #x80. Characters
> above #x80 can use UTF-16.
If we are going to have multiple string types, let's use Latin-1 for
characters up to #xFF.
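A rough sketch of the two-type idea, assuming one byte per character for strings that fit in Latin-1 and UTF-16 for everything else. It is written in Python for illustration only and says nothing about how OpenMCL would actually lay out strings; `narrowest_encoding` and `pack` are hypothetical names.

```python
def narrowest_encoding(text):
    """Pick the smallest representation that can hold every character."""
    # default=0 makes the empty string fall into the narrow case.
    top = max(map(ord, text), default=0)
    if top <= 0xFF:
        return "latin-1"    # one byte per character, covers #x00 - #xFF
    return "utf-16-be"      # 16-bit code units, surrogate pairs above #xFFFF


def pack(text):
    """Return (encoding name, encoded bytes) using the narrowest encoding."""
    encoding = narrowest_encoding(text)
    return encoding, text.encode(encoding)
```

With this rule `pack("caf\u00e9")` stays at one byte per character, which an ASCII-only narrow type would not allow, and since Latin-1 maps directly onto the first 256 Unicode code points no translation table is needed.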