[Openmcl-devel] Plans for Unicode support within OpenMCL?

Tue Mar 21 07:46:22 PST 2006

On Tue, 21 Mar 2006, Takehiko Abe wrote:

> Gary Byers wrote:
>
>> I'd be more willing to agree as long as we (whoever we are ...) are
>> talking about a large and useful subset of UTF-16.
>>
>> The argument (my argument, at least) against representing strings
>> internally in an encoded format has to do with cases where you're
>> not doing I/O with them.
>
> A quick response (I may have missed your point):
>
> What I wanted to say is that we can pretend that UTF-16's code values
> are real code points and treat surrogate values as legit char codes.
> Then, UTF-16 would not be an encoded format any more.

Yes; I agree that this is reasonable.  I think that the range of code
points that UTF-16 can represent without surrogate pairs is referred to
as the "Basic Multilingual Plane" (U+0000 through U+FFFF), and that it
covers a very high percentage of the characters/languages that
people would be likely to want to use.
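
To make the 16-bit boundary concrete, here is a minimal sketch in
Python (used purely for illustration; it is not OpenMCL code, and the
helper name utf16_code_units is made up for this example).  It shows
that a character inside the Basic Multilingual Plane occupies one
16-bit code unit, while a character outside it needs a surrogate pair.

```python
def utf16_code_units(ch):
    """Return the 16-bit UTF-16 code units for a single character.

    Hypothetical helper for illustration only.
    """
    data = ch.encode("utf-16-be")  # big-endian, no byte-order mark
    # Group the encoded bytes into 16-bit units.
    return [int.from_bytes(data[i:i + 2], "big")
            for i in range(0, len(data), 2)]


def in_bmp(ch):
    """True if the character's code point fits in 16 bits."""
    return ord(ch) <= 0xFFFF


bmp_char = "A"             # U+0041, inside the BMP
astral_char = "\U0001D11E"  # U+1D11E MUSICAL SYMBOL G CLEF, outside it

print(in_bmp(bmp_char), [hex(u) for u in utf16_code_units(bmp_char)])
print(in_bmp(astral_char), [hex(u) for u in utf16_code_units(astral_char)])
```

Under the "pretend code units are characters" scheme, the second
string would simply appear to have length 2, with each surrogate
treated as a character in its own right.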

>
> Make handling surrogate pairs properly a user's task.
>

I think that it was true that in some earlier versions of the Unicode
standard, all defined characters could be encoded in 16 bits, and that
a lot of people/programs/libraries still work in this subset and never
deal with surrogate pairs/variable-length encodings.  (It's not clear
to me whether MacOS or Windows support code points that can't
be encoded in 16 bits, or if there are plans to change that;
people seem to use the term UTF-16 informally in at least some cases.)
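
As a rough idea of what "a user's task" might amount to, here is a
minimal sketch in Python (for illustration only; this is not OpenMCL
code, and the function name code_points is hypothetical).  It assumes
strings are stored as sequences of 16-bit code units and combines
surrogate pairs into full code points on request, passing unpaired
surrogates through unchanged.

```python
def code_points(units):
    """Combine surrogate pairs in a list of 16-bit code units.

    Hypothetical user-level helper: the string itself stays a plain
    sequence of 16-bit values; pairing happens only when asked for.
    """
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        has_low = (i + 1 < len(units)
                   and 0xDC00 <= units[i + 1] <= 0xDFFF)
        if is_high and has_low:
            # Standard UTF-16 pairing arithmetic.
            low = units[i + 1]
            out.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            # BMP value, or an unpaired surrogate left as-is.
            out.append(u)
            i += 1
    return out


units = [0x41, 0xD834, 0xDD1E]  # "A" followed by U+1D11E
print(len(units), [hex(cp) for cp in code_points(units)])
```

Code that never leaves the 16-bit subset would not need to call
anything like this; only code that cares about characters beyond
U+FFFF would pay for the extra pass.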