[Openmcl-devel] Plans for Unicode support within OpenMCL?

Takehiko Abe keke at gol.com
Wed Mar 22 07:51:29 PST 2006

Tom Emerson wrote:

> > But you cannot enforce it. It is possible that you see them.
> You cannot enforce it, but the presence of a surrogate value in a  
> UTF-32 stream is an error: there are no defined semantics for that  
> character. So while you can see them, they have no valid character  
> value.

You still have to decide what to do with an error.
I think doing nothing can be a reasonable option in this case.
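
To make the choice concrete, here is a rough sketch in Python (not OpenMCL code) of three possible policies for a surrogate value found in a sequence of already-decoded 32-bit code points. The function name `filter_code_points` and the policy names "keep", "replace" and "error" are made up for illustration; "keep" corresponds to doing nothing.

```python
def is_surrogate(cp):
    """True if cp lies in the UTF-16 surrogate range U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


def filter_code_points(code_points, on_surrogate="keep"):
    """Apply a (hypothetical) policy to surrogates in a UTF-32 sequence.

    code_points  -- iterable of integers, one per 32-bit code unit
    on_surrogate -- "keep"    : pass the value through untouched
                    "replace" : substitute U+FFFD REPLACEMENT CHARACTER
                    "error"   : raise ValueError
    """
    if on_surrogate not in ("keep", "replace", "error"):
        raise ValueError("unknown policy: %r" % (on_surrogate,))
    result = []
    for cp in code_points:
        if is_surrogate(cp):
            if on_surrogate == "error":
                raise ValueError("surrogate U+%04X in UTF-32 data" % cp)
            if on_surrogate == "replace":
                cp = 0xFFFD
            # "keep": do nothing and leave the decision to the user
        result.append(cp)
    return result
```

With "keep", the implementation stays simple and a user who cares can still scan for the range U+D800..U+DFFF later.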

> > If they are rare, I think it is reasonable to assume that it is up
> > to each user how to deal with them. I think the same is true for
> > surrogate pairs in UTF-16. They are rare too.
> You cannot put combining characters and surrogate pairs in the same  
> bin. A combining character *is* a valid character: the fact that it  
> combines with the following character(s) is a display issue.

It is not only a display issue! Unicode has precomposed characters too,
which leads to issues of canonical equivalence and normalization.
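
A small illustration, using Python's standard `unicodedata` module rather than anything in OpenMCL: the precomposed character U+00E9 and the sequence "e" followed by U+0301 COMBINING ACUTE ACCENT are canonically equivalent, yet they compare unequal as raw strings. The helper name `canonically_equal` is only for this example.

```python
import unicodedata

precomposed = "\u00e9"    # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"    # "e" + COMBINING ACUTE ACCENT


def canonically_equal(a, b):
    """Compare two strings after normalizing both to NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# A plain comparison sees two different strings of different lengths.
raw_equal = (precomposed == decomposed)

# After normalization they are the same string.
normalized_equal = canonically_equal(precomposed, decomposed)
```

So comparison, searching and hashing are all affected by combining characters, not just rendering.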

> Surrogates are an encoding artifact. Expecting the programmer to deal  
> with them is wrong.

But somebody has got to deal with them at some level.
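
For reference, the work in question is small. Below is a sketch in Python of the standard UTF-16 arithmetic for turning a high/low surrogate pair into a code point; the function name `combine_surrogates` is made up for this example, and whether this belongs in the string implementation or in a library is exactly the open question.

```python
def combine_surrogates(high, low):
    """Return the code point encoded by a UTF-16 surrogate pair."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("not a high surrogate: U+%04X" % high)
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("not a low surrogate: U+%04X" % low)
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

For example, the pair D834 DD1E yields U+1D11E (MUSICAL SYMBOL G CLEF).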

We are talking about OpenMCL's string implementation. I believe
that a Lisp implementation does not have to try too hard to do
Unicode right at the language level. Someone could add a library for
Unicode string handling later.
