[Openmcl-devel] Plans for Unicode support within OpenMCL?
Takehiko Abe
keke at gol.com
Wed Mar 22 07:51:29 PST 2006
Tom Emerson wrote:
> > But you cannot enforce it. It is possible that you see them.
>
> You cannot enforce it, but the presence of a surrogate value in a
> UTF-32 stream is an error: there are no defined semantics for that
> character. So while you can see them, they have no valid character
> value.
You still have to decide what to do with an error.
I think doing nothing can be a reasonable option in this case.
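To make the choices concrete, here is a minimal sketch in Python (not Lisp, and not anything OpenMCL actually does) of three policies for a surrogate value found in a sequence of UTF-32 code units. The function name `check_utf32_units` and the `on_surrogate` parameter are hypothetical, invented for this illustration.

```python
def is_surrogate(code_point):
    """True if the value lies in the surrogate range U+D800..U+DFFF."""
    return 0xD800 <= code_point <= 0xDFFF


def check_utf32_units(units, on_surrogate="keep"):
    """Return a list of code points from UTF-32 code units.

    on_surrogate (hypothetical parameter) selects the policy:
      "keep"    -> do nothing, pass the value through unchanged
      "replace" -> substitute U+FFFD REPLACEMENT CHARACTER
      "error"   -> raise ValueError
    """
    out = []
    for cp in units:
        if is_surrogate(cp):
            if on_surrogate == "error":
                raise ValueError("surrogate U+%04X in UTF-32 input" % cp)
            if on_surrogate == "replace":
                cp = 0xFFFD
        out.append(cp)
    return out
```

With the default "keep" policy the surrogate stays in the result, and what to do with it is left to the user of the string.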
>
> > If they are rare, I think it is reasonable to assume that it is up
> > to each user how to deal with them. I think the same is true for
> > surrogate pairs in UTF-16. They are rare too.
>
> You cannot put combining characters and surrogate pairs in the same
> bin. A combining character *is* a valid character: the fact that it
> combines with the following character(s) is a display issue.
It is not only a display issue! Unicode has precomposed characters too,
which leads to issues of canonical equivalence and normalization
forms.
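As a small illustration of canonical equivalence, the sketch below uses Python's standard `unicodedata` module (Python is used only for illustration, not as a suggestion for OpenMCL). The precomposed character U+00E9 and the sequence U+0065 U+0301 display the same, but compare as different strings until both are normalized.

```python
import unicodedata

precomposed = "\u00e9"    # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"    # "e" followed by COMBINING ACUTE ACCENT

# Code point by code point, the two strings differ.
same_raw = precomposed == decomposed

# After normalizing both to the same form, they compare equal.
same_nfc = (unicodedata.normalize("NFC", precomposed)
            == unicodedata.normalize("NFC", decomposed))
same_nfd = (unicodedata.normalize("NFD", precomposed)
            == unicodedata.normalize("NFD", decomposed))

print(same_raw, same_nfc, same_nfd)
```

So string comparison, searching, and length are affected as well, not only rendering.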
> Surrogates are an encoding artifact. Expecting the programmer to deal
> with them is wrong.
But somebody has to deal with them at some level.
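For reference, dealing with them amounts to something like the following sketch, written in Python for illustration only. The function name `decode_utf16_units` is hypothetical. It pairs a high surrogate with a following low surrogate and, as one possible policy, passes any unpaired surrogate through unchanged.

```python
def decode_utf16_units(units):
    """Turn a list of UTF-16 code units into a list of code points."""
    out = []
    i = 0
    while i < len(units):
        unit = units[i]
        nxt = units[i + 1] if i + 1 < len(units) else None
        is_high = 0xD800 <= unit <= 0xDBFF
        is_low_next = nxt is not None and 0xDC00 <= nxt <= 0xDFFF
        if is_high and is_low_next:
            # Combine the pair into one supplementary-plane code point.
            out.append(0x10000 + ((unit - 0xD800) << 10) + (nxt - 0xDC00))
            i += 2
        else:
            # BMP code unit, or an unpaired surrogate kept as-is.
            out.append(unit)
            i += 1
    return out
```

For example, the code units D834 DD1E combine into the single code point U+1D11E.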
We are talking about OpenMCL's string implementation. I believe
that a Lisp implementation does not have to try too hard to do
Unicode right at the language level. Someone could add a library for
Unicode string handling later.