[Openmcl-devel] [babel-devel] Changes

Fri Apr 10 08:34:33 PDT 2009

On Fri, 10 Apr 2009, Luís Oliveira wrote:

> [Sending a copy to the openmcl-devel mailing list.]
>
> On Wed, Apr 8, 2009 at 9:51 PM, Dan Weinreb <dlw at itasoftware.com> wrote:
>> CCL does not support having a character with code #\udcf0.
>> The reader signals a condition if it sees this.  Unfortunately,
>> using #-ccl does not seem to solve the problem, presumably
>> since the #- macro is working by calling "read" and it is
>> not suppressing unhandled conditions, or something like
>> that.  It might be hard to fix that in a robust way.
>
> Interesting. It seems that #-ccl works fine for CCL's #\ but not for
> Babel's #\ which is defined in babel/src/sharp-backslash.lisp and it's
> what we're using within the test suite. That is of course my fault. I
> now see in CLHS that *READ-SUPPRESS* should be honoured by each reader
> and I had missed that.
>
> What's the rationale behind not supporting the High Surrogate Area
> (D800–DBFF)? I can see how that might make sense in that Unicode
> states that this area does not have any character assignments. But,
> FWIW, the other three Lisps with full unicode support that I'm
> familiar with -- SBCL, CLISP and ECL -- handle this area just fine.

"Handling it just fine" presumably means that CODE-CHAR in those
implementations returns a non-NIL character for codes that can't
validly denote characters.

CCL's CODE-CHAR does return NIL for most codes that can't denote
characters, but it does let some invalid cases slip through.  I think
that it'd be more consistent if it caught more invalid cases, but
I don't find the argument that says "since other implementations
don't seem to check validity at all, CCL shouldn't either" too
compelling.
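
To illustrate what "catching more invalid cases" could look like, here
is a minimal sketch of a stricter wrapper around CODE-CHAR. The names
SCALAR-VALUE-P and CHECKED-CODE-CHAR are hypothetical, and this is not
CCL's actual implementation; it only rejects surrogates and
out-of-range codes.

```lisp
;; Hypothetical helpers, for illustration only; not part of CCL or Babel.

(defun scalar-value-p (code)
  "True if CODE is a Unicode scalar value, i.e. an integer in
[0, #x10FFFF] that is outside the surrogate range D800-DFFF."
  (and (integerp code)
       (<= 0 code #x10FFFF)
       (not (<= #xD800 code #xDFFF))))

(defun checked-code-char (code)
  "Like CODE-CHAR, but return NIL for surrogates and out-of-range
codes, whatever the host implementation's CODE-CHAR would do."
  (and (scalar-value-p code)
       (< code char-code-limit)   ; stay within what CODE-CHAR accepts
       (code-char code)))
```

With this wrapper, (checked-code-char #xd800) and
(checked-code-char #xdcf0) both return NIL in any implementation.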

If you believe that (CODE-CHAR #xd800) should return a CHARACTER,
then presumably it's meaningful to create a string full of those
characters.

(make-string 17 :initial-element (code-char #xd800)) ; will error in CCL

You should be able to write that string to a file in some flavor of
UTF-16 and read it back in with no loss of information, right?

I find it more reasonable to avoid this kind of inconsistency
completely and say that (CODE-CHAR #xd800) isn't a character.
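
One case where the round trip loses information can be sketched as
follows. The functions ENCODE-UTF-16 and DECODE-UTF-16 are hypothetical
and deliberately lenient; they work on lists of integer codes rather
than on characters, so the sketch behaves the same way whatever the
host's CODE-CHAR does.

```lisp
;; Hypothetical, lenient UTF-16 helpers operating on integer codes.

(defun encode-utf-16 (code-points)
  "Return a list of 16-bit code units for the list CODE-POINTS."
  (loop for cp in code-points
        if (< cp #x10000)
          collect cp                      ; BMP codes, surrogates included
        else
          collect (+ #xD800 (ash (- cp #x10000) -10))
          and collect (+ #xDC00 (logand (- cp #x10000) #x3FF))))

(defun decode-utf-16 (units)
  "Return a list of codes decoded from the list of code units UNITS."
  (loop while units
        collect (let ((u (pop units)))
                  (if (and (<= #xD800 u #xDBFF)
                           units
                           (<= #xDC00 (first units) #xDFFF))
                      ;; High surrogate followed by low surrogate: a pair.
                      (+ #x10000
                         (ash (- u #xD800) 10)
                         (- (pop units) #xDC00))
                      ;; Anything else, lone surrogates included, passes through.
                      u))))

;; Two "characters" go in, one comes out:
;; (decode-utf-16 (encode-utf-16 (list #xD800 #xDC00)))  =>  (65536)
```

A run of identical lone surrogates happens to survive this particular
lenient decoder, but as soon as a D800-DBFF code is followed by a
DC00-DFFF code the two are indistinguishable from a surrogate pair.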

>
> The disadvantage of not handling this area is that we can't implement
> the UTF-8B encoding.

I'm skeptical of the claim that all means of implementing UTF-8B depend
on (CODE-CHAR #xdcf0) returning non-nil.

> What's the advantage?

What, in general, are the advantages of detecting errors?