[Openmcl-devel] [babel-devel] Changes

Luis Oliveira luismbo at gmail.com
Sun Apr 12 17:49:22 PDT 2009


[I wrote this reply a while ago but only now noticed I had sent it
 to Gary only. Resending it now.]

On Fri, Apr 10, 2009 at 4:34 PM, Gary Byers <gb at clozure.com> wrote:
> CCL's CODE-CHAR does return NIL for most codes that can't denote
> characters, but it does let some invalid cases slip through. I think
> that it'd be more consistent if it caught more invalid cases, but
> I don't find the argument that says "since other implementations
> don't seem to check validity at all, CCL shouldn't either" too
> compelling.

Not very compelling, true, but that's not quite the argument I was
trying to make since I hadn't realized there were validity issues.
(Such as trying to encode these code points in UTF-16, as you
described.)

But still, FWIW, I checked five other implementations (Lispworks,
Allegro CL, Python 3, GHC and Factor) and they all allow these
characters AFAICT.


> If you believe that (CODE-CHAR #xd800) should return a CHARACTER,
> then presumably it's meaningful to create a string full of those
> characters.
>
> (make-string 17 :initial-element (code-char #xd800)) ; will error in CCL
>
> You should be able to write that string to a file in some flavor of
> UTF-16 and read it back in with no loss of information, right?
>
> I find it more reasonable to avoid this kind of inconsistency
> completely and say that (CODE-CHAR #xd800) isn't a character.

That's a good point. How about signalling an error a bit later, when
doing the encoding? Would that work?

I agree with you that skipping error checking altogether, and thereby
generating an invalid UTF-16 sequence as Python does, for instance, is
not a great idea, so I'm not suggesting CCL should mimic that behaviour.
CLISP, however, does what I suggested above.

    > (ext:convert-string-to-bytes (string #\udc80) charset:utf-16)

    *** - CONVERT-STRING-TO-BYTES: Character #\uDC80 cannot be
          represented in the character set "UTF-16"

It seems to me that this is the best approach.
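
To make the "signal the error at encoding time" idea concrete, here is a
minimal sketch in Python (used only because it lets lone surrogates live
in strings). The function name encode_utf16_be is hypothetical and the
encoder is hand-written for illustration; it is not how CLISP or any
other implementation actually does it.

```python
import struct


def encode_utf16_be(text: str) -> bytes:
    """Encode text as big-endian UTF-16 (no BOM), rejecting lone surrogates.

    Hypothetical helper: the character itself is allowed to exist in the
    string; the validity check happens only here, at encoding time.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDFFF:
            # Surrogate code points have no UTF-16 representation of their own.
            raise ValueError(
                f"character U+{cp:04X} cannot be represented in UTF-16"
            )
        if cp < 0x10000:
            # BMP code point: a single 16-bit code unit.
            out += struct.pack(">H", cp)
        else:
            # Supplementary code point: split into a surrogate pair.
            cp -= 0x10000
            out += struct.pack(">HH", 0xD800 | (cp >> 10), 0xDC00 | (cp & 0x3FF))
    return bytes(out)
```

With this arrangement, creating chr(0xDC80) succeeds, and only the call
encode_utf16_be(chr(0xDC80)) raises.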


> I'm skeptical of the claim that all means of implementing UTF-8B
> depend on (CODE-CHAR #xdcf0) returning non-nil.

How else could it be implemented?

[This has already been answered now, of course.]
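
For reference, this is what the lone-surrogate flavour of UTF-8B looks
like in practice. The sketch below relies on Python 3's "surrogateescape"
error handler (PEP 383), which maps each undecodable byte #xNN to the
code point #xDCNN; the helper names decode_utf8b and encode_utf8b are
hypothetical. It illustrates only this one approach and says nothing
about the alternatives discussed elsewhere in the thread.

```python
def decode_utf8b(raw: bytes) -> str:
    """Decode UTF-8, mapping each invalid byte 0xNN to U+DCNN (hypothetical helper)."""
    return raw.decode("utf-8", errors="surrogateescape")


def encode_utf8b(text: str) -> bytes:
    """Inverse of decode_utf8b: turn U+DCNN escapes back into the raw byte 0xNN."""
    return text.encode("utf-8", errors="surrogateescape")


raw = b"abc\xf0"            # 0xF0 starts a 4-byte sequence that never completes
text = decode_utf8b(raw)    # the stray byte becomes the lone surrogate U+DCF0
assert text == "abc\udcf0"
assert encode_utf8b(text) == raw  # round trip with no loss of information
```

The round trip works here precisely because the string type can hold
#xDCF0 as a character.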

-- 
Luís Oliveira
http://student.dei.uc.pt/~lmoliv/



