[Openmcl-devel] code-char from #xD800 to #xDFFF

peter p2.edoc at googlemail.com
Tue Jul 31 09:48:33 PDT 2012


It seems that code-char returns nil for codes from #xD800 to #xDFFF; 
otherwise it returns characters for codes from 0 to (- (lsh 1 16) 3). 
I take this to be defined in ccl::fixnum->char.

<http://www.unicode.org/charts/PDF/UDC00.pdf> and 
<http://www.unicode.org/charts/PDF/UD800.pdf> say
"Isolated surrogate code points have no interpretation; consequently, 
no character code charts or names lists are provided for this range."

<http://ccl.clozure.com/manual/chapter4.5.html#Unicode> says these 
codes: "will never be valid character codes and will return NIL for 
arguments in that range".

When using CCL to run a dynamic web service, this can be inconvenient 
when passing material from external sources through CCL to remote 
browsers (for instance, Japanese Emoji icon characters occupy this 
code area; sources use them and web browsers render them).
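
For reference, if the external material arrives as UTF-16, codes in 
this range normally come in high/low pairs that together denote one 
code point at or above #x10000. A minimal sketch of combining such 
pairs follows. The function names are hypothetical and not part of 
CCL, and it assumes the input is available as a list of 16-bit code 
units. Unpaired surrogates are passed through as plain integers rather 
than dropped.

```lisp
;; Hypothetical helpers, not part of CCL.

(defun high-surrogate-p (code)
  "True if CODE is a UTF-16 high (leading) surrogate."
  (<= #xD800 code #xDBFF))

(defun low-surrogate-p (code)
  "True if CODE is a UTF-16 low (trailing) surrogate."
  (<= #xDC00 code #xDFFF))

(defun combine-surrogates (high low)
  "Combine a high/low surrogate pair into a single code point."
  (+ #x10000
     (ash (- high #xD800) 10)
     (- low #xDC00)))

(defun utf16-units->code-points (units)
  "Convert a list of 16-bit code units to a list of code points.
Unpaired surrogates are kept unchanged as integers."
  (loop while units
        collect (let ((u (pop units)))
                  (if (and (high-surrogate-p u)
                           units
                           (low-surrogate-p (first units)))
                      (combine-surrogates u (pop units))
                      u))))
```

For example, the pair #xD83D #xDE00 combines to #x1F600. Whether 
code-char then accepts the combined value depends on the 
implementation's char-code-limit.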

I cannot understand why CCL behaves as it does here, but I assume 
there is a good reason. Still, would it not make sense to return a 
character with the appropriate code value even if CCL has no use for 
it?

Is there any efficient strategy which side-steps this issue?
At the moment I am intercepting character codes in this area and 
replacing them with #\Replacement_Character or #\null, but in so 
doing I lose the character code. I would therefore be passing material 
through CCL with some characters eliminated in transit, which changes 
the original meaning/intent of the material.
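
One possible way to avoid losing the code, sketched here under 
assumptions rather than offered as an established CCL idiom: keep any 
code for which code-char returns nil as a plain integer alongside the 
real characters, and convert everything back to UTF-16 code units on 
output. All function names below are hypothetical, and the sketch 
assumes the outgoing side wants a list of 16-bit code units.

```lisp
;; Hypothetical helpers, not part of CCL.

(defun code->item (code)
  "Return the character for CODE, or CODE itself if CODE-CHAR gives NIL."
  (or (code-char code) code))

(defun item->code (item)
  "Inverse of CODE->ITEM: accept either a character or an integer code."
  (if (characterp item)
      (char-code item)
      item))

(defun code->utf16-units (code)
  "Return the UTF-16 code units for CODE as a list of one or two integers.
Codes below #x10000, including lone surrogates, are emitted unchanged."
  (if (< code #x10000)
      (list code)
      (let ((v (- code #x10000)))
        (list (+ #xD800 (ash v -10))
              (+ #xDC00 (logand v #x3FF))))))

(defun items->utf16-units (items)
  "Encode a list of characters and/or integer codes as UTF-16 code units."
  (loop for item in items
        append (code->utf16-units (item->code item))))
```

With this arrangement a code in the #xD800 to #xDFFF range travels 
through as an integer and comes out with its original value, at the 
cost of handling a mixed sequence instead of a string.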


