[Openmcl-devel] Is this a bug?

R. Matthew Emerson rme at clozure.com
Sat Dec 15 12:26:56 PST 2007


On Dec 15, 2007, at 12:10 PM, Ron Garret wrote:

> Is this a bug?
>
> Welcome to Clozure Common Lisp Version 1.1-r7902 (DarwinX8664)!
> ? (elt "ß" 0)
> #\Latin_Capital_Letter_A_With_Tilde
> ? (length "ß")
> 2
>
> Version 1.1 is advertised as unicode-native, which I would have
> thought would make the above return ß and 1 instead of
> #\Latin_Capital_Letter_A_With_Tilde and 2.  Or am I missing something?
>
> The reason I'm really asking, BTW, is not that I really care about
> being correct about the length of unicode strings, but that I want to
> create reader macros for unicode characters (in particular « and ») so
> the error I really care about is this one:
>
> ? #\«
>> Error: Unknown character name - "«" .


Specify the -K utf-8 option when you start up CCL.

$ openmcl64 -K utf-8
Welcome to Clozure Common Lisp Version 1.1-r7685:7830MS (DarwinX8664)!
? (elt "ß" 0)
#\Latin_Small_Letter_Sharp_S
? (length "ß")
1
? #\«
#\Left-Pointing_Double_Angle_Quotation_Mark
?

(If you now save the lisp with save-application, this encoding setting  
will stick.)

The default external-format for *terminal-io* and other streams whose
encoding is not explicitly specified is ISO-8859-1.  See
ccl:release-notes.txt (search for ISO-8859-1) for some notes about this.
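
For file streams, the encoding can also be named explicitly rather than
left to the default.  A minimal sketch, assuming that the lisp accepts
the keyword :utf-8 as an :external-format value (CCL does; other
implementations may spell it differently).  The function name and the
path passed to it are made up for illustration:

```lisp
;; Write one character (ß, code point #xdf) to a file as UTF-8, then
;; read it back using the same explicitly specified encoding.
;; UTF-8-ROUND-TRIP is a hypothetical helper, not part of CCL.
(defun utf-8-round-trip (path)
  (with-open-file (out path
                       :direction :output
                       :if-exists :supersede
                       :external-format :utf-8)
    ;; CODE-CHAR is used so that this source file stays pure ASCII.
    (write-string (string (code-char #xdf)) out))
  (with-open-file (in path :external-format :utf-8)
    (read-line in)))

;; (length (utf-8-round-trip "/tmp/sharp-s.txt")) should be 1 here,
;; because the two octets on disk are decoded back into one character.
```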

As for the odd output you were seeing:

The UTF-8 encoding of ß is the two octets #xc3 and #x9f (hence the
length of 2), and the ISO-8859-1 character for code point #xc3 is
#\Latin_Capital_Letter_A_With_Tilde, which explains the name.  Your
terminal is using the UTF-8 encoding, but the lisp is treating the
bytes as ISO-8859-1, so each octet is read as a separate character.
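
The mismatch can be reproduced by hand in portable Common Lisp.  This
is only a sketch: the two function names are invented for
illustration, the encoder covers only code points below #x800, and it
assumes a lisp whose CODE-CHAR accepts values up to 255.

```lisp
;; Encode a single code point as UTF-8 octets (1- and 2-octet forms only).
;; UTF-8-OCTETS is a hypothetical helper written for this example.
(defun utf-8-octets (code)
  (cond ((< code #x80)
         (list code))
        ((< code #x800)
         (list (logior #xc0 (ash code -6))         ; lead octet: 110xxxxx
               (logior #x80 (logand code #x3f))))  ; continuation: 10xxxxxx
        (t (error "Code point ~X is outside this sketch's range." code))))

;; Decode octets as ISO-8859-1: each octet becomes the character with
;; the same code, so N octets always give N characters.
(defun octets-as-latin-1 (octets)
  (map 'string #'code-char octets))

;; ß is code point #xdf:
;; (utf-8-octets #xdf)                              ; => (195 159), i.e. #xc3 #x9f
;; (length (octets-as-latin-1 (utf-8-octets #xdf))) ; => 2
;; (char-code (char (octets-as-latin-1 (utf-8-octets #xdf)) 0)) ; => 195
```

The first character of the misdecoded string has code #xc3, which is
the one the original transcript printed.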

Hope this helps.





