[Openmcl-devel] Is this a bug?
R. Matthew Emerson
rme at clozure.com
Sat Dec 15 12:26:56 PST 2007
On Dec 15, 2007, at 12:10 PM, Ron Garret wrote:
> Is this a bug?
>
> Welcome to Clozure Common Lisp Version 1.1-r7902 (DarwinX8664)!
> ? (elt "ß" 0)
> #\Latin_Capital_Letter_A_With_Tilde
> ? (length "ß")
> 2
>
> Version 1.1 is advertised as unicode-native, which I would have
> thought would make the above return ß and 1 instead of
> #\Latin_Capital_Letter_A_With_Tilde and 2. Or am I missing something?
>
> The reason I'm really asking, BTW, is not that I really care about
> being correct about the length of unicode strings, but that I want to
> create reader macros for unicode characters (in particular « and ») so
> the error I really care about is this one:
>
> ? #\«
>> Error: Unknown character name - "«" .
Specify the -K utf-8 option when you start up CCL.
$ openmcl64 -K utf-8
Welcome to Clozure Common Lisp Version 1.1-r7685:7830MS (DarwinX8664)!
? (elt "ß" 0)
#\Latin_Small_Letter_Sharp_S
? (length "ß")
1
? #\«
#\Left-Pointing_Double_Angle_Quotation_Mark
?
(If you now save the lisp with save-application, this encoding setting
will stick.)
The default external-format for *terminal-io* and other streams whose
encoding is not explicitly specified is ISO-8859-1. See
ccl:release-notes.txt (search for ISO-8859-1) for some notes about this.
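For streams you open yourself, you can name the encoding instead of relying on the default. The following is a minimal sketch, assuming your CCL accepts :external-format :utf-8 in OPEN / WITH-OPEN-FILE; the function name and the scratch path are made up for illustration.

```lisp
;; Sketch only: UTF-8-ROUND-TRIP and the path below are hypothetical.
;; Assumes CCL's OPEN accepts :external-format :utf-8.
(defun utf-8-round-trip (string path)
  "Write STRING (no newlines) to PATH as UTF-8, then read it back as UTF-8."
  (with-open-file (out path :direction :output
                            :if-exists :supersede
                            :if-does-not-exist :create
                            :external-format :utf-8)
    (write-string string out))
  (with-open-file (in path :external-format :utf-8)
    (read-line in)))

;; Build the string from its code point so the example does not depend
;; on the encoding of this source text itself.
(let ((result (utf-8-round-trip (string (code-char #xDF))
                                "/tmp/utf-8-round-trip.txt")))
  (list (length result) (char-code (char result 0))))
```

Because both ends of the round trip name the same encoding, the string should come back as a single character with code #xDF.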
As for the odd output you were seeing:
The UTF-8 encoding of ß is the two octets #xc3 #x9f (hence the length
of 2), and the ISO-8859-1 character for code point #xc3 is
#\Latin_Capital_Letter_A_With_Tilde, which explains the name. Your
terminal is sending UTF-8, but the lisp is decoding those bytes as
ISO-8859-1, one character per octet.
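You can reproduce the mismatch from inside the lisp. This is a rough sketch, assuming your CCL provides ccl:encode-string-to-octets and ccl:decode-string-from-octets; the helper name is made up for illustration.

```lisp
;; Sketch only: MISREAD-AS-LATIN-1 is a hypothetical helper.
;; Assumes ccl:encode-string-to-octets and ccl:decode-string-from-octets
;; are available in the running CCL.
(defun misread-as-latin-1 (string)
  "Encode STRING as UTF-8, then decode the same octets as ISO-8859-1.
Returns the misdecoded string and the octet vector."
  (let ((octets (ccl:encode-string-to-octets string :external-format :utf-8)))
    (values (ccl:decode-string-from-octets octets :external-format :iso-8859-1)
            octets)))

;; ß is code point #xDF; build it with CODE-CHAR so this example does
;; not depend on how the source text itself is encoded.
(multiple-value-bind (garbled octets)
    (misread-as-latin-1 (string (code-char #xDF)))
  (list (length garbled)                 ; 2, one character per octet
        (char-code (char garbled 0))     ; #xC3
        (coerce octets 'list)))          ; (#xC3 #x9F), i.e. (195 159)
```

The first character of the misdecoded string has code #xc3, which is the character the original transcript printed.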
Hope this helps.