[Openmcl-devel] Is this a bug?

Gary Byers gb at clozure.com
Sun Dec 16 11:56:02 PST 2007


I had thought that making ISO-8859-1 (which just maps 8-bit codes to
the first 256 Unicode code points) the default would be the least
traumatic/least likely to break existing code, since "just treating
8-bit character codes literally" is about what the lisp did before it
started using Unicode internally.

Would UTF-8 make a better default?

(I think that it's hard for the lisp to guess reliably; locale
information isn't always accurate, and it's hard to know what Emacs
thinks about the buffer(s) it's running the lisp in.)


On 12/15/2007 01:26:56 PM, R. Matthew Emerson wrote:
> 
> On Dec 15, 2007, at 12:10 PM, Ron Garret wrote:
> 
> > Is this a bug?
> >
> > Welcome to Clozure Common Lisp Version 1.1-r7902 (DarwinX8664)!
> > ? (elt "ß" 0)
> > #\Latin_Capital_Letter_A_With_Tilde
> > ? (length "ß")
> > 2
> >
> > Version 1.1 is advertised as unicode-native, which I would have
> > thought would make the above return ß and 1 instead of
> > #\Latin_Capital_Letter_A_With_Tilde and 2.  Or am I missing something?
> >
> > The reason I'm really asking, BTW, is not that I really care about
> > being correct about the length of unicode strings, but that I want to
> > create reader macros for unicode characters (in particular « and ») so
> > the error I really care about is this one:
> >
> > ? #\«
> >> Error: Unknown character name - "«" .
> 
> 
> Specify the -K utf-8 option when you start up CCL.
> 
> $ openmcl64 -K utf-8
> Welcome to Clozure Common Lisp Version 1.1-r7685:7830MS (DarwinX8664)!
> ? (elt "ß" 0)
> #\Latin_Small_Letter_Sharp_S
> ? (length "ß")
> 1
> ? #\«
> #\Left-Pointing_Double_Angle_Quotation_Mark
> ?
> 
> (If you now save the lisp with save-application, this encoding
> setting will stick.)
> 
> The default external-format for *terminal-io* and other streams whose
> encoding is not explicitly specified is ISO-8859-1.  See
> ccl:release-notes.txt (search for ISO-8859-1) for some notes about this.
> 
> As for the odd output you were seeing:
> 
> UTF-8 for the ß is #xc3, #x9f (note the two octets, hence the length
> 2), and the ISO-8859-1 character for code point #xc3 is
> #\Latin_Capital_Letter_A_With_Tilde, which explains the name.  Your
> terminal is using the UTF-8 encoding, but the lisp is treating the
> bytes as ISO-8859-1.
> 
> Hope this helps.
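
The byte arithmetic in the explanation above can be checked from the
REPL.  What follows is a minimal sketch in portable Common Lisp, not
CCL's own decoder: utf-8-octets, latin-1-decode and misread-as-latin-1
are hypothetical helper names introduced only for illustration, the
encoder handles only code points below #x800, and the code assumes a
lisp whose characters cover at least the first 256 Unicode code points.

```lisp
;; Hypothetical helper (not part of CCL): UTF-8 encode a single code
;; point.  Only the one- and two-octet cases are handled, which is
;; enough for ß (#xDF) and « (#xAB).
(defun utf-8-octets (code)
  (cond ((< code #x80)
         (list code))
        ((< code #x800)
         (list (logior #xC0 (ash code -6))         ; lead octet: 110xxxxx
               (logior #x80 (logand code #x3F))))  ; trail octet: 10xxxxxx
        (t (error "Code point #x~X needs more than two octets." code))))

;; ISO-8859-1 decoding: each octet becomes the character whose code
;; is that same value.
(defun latin-1-decode (octets)
  (map 'string #'code-char octets))

;; What a lisp reading ISO-8859-1 sees when the terminal sends UTF-8.
(defun misread-as-latin-1 (code)
  (latin-1-decode (utf-8-octets code)))

(let ((misread (misread-as-latin-1 #xDF)))
  (format t "octets: ~{#x~X~^ ~}~%" (utf-8-octets #xDF))
  (format t "length: ~D~%" (length misread))
  (format t "first char code: #x~X~%" (char-code (char misread 0))))
```

Under those assumptions the two octets are #xC3 and #x9F, the misread
string has length 2, and its first character has code #xC3, which
matches the transcript at the top of the thread.  The same applies to «
(code point #xAB), whose UTF-8 encoding is also two octets long.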
> 
> 
> 
> 
> 






