[Openmcl-devel] Is this a bug?
ron at awun.net
Sun Dec 16 23:06:34 UTC 2007
I wouldn't fret too much about the default (particularly since it's so
easy to change). If I were going to do anything about this "problem"
it would be to make the documentation about character encodings a bit
more prominent (it's pretty buried in the release notes right now), or
maybe add something to the startup message along the lines of:
Welcome to Clozure Common Lisp Version 1.1-r7902 (DarwinX8664)!
Default character encoding is set to ISO-8859-1
Set CCL:*DEFAULT-EXTERNAL-FORMAT* (or use the -K flag) to change it
But I mostly wouldn't worry about it at this point. There are bigger
fish to fred, er, fry. ;-)
On Dec 16, 2007, at 11:56 AM, Gary Byers wrote:
> I had thought that making ISO-8859-1 (which just maps 8-bit codes to
> the first 256 Unicode code points) the default would be the least
> traumatic/least likely to break existing code, since "just treating 8-bit
> character codes literally" was about what the lisp did before it
> started using Unicode internally.
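A minimal Python sketch of why ISO-8859-1 is the "safe" choice described above. It assumes nothing about CCL internals and just shows the general property: every byte decodes to the code point with the same numeric value, so decoding never fails. UTF-8, by contrast, rejects malformed byte sequences.

```python
# ISO-8859-1 (Latin-1) maps byte 0xNN to code point U+00NN for all 256 bytes,
# so decoding can never fail and never changes a byte's numeric value.
all_bytes = bytes(range(256))
text = all_bytes.decode("iso-8859-1")

identity = all(ord(ch) == b for ch, b in zip(text, all_bytes))
print(identity)   # True
print(len(text))  # 256

# UTF-8 rejects byte sequences that are not well-formed (0x80 alone, here).
try:
    all_bytes.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print(utf8_ok)    # False
```

This is the trade-off in the question that follows: Latin-1 never breaks on input, but it silently misreads multi-byte UTF-8 text.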
> Would UTF-8 make a better default?
> (I think that it's hard for the lisp to guess reliably; locale
> information isn't always accurate, and it's hard to know what Emacs
> thinks about the buffer(s) it's running the lisp in.)
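To illustrate why guessing is fragile, here is a hypothetical locale-based guess in Python. `guess_default_encoding` is an invented helper, not anything CCL provides. It follows the POSIX precedence (LC_ALL, then LC_CTYPE, then LANG) and, as a simplification, treats any codeset other than UTF-8 as the ISO-8859-1 fallback. Its answer reflects only the environment variables, not what the terminal or Emacs buffer is actually doing.

```python
import os


def guess_default_encoding(env=None, fallback="iso-8859-1"):
    """Hypothetical helper: guess a text encoding from POSIX locale variables.

    The first non-empty of LC_ALL, LC_CTYPE, LANG decides. Only a UTF-8
    codeset is recognized; anything else returns the fallback.
    """
    if env is None:
        env = os.environ
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            # Locale names look like "en_US.UTF-8" or "de_DE.ISO8859-1@euro".
            codeset = value.partition(".")[2].partition("@")[0]
            if codeset.replace("-", "").lower() == "utf8":
                return "utf-8"
            return fallback
    return fallback


print(guess_default_encoding({"LANG": "en_US.UTF-8"}))                # utf-8
print(guess_default_encoding({"LC_ALL": "C", "LANG": "en_US.UTF-8"}))  # iso-8859-1
```

A terminal configured for UTF-8 inside a session whose `LC_ALL` is `C` would be guessed wrong, which is the unreliability mentioned above.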
> On 12/15/2007 01:26:56 PM, R. Matthew Emerson wrote:
>> On Dec 15, 2007, at 12:10 PM, Ron Garret wrote:
>>> Is this a bug?
>>> Welcome to Clozure Common Lisp Version 1.1-r7902 (DarwinX8664)!
>>> ? (elt "ß" 0)
>>> #\Latin_Capital_Letter_A_With_Tilde
>>> ? (length "ß")
>>> 2
>>> Version 1.1 is advertised as Unicode-native, which I would have
>>> thought would make the above return ß and 1 instead of
>>> #\Latin_Capital_Letter_A_With_Tilde and 2. Or am I missing something?
>>> The reason I'm really asking, BTW, is not that I really care about
>>> being correct about the length of Unicode strings, but that I want to
>>> create reader macros for Unicode characters (in particular « and »), so
>>> the error I really care about is this one:
>>> ? #\«
>>>> Error: Unknown character name - "«" .
>> Specify the -K utf-8 option when you start up CCL.
>> $ openmcl64 -K utf-8
>> Welcome to Clozure Common Lisp Version 1.1-r7685:7830MS
>> ? (elt "ß" 0)
>> ? (length "ß")
>> ? #\«
>> (If you now save the lisp with save-application, this encoding
>> will stick.)
>> The default external-format for *terminal-io* and other streams whose
>> encoding is not explicitly specified is ISO-8859-1. See
>> ccl:release-notes.txt (search for ISO-8859-1) for some notes about this.
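The same mismatch plausibly explains the `#\«` error earlier in the thread. This is an editorial inference, not something stated in the thread. « is U+00AB, which a UTF-8 terminal sends as the octets #xc2 #xab. Read through an ISO-8859-1 stream, the reader sees two characters, Â«, after `#\`, and that is not a valid character name. In the Python sketch below, `io.TextIOWrapper` stands in for a lisp stream with a given external format, and `read_token` is a hypothetical helper.

```python
import io

# Bytes a UTF-8 terminal sends when you type  #\«  (« is U+00AB).
typed = "#\\«".encode("utf-8")  # b'#\\\xc2\xab'


def read_token(raw, encoding):
    """Decode raw input as a stream with this encoding would, then return
    the characters that follow the two-character #\\ prefix."""
    stream = io.TextIOWrapper(io.BytesIO(raw), encoding=encoding)
    return stream.read()[2:]


as_latin1 = read_token(typed, "iso-8859-1")
as_utf8 = read_token(typed, "utf-8")
print(repr(as_latin1), len(as_latin1))  # 'Â«' 2
print(repr(as_utf8), len(as_utf8))      # '«' 1
```

When the lisp echoes the bad name in its error message through the same ISO-8859-1 stream, it writes back #xc2 #xab, which the UTF-8 terminal displays as «. That would be why the message appears to complain about a single «.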
>> As for the odd output you were seeing:
>> The UTF-8 encoding of ß (U+00DF) is #xc3, #x9f (note the two octets,
>> hence the length 2), and the ISO-8859-1 character for code point #xc3
>> is #\Latin_Capital_Letter_A_With_Tilde, which explains the name. Your
>> terminal is using the UTF-8 encoding, but the lisp is treating the
>> bytes as ISO-8859-1.
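The octet arithmetic above can be checked in a few lines of Python. This is only an illustration of the encodings themselves, not of CCL: encode ß as UTF-8, then decode those octets as ISO-8859-1 the way the lisp's default did.

```python
import unicodedata

sharp_s = "ß"                         # U+00DF LATIN SMALL LETTER SHARP S
octets = sharp_s.encode("utf-8")      # what a UTF-8 terminal sends
print(octets.hex())                   # c39f  (two octets)

# Treat each octet as one ISO-8859-1 character.
misread = octets.decode("iso-8859-1")
print(len(misread))                   # 2
print(unicodedata.name(misread[0]))   # LATIN CAPITAL LETTER A WITH TILDE
print(hex(ord(misread[1])))           # 0x9f (a C1 control character, no name)

# Decoding with the matching encoding recovers the single character.
print(len(octets.decode("utf-8")))    # 1
```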
>> Hope this helps.