[Openmcl-devel] Is this a bug?
Ron Garret
ron at awun.net
Sun Dec 16 15:06:34 PST 2007
I wouldn't fret too much about the default (particularly since it's so
easy to change). If I were going to do anything about this "problem"
it would be to make the documentation about character encodings a bit
more prominent (it's pretty buried in the release notes right now), or
maybe add something to the startup message along the lines of:
Welcome to Clozure Common Lisp Version 1.1-r7902 (DarwinX8664)!
Default character encoding is set to ISO-8859-1
Set CCL:*DEFAULT-EXTERNAL-FORMAT* (or use the -K flag) to change it
?
But I mostly wouldn't worry about it at this point. There are bigger
fish to fred, er, fry. ;-)
rg
On Dec 16, 2007, at 11:56 AM, Gary Byers wrote:
> I had thought that making ISO-8859-1 (which just maps 8-bit codes to
> the first 256 Unicode code points) the default would be the least
> traumatic and least likely to break existing code, since "just treating
> 8-bit character codes literally" was about what the lisp did before it
> started using Unicode internally.
>
> Would UTF-8 make a better default?
>
> (I think that it's hard for the lisp to guess reliably; locale
> information isn't always accurate, and it's hard to know what Emacs
> thinks about the buffer(s) it's running the lisp in.)
>
>
> On 12/15/2007 01:26:56 PM, R. Matthew Emerson wrote:
>>
>> On Dec 15, 2007, at 12:10 PM, Ron Garret wrote:
>>
>>> Is this a bug?
>>>
>>> Welcome to Clozure Common Lisp Version 1.1-r7902 (DarwinX8664)!
>>> ? (elt "ß" 0)
>>> #\Latin_Capital_Letter_A_With_Tilde
>>> ? (length "ß")
>>> 2
>>>
>>> Version 1.1 is advertised as unicode-native, which I would have
>>> thought would make the above return ß and 1 instead of
>>> #\Latin_Capital_Letter_A_With_Tilde and 2. Or am I missing something?
>>>
>>> The reason I'm really asking, BTW, is not that I really care about
>>> being correct about the length of unicode strings, but that I want to
>>> create reader macros for unicode characters (in particular « and »),
>>> so the error I really care about is this one:
>>>
>>> ? #\«
>>>> Error: Unknown character name - "«" .
>>
>>
>> Specify the -K utf-8 option when you start up CCL.
>>
>> $ openmcl64 -K utf-8
>> Welcome to Clozure Common Lisp Version 1.1-r7685:7830MS (DarwinX8664)!
>> ? (elt "ß" 0)
>> #\Latin_Small_Letter_Sharp_S
>> ? (length "ß")
>> 1
>> ? #\«
>> #\Left-Pointing_Double_Angle_Quotation_Mark
>> ?
>>
>> (If you now save the lisp with save-application, this encoding setting
>> will stick.)
>>
>> The default external-format for *terminal-io* and other streams whose
>> encoding is not explicitly specified is ISO-8859-1. See
>> ccl:release-notes.txt (search for ISO-8859-1) for some notes about this.
>>
>> As for the odd output you were seeing:
>>
>> The UTF-8 encoding of ß is #xc3, #x9f (note the two octets, hence the
>> length 2), and the ISO-8859-1 character for code point #xc3 is
>> #\Latin_Capital_Letter_A_With_Tilde, which explains the name. Your
>> terminal is using the UTF-8 encoding, but the lisp is treating the
>> bytes as ISO-8859-1.
>>
>> Hope this helps.
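
The mismatch described above can be reproduced in portable Common Lisp
without relying on any CCL-specific encoding API. What follows is only a
minimal sketch: the names *sharp-s-octets*, decode-latin-1 and
decode-utf-8-pair are made up for illustration, and the UTF-8 decoder
handles a single two-octet sequence and nothing else.

```lisp
;; The two octets a UTF-8 terminal sends for "ß" (U+00DF).
(defparameter *sharp-s-octets* '(#xC3 #x9F))

(defun decode-latin-1 (octets)
  "ISO-8859-1: every octet is the code point of exactly one character."
  (map 'string #'code-char octets))

(defun decode-utf-8-pair (octets)
  "Decode one two-octet UTF-8 sequence (110xxxxx 10xxxxxx). Illustrative only."
  (destructuring-bind (lead trail) octets
    (string (code-char (logior (ash (logand lead #x1F) 6)
                               (logand trail #x3F))))))

;; Read as ISO-8859-1, the octets become two characters, the first being #xC3.
(length (decode-latin-1 *sharp-s-octets*))                 ; => 2
(char-code (char (decode-latin-1 *sharp-s-octets*) 0))     ; => 195 (#xC3)

;; Read as UTF-8, the same octets become the single character U+00DF.
(length (decode-utf-8-pair *sharp-s-octets*))              ; => 1
(char-code (char (decode-utf-8-pair *sharp-s-octets*) 0))  ; => 223 (#xDF)
```

The characters are built with code-char rather than typed as literals so
that the result does not depend on the encoding the lisp uses to read the
source itself.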