[Openmcl-devel] default-character-encoding should be :utf-8

Sun Sep 23 15:53:54 PDT 2012

One good suggestion that Robert Goldman made (and that everyone - including me -
ignored) in the discussion last spring is to have LOAD and COMPILE-FILE (at least)
honor a coding: attribute [*] in the file attributes line (aka the modeline).  E.g.:

;;; -*- Mode: lisp; Coding: utf-8 -*-

at the top of a .lisp source file makes it pretty clear that the file's author
intends for the file to be processed in utf-8 and makes that fact obvious to
a human reader as well.

Emacs (generally) supports this; other environments (the Cocoa IDE)
could be made to if they don't already, and LOAD and COMPILE-FILE
could do so in CCL (and may already do so in other implementations) at
least when their :EXTERNAL-FORMAT argument isn't explicitly specified.
(OPEN could also do so, but might not find an attribute line as often.)
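
For illustration only, here is a rough sketch in portable Common Lisp of what
honoring the attribute might look like. The names PARSE-CODING-ATTRIBUTE,
FILE-CODING-ATTRIBUTE and LOAD-HONORING-CODING are made up for this example
and are not part of CCL. The sketch assumes the attribute line is the first
line of the file and that the Coding: value, upcased and interned as a
keyword, is something the implementation accepts as an :EXTERNAL-FORMAT.

```lisp
;;; Sketch only: these functions are hypothetical and not part of CCL.

(defun parse-coding-attribute (line)
  "Return the Coding: value found between -*- markers in LINE as a
keyword, or NIL if LINE has no such attribute."
  (let* ((start (search "-*-" line))
         (end (and start (search "-*-" line :start2 (+ start 3)))))
    (when end
      (let* ((attrs (subseq line (+ start 3) end))
             ;; Case-insensitive match, since Emacs accepts "coding:" too.
             ;; Naive: this would also match a name that ends in "coding".
             (key (search "coding:" attrs :test #'char-equal)))
        (when key
          (let* ((value-start (+ key (length "coding:")))
                 ;; The value runs to the next semicolon or to the end.
                 (value-end (or (position #\; attrs :start value-start)
                                (length attrs)))
                 (name (string-trim '(#\Space #\Tab)
                                    (subseq attrs value-start value-end))))
            (when (plusp (length name))
              (intern (string-upcase name) :keyword))))))))

(defun file-coding-attribute (path)
  "Read the first line of PATH in the default external format and
return its Coding: attribute as a keyword, or NIL."
  (with-open-file (in path :direction :input)
    (let ((line (read-line in nil nil)))
      (and line (parse-coding-attribute line)))))

(defun load-honoring-coding (path &rest args
                             &key (external-format nil supplied-p)
                             &allow-other-keys)
  "Call LOAD on PATH.  If the caller did not pass :EXTERNAL-FORMAT and
the file has a Coding: attribute, use that attribute instead."
  (declare (ignore external-format))
  (let ((coding (and (not supplied-p) (file-coding-attribute path))))
    (if coding
        (apply #'load path :external-format coding args)
        (apply #'load path args))))
```

With the example line above, (parse-coding-attribute ";;; -*- Mode: lisp; Coding: utf-8 -*-")
returns :UTF-8. A real implementation would also have to cope with Emacs
names that carry an end-of-line suffix (utf-8-unix and the like) and with
encoding names the Lisp doesn't recognize; the sketch ignores both.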

Things like *DEFAULT-FILE-CHARACTER-ENCODING* would still have to exist
and we could continue to argue about what value it should take, but following
Robert's suggestion would mean that that wouldn't matter as often.

---
[*] IIRC.  The point here is to use whatever attribute name Emacs uses.

On Sun, 23 Sep 2012, Ron Garret wrote:

> As the instigator of this thread, I think it's worth recapping the original argument, which had nothing to do with moral failings and everything to do with real-world considerations.
>
> The sad fact of the matter is that the Internet is lousy with texts that cannot be encoded in Latin-1.  Some people (notably those whose native language is not English) even write code that contains characters that cannot be encoded in Latin-1 (the nerve!)  There are three -- and only three -- ways to deal with this situation:
>
> 1.  Use Latin-1 exclusively, and lock yourself out of being able to deal with code and texts that contain non-European glyphs.
> 2.  Use Latin-1 and some other encoding(s), and deal with the confusion that inevitably results.
> 3.  Use an encoding that covers all (or at least most) of the Unicode code point space.
>
> I advocate #3 in general, and UTF-8 in particular, because I don't like provincialism and I don't like unnecessary complication.  But this is a value judgement, and reasonable people can disagree.  UTF8 is no panacea.  There are drawbacks, most notably that ELT is no longer O(1), and the length of a string is not a linear function of its size in memory.
>
> There is one aspect of Latin-1 that I find particularly annoying in the context of choosing an encoding for Lisp code, and that is that the encoding of lower-case lambda (λ) is incompatibly different between Latin-1 and UTF-8.  Since it is no longer 1978, I sometimes like to spell lambda as "λ".  Because I use the λ character, and because I don't want to close the door on non-European texts, and because I don't like unnecessary complication, I choose to use UTF8 exclusively, and I think the world would be a better place if everyone did likewise.
>
> Again, reasonable people can disagree, and clearly the fate of civilization does not hinge on this decision.  But if CCL is going to revert to latin-1 I would hope it would not be because the argument for UTF8 had been misunderstood.
>
> rg
>
>
> On Sep 23, 2012, at 12:49 PM, Gary Byers wrote:
>
>> The values of *TERMINAL-CHARACTER-ENCODING-NAME* and *DEFAULT-FILE-CHARACTER-ENCODING*
>> changed (experimentally) in the trunk in r15236, largely as an attempt to silence
>> an apparently endless discussion started in:
>>
>> <http://clozure.com/pipermail/openmcl-devel/2012-March/013401.html>
>>
>> Both of those variables have historically been initialized to NIL (which
>> is equivalent to :ISO-8859-1.)
>>
>> A careful reading of that thread will reveal that if you have files that
>> aren't encoded in :UTF-8 that's because of sloth, avarice, or some other
>> personal failing on your part (and would have nothing to do with real-world
>> issues.)
>>
>> That change was intentionally not incorporated into 1.8; all other things
>> being equal, I think that I'd prefer that 1.9 revert back to using NIL/:ISO-8859-1
>> (but that might cause that discussion to start up again.)
>>
>> In the bigger picture: as I understand how things currently stand, the trunk
>> contains workarounds for a couple of Mountain Lion issues (the mechanism used
>> by things like GUI:EXECUTE-IN-GUI and the problems with #_mach_port_allocate_name)
>> that have not yet been propagated to 1.8.  Assuming that the fixes/workarounds
>> are correct and have been smoke-tested to at least some degree, the changes
>> do need to be incorporated into 1.8 ASAP, and once they are I would think that
>> you'd want to base your application on 1.8.
>>
>> I'm a little nervous about the hashing scheme that's being used to avoid
>> #_mach_port_allocate_name (I don't know how well it scales and don't know
>> whether or not there are non-obvious race conditions or other thread-safety
>> issues in the code) and I'd ordinarily be a little reluctant to push something
>> like that to the release: the consequences of using #_mach_port_allocate_name
>> are so horrible on Mountain Lion that the hashing scheme is clearly better (even
>> if it contains its own obscure/subtle problems); on OS releases where
>> #_mach_port_allocate_name still works, then propagating the change simply risks
>> introducing some obscure/subtle problems.
>>
>> I'll try to decide what to do soon, but I don't think that it'd be a good idea
>> for you (or anyone) to base a shipping application on the CCL trunk, simply
>> because the trunk's volatility would make it harder for you or us to maintain.
>>
>>
>> On Sat, 22 Sep 2012, Alexander Repenning wrote:
>>
>>> tracked down some bugs that are due to the new :utf-8 encoding. When actually did that happen? Anyway, is there a way to set the encoding back (:ascii)? The only *default-character-encoding* variable I can find is part of quicklisp/babel/encodings.
>>>
>>> Alex
>>>
>>>
>>>
>>> _______________________________________________
>>> Openmcl-devel mailing list
>>> Openmcl-devel at clozure.com
>>> http://clozure.com/mailman/listinfo/openmcl-devel
>>>
>>>
>
>
