[Openmcl-devel] *default-character-encoding* should be :utf-8
gb at clozure.com
Sun Sep 23 22:53:54 UTC 2012
One good suggestion that Robert Goldman made (and that everyone - including me -
ignored) in the discussion last spring is to have LOAD and COMPILE-FILE (at least)
honor a coding: attribute [*] in the file attributes line (aka the modeline). E.g.:
;;; -*- Mode: lisp; Coding: utf-8 -*-
at the top of a .lisp source file makes it pretty clear that the file's author
intends for the file to be processed in utf-8 and makes that fact obvious to
a human reader as well.
Emacs (generally) supports this; other environments (the Cocoa IDE)
could be made to if they don't already, and LOAD and COMPILE-FILE
could do so in CCL (and may already do so in other implementations) at
least when their :EXTERNAL-FORMAT argument isn't explicitly specified.
(OPEN could also do so, but might not find an attribute line as often.)
Things like *DEFAULT-FILE-CHARACTER-ENCODING* would still have to exist
and we could continue to argue about what value it should take, but following
Robert's suggestion would mean that that wouldn't matter as often.
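A minimal sketch of what honoring the attribute might look like, in
portable Common Lisp. All four function names are hypothetical (nothing
here is existing CCL code), and the sketch assumes that the
implementation accepts :ISO-8859-1 and the keyword named by the file
(e.g. :UTF-8) as external formats. It only looks at the first line and
does not try to interpret Emacs-specific coding names.

```lisp
;;; Sketch only: every function below is a hypothetical name,
;;; not an existing CCL function.

(defun parse-attribute-line (line)
  "Return an alist of (NAME . VALUE) strings taken from the text between
the two -*- markers in LINE, or NIL if LINE has no such markers."
  (let* ((start (search "-*-" line))
         (end (and start (search "-*-" line :start2 (+ start 3)))))
    (when end
      (loop with body = (subseq line (+ start 3) end)
            with pos = 0
            for semi = (position #\; body :start pos)
            for field = (subseq body pos semi)
            for colon = (position #\: field)
            ;; Fields without a colon (e.g. a bare mode name) are skipped.
            when colon
              collect (cons (string-trim " " (subseq field 0 colon))
                            (string-trim " " (subseq field (1+ colon))))
            while semi
            do (setf pos (1+ semi))))))

(defun attribute-line-coding (line)
  "Return the coding attribute of LINE as a keyword (e.g. :UTF-8), or NIL."
  (let ((entry (assoc "coding" (parse-attribute-line line)
                      :test #'string-equal)))
    (when (and entry (plusp (length (cdr entry))))
      (intern (string-upcase (cdr entry)) :keyword))))

(defun file-coding (path)
  "Return the coding attribute found on the first line of PATH, or NIL.
The line is read as ISO-8859-1 (assumed to be supported) so that any
sequence of octets decodes without error."
  (with-open-file (in path :external-format :iso-8859-1)
    (let ((line (read-line in nil nil)))
      (and line (attribute-line-coding line)))))

(defun load-honoring-coding (path &rest args
                             &key (external-format nil ef-p)
                             &allow-other-keys)
  "Like LOAD, but when :EXTERNAL-FORMAT is not supplied, use the coding
attribute from the file's first line if there is one."
  (declare (ignore external-format))
  (let ((coding (and (not ef-p) (file-coding path))))
    (if coding
        (apply #'load path :external-format coding args)
        (apply #'load path args))))
```

With this, (attribute-line-coding ";;; -*- Mode: lisp; Coding: utf-8 -*-")
returns :UTF-8, and a file without an attribute line falls through to a
plain LOAD, so the default encoding still decides in that case.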
[*] IIRC. The point here is to use whatever attribute name Emacs uses.
On Sun, 23 Sep 2012, Ron Garret wrote:
> As the instigator of this thread, I think it's worth recapping the original argument, which had nothing to do with moral failings and everything to do with real-world considerations.
> The sad fact of the matter is that the Internet is lousy with texts that cannot be encoded in Latin-1. Some people (notably those whose native language is not English) even write code that contains characters that cannot be encoded in Latin-1 (the nerve!). There are three -- and only three -- ways to deal with this situation:
> 1. Use Latin-1 exclusively, and lock yourself out of being able to deal with code and texts that contain non-European glyphs.
> 2. Use Latin-1 and some other encoding(s), and deal with the confusion that inevitably results.
> 3. Use an encoding that covers all (or at least most) of the unicode code point space.
> I advocate #3 in general, and UTF-8 in particular, because I don't like provincialism and I don't like unnecessary complication. But this is a value judgement, and reasonable people can disagree. UTF8 is no panacea. There are drawbacks, most notably that ELT is no longer O(1), and the length of a string is not a linear function of its size in memory.
> There is one aspect of Latin-1 that I find particularly annoying in the context of choosing an encoding for Lisp code, and that is that the encoding of lower-case lambda (λ) is incompatibly different between Latin-1 and UTF-8. Since it is no longer 1978, I sometimes like to spell lambda as "λ". Because I use the λ character, and because I don't want to close the door on non-European texts, and because I don't like unnecessary complication, I choose to use UTF8 exclusively, and I think the world would be a better place if everyone did likewise.
> Again, reasonable people can disagree, and clearly the fate of civilization does not hinge on this decision. But if CCL is going to revert to latin-1 I would hope it would not be because the argument for UTF8 had been misunderstood.
> On Sep 23, 2012, at 12:49 PM, Gary Byers wrote:
>> The values of *TERMINAL-CHARACTER-ENCODING-NAME* and *DEFAULT-FILE-CHARACTER-ENCODING*
>> changed (experimentally) in the trunk in r15236, largely as an attempt to silence
>> an apparently endless discussion started in:
>> Both of those variables have historically been initialized to NIL (which
>> is equivalent to :ISO-8859-1.)
>> A careful reading of that thread will reveal that if you have files that
>> aren't encoded in :UTF-8 that's because of sloth, avarice, or some other
>> personal failing on your part (and would have nothing to do with real-world considerations.)
>> That change was intentionally not incorporated into 1.8; all other things
>> being equal, I think that I'd prefer that 1.9 revert back to using NIL/:ISO-8859-1
>> (but that might cause that discussion to start up again.)
>> In the bigger picture: as I understand how things currently stand, the trunk
>> contains workarounds for a couple of Mountain Lion issues (the mechanism used
>> by things like GUI:EXECUTE-IN-GUI and the problems with #_mach_port_allocate_name)
>> that have not yet been propagated to 1.8. Assuming that the fixes/workarounds
>> are correct and have been smoke-tested to at least some degree, the changes
>> do need to be incorporated into 1.8 ASAP, and once they are I would think that
>> you'd want to base your application on 1.8.
>> I'm a little nervous about the hashing scheme that's being used to avoid
>> #_mach_port_allocate_name (I don't know how well it scales and don't know
>> whether or not there are non-obvious race conditions or other thread-safety
>> issues in the code) and I'd ordinarily be a little reluctant to push something
>> like that to the release: the consequences of using #_mach_port_allocate_name
>> are so horrible on Mountain Lion that the hashing scheme is clearly better (even
>> if it contains its own obscure/subtle problems); on OS releases where
>> #_mach_port_allocate_name still works, then propagating the change simply risks
>> introducing some obscure/subtle problems.
>> I'll try to decide what to do soon, but I don't think that it'd be a good idea
>> for you (or anyone) to base a shipping application on the CCL trunk, simply
>> because the trunk's volatility would make it harder for you or us to maintain.
>> On Sat, 22 Sep 2012, Alexander Repenning wrote:
>>> tracked down some bugs that are due to the new :utf-8 encoding. When actually did that happen? Anyway, is there a way to set the encoding back (:ascii)? The only *default-character-encoding* variable I can find is part of quicklisp/babel/encodings.
>>> Openmcl-devel mailing list
>>> Openmcl-devel at clozure.com