[Openmcl-devel] *default-character-encoding* should be :utf-8

Ron Garret ron at flownet.com
Sun Sep 23 14:12:39 PDT 2012


As the instigator of this thread, I think it's worth recapping the original argument, which had nothing to do with moral failings and everything to do with real-world considerations.

The sad fact of the matter is that the Internet is lousy with texts that cannot be encoded in Latin-1.  Some people (notably those whose native language is not English) even write code that contains characters that cannot be encoded in Latin-1 (the nerve!).  There are three -- and only three -- ways to deal with this situation:

1.  Use Latin-1 exclusively, and lock yourself out of being able to deal with code and texts that contain non-European glyphs.
2.  Use Latin-1 and some other encoding(s), and deal with the confusion that inevitably results.
3.  Use an encoding that covers all (or at least most) of the Unicode code point space.
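To make the three options concrete, here is a small sketch that uses Python's built-in codecs rather than CCL's external formats. The helpers `can_encode` and `decodes_as` and the sample strings are illustrative only, not part of any CCL API. The sketch checks which texts fit in Latin-1 versus UTF-8, and shows one form of the "confusion" in option 2: bytes written as Latin-1 and read back as UTF-8 fail to decode.

```python
# Illustration with Python's standard codecs; this is not CCL's external-format API.

def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of `text` is representable in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` is a valid byte sequence in `encoding`."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample texts: Western European, Lisp-with-lambda, Greek, Japanese.
samples = ["naïve café", "λx.x", "Ελληνικά", "日本語"]

for s in samples:
    print(f"{s!r}: latin-1={can_encode(s, 'latin-1')}, utf-8={can_encode(s, 'utf-8')}")

# Option 2's confusion: a file written as Latin-1 but read back as UTF-8.
latin1_bytes = "café".encode("latin-1")  # the é becomes the single byte 0xE9
print("valid as UTF-8?", decodes_as(latin1_bytes, "utf-8"))
```

Note the asymmetry: every byte sequence decodes as Latin-1 (each byte maps to exactly one character), whereas UTF-8 decoding can fail outright.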

I advocate #3 in general, and UTF-8 in particular, because I don't like provincialism and I don't like unnecessary complication.  But this is a value judgement, and reasonable people can disagree.  UTF-8 is no panacea.  There are drawbacks, most notably that if strings are stored as UTF-8 in memory, ELT is no longer O(1), and the length of a string in characters is not a linear function of its size in bytes.
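A rough sketch of why ELT loses O(1) access when a string is held as UTF-8 bytes: each character occupies one to four bytes, so finding the n-th character means scanning from the start. This is Python over a `bytes` object for illustration only; `utf8_elt` is a hypothetical helper, not how CCL represents strings.

```python
def utf8_elt(data: bytes, index: int) -> str:
    """Return the index-th character of the UTF-8 encoded `data`.

    Every character starts with a byte that is not a continuation byte
    (continuation bytes have the bit pattern 10xxxxxx), so we count those
    from the beginning. The cost is O(n) in the byte length, unlike O(1)
    indexing into a fixed-width representation.
    """
    count = -1
    start = None
    for pos, byte in enumerate(data):
        if (byte & 0xC0) != 0x80:  # a lead byte begins a new character
            count += 1
            if count == index:
                start = pos
            elif count == index + 1:
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("character index out of range")
    return data[start:].decode("utf-8")


# Same character count, different byte counts: length in characters
# is not a linear function of the encoded size.
for text in ["abcd", "λλλλ", "λx.x"]:
    encoded = text.encode("utf-8")
    print(f"{text!r}: {len(text)} characters, {len(encoded)} bytes")

print(utf8_elt("λx.x".encode("utf-8"), 1))
```

A fixed-width in-memory representation (for example, one 32-bit code unit per character) keeps ELT O(1) at the cost of space, independently of which encoding is used for files.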

There is one aspect of Latin-1 that I find particularly annoying in the context of choosing an encoding for Lisp code: lower-case lambda (λ, U+03BB) has no Latin-1 encoding at all, and its two-byte UTF-8 encoding, read back as Latin-1, turns into the two characters "Î»".  Since it is no longer 1978, I sometimes like to spell lambda as "λ".  Because I use the λ character, and because I don't want to close the door on non-European texts, and because I don't like unnecessary complication, I choose to use UTF-8 exclusively, and I think the world would be a better place if everyone did likewise.
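The λ case can be checked directly. In this Python illustration (not CCL code), λ (U+03BB) encodes as the two UTF-8 bytes CE BB; Latin-1, which covers only U+0000 through U+00FF, cannot encode it at all, and decoding those UTF-8 bytes as Latin-1 produces mojibake.

```python
LAMBDA = "\u03bb"  # GREEK SMALL LETTER LAMDA

utf8_bytes = LAMBDA.encode("utf-8")
print("UTF-8 bytes:", utf8_bytes.hex(" "))  # bytes.hex(sep) requires Python 3.8+

# Latin-1 covers only U+0000..U+00FF, so λ has no Latin-1 encoding.
try:
    LAMBDA.encode("latin-1")
except UnicodeEncodeError as err:
    print("Latin-1 cannot encode it:", err.reason)

# Reading UTF-8 text as Latin-1 turns each λ into two unrelated characters.
misread = utf8_bytes.decode("latin-1")
print("Read back as Latin-1:", misread)
```

So a form like `(λ (x) x)` saved as UTF-8 but loaded as Latin-1 would reach the reader as the two-character token `Î»` rather than `λ`.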

Again, reasonable people can disagree, and clearly the fate of civilization does not hinge on this decision.  But if CCL is going to revert to Latin-1, I would hope it would not be because the argument for UTF-8 had been misunderstood.

rg


On Sep 23, 2012, at 12:49 PM, Gary Byers wrote:

> The values of *TERMINAL-CHARACTER-ENCODING-NAME* and *DEFAULT-FILE-CHARACTER-ENCODING*
> changed (experimentally) in the trunk in r15236, largely as an attempt to silence
> an apparently endless discussion started in:
> 
> <http://clozure.com/pipermail/openmcl-devel/2012-March/013401.html>
> 
> Both of those variables have historically been initialized to NIL (which
> is equivalent to :ISO-8859-1.)
> 
> A careful reading of that thread will reveal that if you have files that
> aren't encoded in :UTF-8 that's because of sloth, avarice, or some other
> personal failing on your part (and would have nothing to do with real-world
> issues.)
> 
> That change was intentionally not incorporated into 1.8; all other things
> being equal, I think that I'd prefer that 1.9 revert to using NIL/:ISO-8859-1
> (but that might cause that discussion to start up again.)
> 
> In the bigger picture: as I understand how things currently stand, the trunk
> contains workarounds for a couple of Mountain Lion issues (the mechanism used
> by things like GUI:EXECUTE-IN-GUI and the problems with #_mach_port_allocate_name)
> that have not yet been propagated to 1.8.  Assuming that the fixes/workarounds
> are correct and have been smoke-tested to at least some degree, the changes
> do need to be incorporated into 1.8 ASAP, and once they are I would think that
> you'd want to base your application on 1.8.
> 
> I'm a little nervous about the hashing scheme that's being used to avoid
> #_mach_port_allocate_name (I don't know how well it scales and don't know
> whether or not there are non-obvious race conditions or other thread-safety
> issues in the code), and I'd ordinarily be a little reluctant to push something
> like that to the release.  The consequences of using #_mach_port_allocate_name
> are so horrible on Mountain Lion that the hashing scheme is clearly better there
> (even if it contains its own obscure/subtle problems); on OS releases where
> #_mach_port_allocate_name still works, propagating the change simply risks
> introducing some obscure/subtle problems.
> 
> I'll try to decide what to do soon, but I don't think that it'd be a good idea
> for you (or anyone) to base a shipping application on the CCL trunk, simply
> because the trunk's volatility would make it harder for you or us to maintain.
> 
> 
> On Sat, 22 Sep 2012, Alexander Repenning wrote:
> 
>> I tracked down some bugs that are due to the new :utf-8 encoding. When did that actually happen? Anyway, is there a way to set the encoding back (:ascii)? The only *default-character-encoding* variable I can find is part of quicklisp/babel/encodings.
>> 
>> Alex
>> 
>> 
>> 
>> _______________________________________________
>> Openmcl-devel mailing list
>> Openmcl-devel at clozure.com
>> http://clozure.com/mailman/listinfo/openmcl-devel
>> 
>> 



