[Openmcl-devel] A plug for UTF-8

Ron Garret ron at flownet.com
Thu Sep 10 16:36:07 UTC 2009

I'm not advocating any change in CCL.  I'm just urging people, as a
matter of common practice, to set their default encodings to UTF-8 and
to publish their code in UTF-8.  That's all.

The reason for this (and for nearly everything I'm advocating  
nowadays) is that I want to make CL in general and CCL in particular  
as attractive as possible to new users.  I believe one way to do this
is to ensure that, to the maximum extent possible, things "just work".
Nowadays, a big part of "just working" is minimizing the amount of
mental energy users have to spend fiddling with Unicode encodings.
Unfortunately, the Unicode standard is b0rken, so it is not possible to
reduce this fiddling to zero, but until Unicode is fixed I think having
everyone use UTF-8 by convention is the next best thing.  The
situation with Unicode today is analogous to the one that plagued IBM PC
add-on cards before plug-and-play came along, when users had to
configure hardware settings by hand.  Some day the Unicode community
will fix the mess they've created and come up with a standard way to
embed the encoding in the byte stream.  But until that happens, the
best we can do is all follow some convention, and the simplest
convention is to pick one encoding and stick with it.


On Sep 10, 2009, at 9:08 AM, Daniel Weinreb wrote:

> Ron,
> I'm not sure I understand what you are advocating.
> What change would you like to see in CCL?
> -- Dan
> Ron Garret wrote:
>> I would like to take a moment to lobby on behalf of UTF-8.  This
>> is not a huge deal, because it's easy enough to convert from one
>> encoding to another once you know how, but I think it would be a
>> nice selling point for CCL if things tended to Just Work, and one
>> way to make them Just Work is to have an encoding convention that
>> is universally followed so that newcomers can set it and forget
>> it.  The reason I think UTF-8 is a better choice than, say,
>> Latin-1 is that UTF-8 gives you access to the entire Unicode code
>> space, and in particular to the lower-case Greek lambda character
>> (λ) and to European-style «quotation marks», which are
>> self-balancing and hence let you build nested strings without the
>> need for backslash escapes.
>> Thank you for your indulgence during this commercial break.  You
>> may now return to your regularly scheduled programming.
>> rg
>> _______________________________________________
>> Openmcl-devel mailing list
>> Openmcl-devel at clozure.com
>> http://clozure.com/mailman/listinfo/openmcl-devel
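The "self-balancing" point works because « only ever opens and » only
ever closes, so a reader can match them by counting nesting depth,
whereas an ASCII " both opens and closes a string and must therefore be
escaped inside one.  The sketch below is a minimal Python illustration
of that idea; `read_guillemet_string` is a hypothetical helper, not
part of CCL (in Common Lisp one would presumably install something
similar as a reader macro with `set-macro-character`).

```python
from __future__ import annotations


def read_guillemet_string(text: str, start: int = 0) -> tuple[str, int]:
    """Read a «...» string literal that begins at text[start].

    « only opens and » only closes, so nesting can be tracked with a
    depth counter and inner «...» pairs need no escaping.
    Returns (contents, index just past the matching closing »).
    """
    if start >= len(text) or text[start] != "«":
        raise ValueError("expected « at start position")
    depth = 0
    for i in range(start, len(text)):
        ch = text[i]
        if ch == "«":
            depth += 1
        elif ch == "»":
            depth -= 1
            if depth == 0:
                # Drop only the outermost pair of delimiters.
                return text[start + 1 : i], i + 1
    raise ValueError("unbalanced «...» string")


# With ASCII double quotes the inner quotes would need escaping:
#   "say \"hi\" twice"
contents, end = read_guillemet_string("«say «hi» twice» rest")
print(contents == "say «hi» twice", end)  # True 16
```

One trade-off: a string that must contain an unmatched « or » would
still need some escape mechanism, which this sketch does not provide.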
