[Openmcl-devel] *default-character-encoding* should be :utf-8

Raymond Wiker rwiker at gmail.com
Sun Mar 4 10:55:43 PST 2012


On Mar 4, 2012, at 17:23 , Ron Garret wrote:
> 
> On Mar 4, 2012, at 3:38 AM, Pascal J. Bourguignon wrote:
> 
>> Tim Bradshaw <tfb at tfeb.org> writes:
>> 
>>> On 4 Mar 2012, at 07:23, Pascal J. Bourguignon wrote:
>>>> 
>>>> On POSIX systems, I would like the defaults (*default-external-format*
>>>> *default-file-character-encoding* *default-socket-character-encoding*)
>>>> to be what is specified by the environment variables LC_ALL, or else
>>>> LC_CTYPE, or else LANG, or else LANGUAGE (the latest being a GNU
>>>> extension, I wonder why).  See http://clisp.org/impnotes/clisp.html
>>>> (search: environment variables).
>>> 
>>> Do you understand how the encoding should depend on the locale
>>> variables, as I've never been able to work that out (and the CLISP
>>> documentation doesn't say how they work it out).  This isn't a
>>> rhetorical question: I'd like to know.  Feel free to mail me privately
>>> if you have any pointers, as this might be off-topic for the list.
>> 
>> LC_ALL=C               ==> :us-ascii
>> LC_ALL=en_US.UTF-8     ==> :utf-8
>> LC_ALL=fr_FR.ISO8859-1 ==> :iso-8859-1
>> LC_ALL=el_GR.ISO8859-7 ==> :iso-8859-7
>> 
>> and so on.
> 
> But (IMHO) code (Lisp or otherwise) should ALWAYS be in UTF-8 no matter what (and IMHO, CCL ought to ship with UTF-8 as the default encoding).  There are two reasons for this.  First, code consists primarily of code points (no pun intended) <128, and UTF-8 is the most compact encoding for that distribution that can still represent all code points.  Second, code often has to be dealt with by beginners who already have enough on their plate without having to worry about getting their encoding settings correct.

Another point here is that the encoding of a particular file is what it is, no matter what the user's environment has been set up for. It is no help if the user's environment has been set up for a particular encoding when at least some source files are in a completely different encoding.
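
For reference, the environment-based default quoted above amounts to roughly the lookup below. This is only a sketch in Python, not what CLISP or CCL actually do: the function name encoding_from_locale is made up for illustration, LANGUAGE is left out because in GNU gettext it lists message languages rather than a codeset, and the result is not normalized to any implementation's encoding names. Note that it yields a default only; it says nothing about what a given file really contains.

```python
import os

# Precedence as described in the quoted message: LC_ALL, else LC_CTYPE,
# else LANG. LANGUAGE is deliberately omitted in this sketch.
LOCALE_VARS = ("LC_ALL", "LC_CTYPE", "LANG")


def encoding_from_locale(env=None):
    """Return a lower-case codeset name taken from the locale variables.

    Hypothetical helper: returns None when the locale names no codeset.
    """
    env = os.environ if env is None else env
    for var in LOCALE_VARS:
        value = env.get(var)
        if not value:
            # Unset or empty: fall through to the next variable.
            continue
        if value in ("C", "POSIX"):
            return "us-ascii"
        # Locale names look like language_TERRITORY.codeset@modifier.
        _, dot, rest = value.partition(".")
        if not dot:
            return None  # no codeset given; the choice is system-dependent
        return rest.split("@", 1)[0].lower()
    # Nothing set at all behaves like the C locale.
    return "us-ascii"
```

With LC_ALL=fr_FR.ISO8859-1 this returns "iso8859-1", which would still need mapping to whatever keyword the Lisp implementation uses (:iso-8859-1 in the example above).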

I think some Lisp systems recognize a hint about encoding specified as an Emacs-style mode line; that would help a little. Another option would be to use the BOM mechanism to indicate when files are in UTF-8 or UTF-16. Finally, it should be possible to extend ASDF to pass appropriate :external-format settings to load and compile-file.
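
To make the first two ideas concrete, here is a minimal sketch (in Python, for brevity) of sniffing the encoding from the first bytes of a file: a BOM wins if present, otherwise an Emacs-style "-*- ... coding: NAME ... -*-" cookie on the first line is used. The function name guess_source_encoding and the default argument are my own inventions, not an existing API in any Lisp system, and the sketch assumes the mode line itself is ASCII-compatible.

```python
import codecs
import re

# BOMs checked by this sketch. UTF-32 is left out for brevity, so a
# UTF-32LE file (whose BOM starts with FF FE) would be misreported.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
)

# Matches an Emacs-style file-variables line such as
#   ;;; -*- Mode: Lisp; coding: utf-8 -*-
_CODING = re.compile(
    rb"-\*-.*?coding:\s*([A-Za-z0-9_.-]+).*?-\*-", re.IGNORECASE
)


def guess_source_encoding(head, default=None):
    """Guess an encoding name from the leading bytes of a source file.

    `head` is a bytes object holding the start of the file. Returns the
    name from a BOM or a coding cookie on the first line, else `default`.
    """
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    first_line = head.split(b"\n", 1)[0]
    match = _CODING.search(first_line)
    if match:
        return match.group(1).decode("ascii").lower()
    return default
```

A caller would read the first few hundred bytes of the file in binary mode, pass them in, and hand the result (suitably mapped to the implementation's encoding names) on as the :external-format for that one file. Anything the sniffing cannot decide falls back to the default, which is where a global setting such as UTF-8 would come in.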

