[Openmcl-devel] default-character-encoding should be :utf-8

Sun Mar 4 12:04:44 PST 2012

I'm going to put in a double vote for using UTF-8 as the default file
encoding. I'm setting up a new Debian server right now that knows
about locale en_US.utf8 but not en_CA.utf8 (my local system's locale,
which is what gets sent in the SSH session), and trying to load
Hunchentoot with CLISP fails. If I didn't know about the CLISP default
encoding and how to generate locale files in Debian, this would be a
very frustrating problem. But if the file encoding defaulted to UTF-8
instead of ASCII or Latin-1, things would have "just worked."

Vladimir

On Sun, Mar 4, 2012 at 1:55 PM, Raymond Wiker <rwiker at gmail.com> wrote:
> On Mar 4, 2012, at 17:23 , Ron Garret wrote:
>>
>> On Mar 4, 2012, at 3:38 AM, Pascal J. Bourguignon wrote:
>>
>>> Tim Bradshaw <tfb at tfeb.org> writes:
>>>
>>>> On 4 Mar 2012, at 07:23, Pascal J. Bourguignon wrote:
>>>>>
>>>>> On POSIX systems, I would like the defaults (*default-external-format*
>>>>> *default-file-character-encoding* *default-socket-character-encoding*)
>>>>> to be what is specified by the environment variables LC_ALL, or else
>>>>> LC_CTYPE, or else LANG, or else LANGUAGE (the last being a GNU
>>>>> extension, I wonder why).  See http://clisp.org/impnotes/clisp.html
>>>>> (search: environment variables).
>>>>
>>>> Do you understand how the encoding should depend on the locale
>>>> variables? I've never been able to work that out (and the CLISP
>>>> documentation doesn't say how they work it out).  This isn't a
>>>> rhetorical question: I'd like to know.  Feel free to mail me privately
>>>> if you have any pointers, as this might be off-topic for the list.
>>>
>>> LC_ALL=C               ==> :us-ascii
>>> LC_ALL=en_US.UTF-8     ==> :utf-8
>>> LC_ALL=fr_FR.ISO8859-1 ==> :iso-8859-1
>>> LC_ALL=el_GR.ISO8859-7 ==> :iso-8859-7
>>>
>>> and so on.
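
To make the mapping above concrete, here is a minimal Python sketch of how
a default encoding could be derived from the locale variables. It follows
the lookup order given earlier in the thread (LC_ALL, LC_CTYPE, LANG,
LANGUAGE). The function name, the normalization rules, and the ASCII
fallback are my own assumptions, not what CLISP or CCL actually do.

```python
import re

# Lookup order as described earlier in the thread.
LOCALE_VARS = ("LC_ALL", "LC_CTYPE", "LANG", "LANGUAGE")


def encoding_from_locale(env):
    """Guess a Lisp-style encoding keyword from locale environment variables.

    `env` is any mapping of variable names to values, e.g. os.environ.
    """
    # Take the first variable that is set and non-empty; default to "C".
    value = next((env[v] for v in LOCALE_VARS if env.get(v)), "C")
    # Locale names look like language_TERRITORY.codeset@modifier.
    codeset = value.partition(".")[2].partition("@")[0]
    # Normalize spellings such as "utf8", "UTF-8", "ISO8859-1", "ISO-8859-1".
    key = re.sub(r"[^a-z0-9]", "", codeset.lower())
    if key == "utf8":
        return ":utf-8"
    match = re.fullmatch(r"iso8859(\d+)", key)
    if match:
        return ":iso-8859-" + match.group(1)
    # Assumption: "C", "POSIX", and anything unrecognized fall back to ASCII.
    return ":us-ascii"
```

Calling encoding_from_locale(os.environ) would apply this to the current
session. Note that it only reads the locale name; it does not check
whether that locale is actually installed on the machine.
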
>>
>> But (IMHO) code (Lisp or otherwise) should ALWAYS be in UTF-8 no matter what (and IMHO, CCL ought to ship with UTF-8 as the default encodings).  There are two reasons for this.  First, code consists primarily of code points (no pun intended) <128, and UTF-8 is the most compact encoding for that distribution that can still represent all code points.  Second, code often has to be dealt with by beginners who already have enough on their plate without having to worry about getting their encoding settings correct.
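
The compactness argument is easy to check. The following Python sketch
encodes a one-line Lisp form (an arbitrary sample I made up) in three
Unicode encodings and compares the byte counts. The little-endian variants
are used so that no byte order mark is added to the totals.

```python
# An arbitrary, all-ASCII sample of Lisp source (hypothetical example).
source = '(defun greet (name) (format t "Hello, ~a!~%" name))'

# Byte counts for the same text in three Unicode encodings.
sizes = {
    enc: len(source.encode(enc))
    for enc in ("utf-8", "utf-16-le", "utf-32-le")
}

for enc, size in sizes.items():
    print(f"{enc:10} {size:4d} bytes")

# A non-ASCII character is still representable in UTF-8; it takes more bytes.
lam = "λ".encode("utf-8")
```

For text made only of code points below 128, UTF-8 uses one byte per
character, UTF-16 two, and UTF-32 four.
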
>
> Another point here is that the encoding of a particular file is what it is, no matter what the user's environment has been set up for. It is no help if the user's environment has been set up for a particular encoding when at least some source files are in a completely different encoding.
>
> I think some Lisp systems recognize a hint about encoding specified as an Emacs-like mode line; that would help a little. Another option would be to use the BOM mechanism to indicate when files are in UTF-8 or UTF-16. Finally, it should be possible to extend ASDF to pass appropriate :external-format settings to load and compile-file.
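
As a rough illustration of the first two ideas, here is a Python sketch
that guesses a file's encoding from a byte order mark or, failing that,
from an Emacs-style "coding:" cookie on the first line. The function name,
the regular expression, and the UTF-8 fallback are assumptions for the
sake of the example. I am not claiming any particular Lisp implementation
does it this way.

```python
import codecs
import re

# Emacs-style cookie on the first line, e.g. ";; -*- coding: utf-8 -*-".
CODING_COOKIE = re.compile(rb"-\*-.*?coding:\s*([-\w.]+).*?-\*-")


def sniff_encoding(data, default="utf-8"):
    """Guess a source file's encoding from its leading bytes.

    Returns a Python codec name. `default` is used when there is neither
    a BOM nor a coding cookie (assumed here to be UTF-8).
    """
    # A UTF-8 BOM: "utf-8-sig" is Python's codec that strips it on decode.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # A UTF-16 BOM in either byte order: the "utf-16" codec reads the BOM.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # Otherwise look for a coding cookie on the first line only.
    first_line = data.split(b"\n", 1)[0]
    match = CODING_COOKIE.search(first_line)
    if match:
        return match.group(1).decode("ascii").lower()
    return default
```

A loader could read the first few hundred bytes of a file, call
sniff_encoding on them, and then reopen the file with the result.
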
> _______________________________________________
> Openmcl-devel mailing list
> Openmcl-devel at clozure.com
> http://clozure.com/mailman/listinfo/openmcl-devel
