[Openmcl-devel] *default-external-format* and encoding and decoding strings.
gb at clozure.com
Mon Mar 7 19:16:49 PST 2016
On 03/07/2016 04:34 PM, Ron Garret wrote:
> On Mar 7, 2016, at 3:23 PM, Gary Byers <gb at clozure.com> wrote:
>> On 03/07/2016 03:36 PM, Ron Garret wrote:
>>> On Mar 7, 2016, at 1:30 PM, R. Matthew Emerson <rme at clozure.com> wrote:
>>>>> On Mar 5, 2016, at 12:03 PM, Ron Garret <ron at flownet.com> wrote:
>>>>> On Mar 5, 2016, at 8:39 AM, Dmitry Igrishin <dfigrish at gmail.com> wrote:
>>>>>> 2016-03-05 16:25 GMT+03:00 Dmitry Igrishin <dfigrish at gmail.com>:
>>>>>> The *default-external-parameter* isn't considered by the
>>>>>> count-characters-in-octet-vector, decode-string-from-octets,
>>>>>> encode-string-to-octets, and string-size-in-octets functions, which
>>>>>> have the :external-format parameter. I believe that
>>>>>> *default-external-parameter* should affect the behaviour of
>>>>>> all functions with an :external-format parameter, right?
>>>>>> Sorry, I meant the *default-external-format* special variable...
>>>>> This is a bug in ccl::lookup-character-encoding. Here’s a patch:
>>>>> (in-package :ccl)
>>>>> (let ((ccl::*warn-if-redefine-kernel* nil))
>>>>>   (defun lookup-character-encoding (name)
>>>>>     (gethash (or name *default-external-format*) *character-encodings*)))
>>>> I don't think I can apply this.
>>>> The issue is that nil is a valid character encoding name: it's a documented synonym for :iso-8859-1.
>>> That’s a bug in the documentation. NIL should be a synonym for *default-character-encoding*. (I’m not joking. That part of the docs was written before the introduction of *default-character-encoding*.)
>> When that part (what part is it, by the way?) was written, :iso-8859-1 was the default character encoding.
> Yes, I know. That’s why I thought it was wrong, because the introduction of *default-external-format* had rendered it obsolete. But then I read it more closely and remembered my history...
>> Changing the functions in question to use (e.g.)
>> (defun encode-string-to-octets (string &key (external-format :default) ....) ...
>> seems to be another approach
> Yes, just more work.
Unless I'm missing something, we are talking about 4 functions here:
ENCODE-STRING-TO-OCTETS, DECODE-STRING-FROM-OCTETS,
STRING-SIZE-IN-OCTETS, and COUNT-CHARACTERS-IN-OCTET-VECTOR.
Someone who reads the documentation and says (DECODE-STRING-FROM-OCTETS
... :EXTERNAL-FORMAT NIL) seems to be asking for a string produced from
octets that are encoded in iso-8859-1, whether or not NIL should be an
alias for that encoding. Given the documented treatment of NIL as an
alias here, it is not clear why these 4 functions should treat an
unsupplied :external-format argument differently than LOAD or OPEN do.

Treating NIL and :DEFAULT as being equivalent isn't totally
unreasonable, but it would involve a seemingly greater change to a
documented interface. As far as I know, the default value of the
:external-format argument to those 4 functions isn't documented. The
existing code for those 4 functions already handles the case where the
argument is passed as (or defaults to) :default.
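
For anyone who wants an unsupplied :external-format to follow
*default-external-format* without any change to CCL itself, a pair of
wrappers is one possibility. This is only a sketch:
ENCODE-STRING-WITH-DEFAULT and DECODE-OCTETS-WITH-DEFAULT are
hypothetical names, not part of CCL, and the sketch assumes that the
current value of CCL:*DEFAULT-EXTERNAL-FORMAT* is acceptable as an
:EXTERNAL-FORMAT argument to the underlying functions.

```lisp
;; Hypothetical wrappers, not part of CCL.  The default form for
;; EXTERNAL-FORMAT is evaluated on every call, so a dynamic rebinding of
;; CCL:*DEFAULT-EXTERNAL-FORMAT* is picked up at call time.

(defun encode-string-with-default
    (string &key (external-format ccl:*default-external-format*))
  "Encode STRING to a vector of octets.  An unsupplied EXTERNAL-FORMAT
falls back to CCL:*DEFAULT-EXTERNAL-FORMAT*; an explicit value,
including NIL, is passed through unchanged."
  (ccl:encode-string-to-octets string :external-format external-format))

(defun decode-octets-with-default
    (octets &key (external-format ccl:*default-external-format*))
  "Decode OCTETS, a (SIMPLE-ARRAY (UNSIGNED-BYTE 8) (*)), to a string,
using the same defaulting rule as ENCODE-STRING-WITH-DEFAULT."
  (ccl:decode-string-from-octets octets :external-format external-format))

;; Example use, binding the default only for the extent of the call:
;; (let ((ccl:*default-external-format* :utf-8))
;;   (decode-octets-with-default (encode-string-with-default "abc")))
```

Because an explicit :EXTERNAL-FORMAT NIL is passed through untouched,
the documented meaning of NIL is not affected by wrappers like these.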
>> I suspect (but don't claim to know) that most uses of things like ENCODE-STRING-TO-OCTETS involve an explicit
>> :EXTERNAL-FORMAT argument.
> As one who believes that UTF-8 is the One True Encoding, I have:
> (setf *default-external-format* :utf-8)
> in my ccl-init file, and I never pass an explicit argument to ENCODE-STRING-TO-OCTETS. I consider anything that doesn’t work under those circumstances to be a bug. But I also don’t really care that much because it’s so easy to just patch or wrap things that don’t work the way I want them to. So personally I think it’s not entirely unreasonable to just leave things the way they are. It depends on how much value you want to put on adhering to the principle of least surprise for new users (though with CL that ship did sail a long, long time ago).