[Openmcl-devel] Error on macro character after sharpsign colon

Terje Norderhaug terje at in-progress.com
Tue Jan 26 17:42:21 PST 2010


On Jan 26, 2010, at 1:13 PM, Gary Byers wrote:
> On Tue, 26 Jan 2010, Gary Byers wrote:
>
> Suppose that someone decided (for barely plausible reasons) that #\;
> shouldn't be a macro character and should just be an ordinary
> constituent character.
>
>  (set-macro-character #\; nil nil)
>  (set-syntax-from-char #\; #\a) ; just in case.
>
> Now, ';abc is just a symbol with a "non-standard" name.
>
> Should '#:;abc be rejected, because the first character of its name
> is a macro character in a (standard) readtable that isn't otherwise
> in effect ?

A reasonable expectation is that the character after the sharpsign  
colon is valid if and only if it can be read as the start of an  
unqualified symbol using the Current Readtable. However, this turns  
out to be difficult to ascertain.

As Gary has demonstrated, reading uninterned symbols using the
syntax of the Standard Readtable may violate the expectation if the  
Current Readtable disables macro characters.
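
Gary's scenario can be sketched as follows. This is only an
illustration: it works on a private copy of the standard readtable, and
the names *semicolon-readtable* and read-with-constituent-semicolon are
made up for the example.

```lisp
;; Private copy of the standard readtable, so the global one is untouched.
(defparameter *semicolon-readtable* (copy-readtable nil))

;; Give #\; the syntax type that #\a has in the standard readtable,
;; i.e. make it a constituent instead of a terminating macro character.
(set-syntax-from-char #\; #\a *semicolon-readtable*)

(defun read-with-constituent-semicolon (string)
  "Read one object from STRING with #\; treated as a constituent."
  (let ((*readtable* *semicolon-readtable*))
    (read-from-string string)))
```

With that readtable in effect, (read-with-constituent-semicolon ";abc")
returns a symbol whose name is ";ABC", whereas the Standard Readtable
treats the same text as a comment.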

Yet in today's Clozure implementation, the expectation can be  
violated by defining a character macro that returns a symbol. Hence  
it is no resolution to reject reading an uninterned symbol that
starts with a macro character in the Current Readtable.
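
A minimal sketch of that case, adapted from the symbol-reader in my
original report. The readtable variable and the read-with-bang helper
are names invented for this example, and the macro character is
installed in a private copy of the standard readtable rather than
globally.

```lisp
(defun symbol-reader (stream char)
  "Reader macro function: ignore CHAR and return the next object read."
  (declare (ignore char))
  (read stream t nil t))

;; Private copy of the standard readtable with #\! as a
;; non-terminating macro character.
(defparameter *bang-readtable* (copy-readtable nil))
(set-macro-character #\! #'symbol-reader t *bang-readtable*)

(defun read-with-bang (string)
  "Read one object from STRING with #\! bound to SYMBOL-READER."
  (let ((*readtable* *bang-readtable*))
    (read-from-string string)))
```

Here (read-with-bang "!abc") yields the symbol ABC, so text starting
with a macro character can be read as an unqualified symbol even though
the first character is not a constituent.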

> Signaling an error on invalid syntax discourages the use of invalid  
> syntax;

The question is what constitutes invalid syntax in the printed  
representation of uninterned symbols. The spec seems ambiguous when  
it comes to the "syntax of a symbol" for the characters following the  
sharpsign colon. Some options:

[A] Characters that can be read as an unqualified symbol using the  
Current Readtable.

+ Matches developer expectations with symbol names that are the same  
as can be read using the Current Readtable.
- Impractical to implement (reading the characters may have the side  
effect of interning a symbol).

[B] Characters that can be read as an unqualified symbol using the  
Standard Readtable (i.e. is consistent with the Standard Syntax for  
symbols).

+ Predictable, well defined in specification, implementable.
- Symbol names may differ from those that can be read using the  
Current Readtable.

[C] Token that is neither a number nor has a package marker, and
doesn't start with a macro character in the Current Readtable.

+ Predictable, well defined, implementable.
- May reject some symbol names that can be read using the Current  
Readtable.

[D] Token that is neither a number nor has a package marker.

+ Predictable, well defined in specification, implementable.
- May allow some symbol names that cannot be read using the Current  
Readtable.

See CLHS 2.3.4 Symbols as Tokens:
http://www.lispworks.com/documentation/lw50/CLHS/Body/02_cd.htm

Also CLHS 2.3.5 Valid Patterns for Tokens
http://www.lispworks.com/documentation/lw50/CLHS/Body/02_ce.htm

I have suggested that a continuable error on [C] can be restarted  
with [D].
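
Roughly, the check could look like the sketch below. It is not how
Clozure's reader is written; check-uninterned-symbol-start is a
hypothetical helper that only covers the first-character test and
leaves out the number and package marker checks.

```lisp
(defun check-uninterned-symbol-start (char &optional (readtable *readtable*))
  "Option [C]: signal a continuable error if CHAR is a macro character
in READTABLE. Invoking the CONTINUE restart falls back to option [D]
and accepts CHAR as the first character of the symbol name."
  (when (get-macro-character char readtable)
    (cerror "Accept the token as a symbol name anyway."
            "Uninterned symbol name starts with macro character ~S."
            char))
  char)
```

A developer who wants [D] unconditionally could wrap the read in
(handler-bind ((error #'continue)) ...), while everyone else gets the
error and a restart in the debugger.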

> I don't understand how it discourages careful use of custom reader  
> macros (or even fairly casual use of them.)

I can only speak for myself on what I take from learning that an  
uninterned symbol cannot be read if its first character is the same  
as a custom reader macro character. It makes me more reluctant to use  
reader macros.

> Use of custom reader macros -does- change the syntax of the language.
> In a fresh lisp:
>
> (defun factorial (n) (if (zerop n) 1 (* n (factorial (1- n)))))
>
> (defun ! (x) (factorial x))
>
> (defun !bar (x) (bar (factorial x)))
>
>
> #\! is a constituent character, no different from (for instance) a
> standard alphabetic character.
>
> If we decide that it'd be really neat if #\! was a macro character:
>
> (set-macro-character #\!
>   (lambda (stream subchar)
>     (declare (ignore subchar))
>     (factorial (read stream)))
>    t)
>
> ? !10
> 3628800
>
>
> then we've decided that it's better (more convenient, whatever) to
> have #\! be a macro-character than to have it continue to be a
> constituent; we've (slightly) changed the syntax of the language
> that the reader accepts: we've added the ability to use ! at read
> time to introduce "factorial constants", and we've lost the ability
> to use #\! as the first character of a symbol name:
>
> ? !10
> 3628800  ; Neat!
> ? (defun !foo (x) x)
>> Error: value FOO is not of the expected type NUMBER. ; Less neat!
>
> This tradeoff is something that has to be kept in mind when defining
> reader macros; it isn't some obscure thing that only affects #:.
>
>
>> On Tue, 26 Jan 2010, Terje Norderhaug wrote:
>>
>>> Regarding CLZ not allowing a user defined macro character after the
>>> #: of an uninterned symbol:
>>>
>>> On Jan 25, 2010, at 1:56 PM, Gary Byers wrote:
>>>> The spec says:
>>>>
>>>> "A non-terminating macro character is treated as a constituent when
>>>> it appears in the middle of an extended token being accumulated."
>>>>
>>>> (An initial macro character would cause the function associated
>>>> with that character to be called in the standard reader algorithm
>>>> described in section 2.2; a token that begins with an unescaped
>>>> macro character would not have "the syntax of a symbol.")
>>>>
>>>> The spec says that the <<symbol-name>> following #: "must have the
>>>> syntax of a symbol"; I think that I'd rather have the reader
>>>> complain when this is violated than quietly accept
>>>> invalid/undefined syntax.
>>>
>>> Wouldn't it be reasonable to understand "syntax of a symbol" in the
>>> spec to refer to the Standard Syntax rather than the user extended
>>> syntax? If so, the reader could complain when the symbol name
>>> following #: starts with a character in the Standard Readtable yet
>>> allow custom macro characters in the Current Readtable.
>>>
>>> Not allowing a user defined macro character after the #: of an
>>> uninterned symbol discourages even careful use of custom reader
>>> macros. An alternative resolution could be to signal a *continuable*
>>> error when encountering an uninterned symbol that starts with a
>>> macro character, giving the developer the final say.
>>>
>>>> On Mon, 25 Jan 2010, Terje Norderhaug wrote:
>>>>
>>>>> CLZ fails to read an uninterned symbol if the sharpsign colon is
>>>>> followed by a macro character. Is this a bug or correct
>>>>> behavior? I got no wiser by reading the hyperspec, but presume
>>>>> it's in there somewhere:
>>>>>
>>>>>    http://www.lispworks.com/documentation/lw50/CLHS/Body/02_b.htm
>>>>>
>>>>> Replicate by evaluating the following:
>>>>>
>>>>> (defun symbol-reader (stream char)
>>>>>    (declare (ignore char))
>>>>>    (read stream t nil t))
>>>>>
>>>>> (set-macro-character #\! #'symbol-reader T)
>>>>>
>>>>> '!abc
>>>>> => ABC
>>>>>
>>>>> '#:!abc
>>>>> => Reader error: Illegal symbol syntax.
>>>>>
>>>>> Backtrace leads to #'read-symbol-token in the l1-reader.
>>>>>
>>>>> LispWorks (5.0 Personal) reads '#:!abc without reporting an error.
>>>>> Which implementation is right?
>>>>>
>>>>> -- Terje Norderhaug
>>>>>   terje at in-progress.com
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Openmcl-devel mailing list
>>>>> Openmcl-devel at clozure.com
>>>>> http://clozure.com/mailman/listinfo/openmcl-devel
>>>>>
>>>>>
>>>
>>> -- Terje Norderhaug
>>>   terje at in-progress.com

-- Terje Norderhaug
   terje at in-progress.com






