[Openmcl-devel] Error on macro character after sharpsign colon

Mon Feb 1 12:44:13 PST 2010

On Feb 1, 2010, at 10:58 AM, Gary Byers wrote:

> So, a case is pathological and irrelevant if it doesn't involve a
> corner that someone's painted themselves into, and meaningful if it
> does ?

I'm not sure I'd go quite that far, but yes: a case that involves a real problem, one that someone actually ran into while doing something people might actually want to do, matters more than a case that doesn't.  And we seem to agree on this:

> I agree that plausible, real-world cases should have more weight
> than ... whatever constitutes the opposite of those cases.

I think the word you're looking for is "hypothetical."

> I don't know how often (if ever) it comes up in practice, but I can
> easily imagine someone expecting
> 
> (read-from-string "#:1234")
> 
> to return an uninterned symbol assuming "standard syntax" and regardless
> of the value of *READ-BASE*, and that anyone doing that would have
> every right to scream bloody murder if it didn't work that way in a given
> implementation.  (I don't know of any implementation in which this
> wouldn't behave that way; what we obtain by interpreting the sequence of
> characters that follow the #\: as "something with the syntax of a symbol"
> isn't the same as what calling READ would return (READ's behavior would
> be sensitive to the value of *READ-BASE*, at the very least).)

I think I got lost in all the double negatives.  In CCL the result of reading "#:1234" does depend on *READ-BASE*.  But surely you knew that.

> Again assuming standard syntax and that *READ-EVAL* is true, it seems
> obvious that:
> 
> (read-from-string "#.(intern \"ABC\")")
> 
> will return a symbol.  It seems equally obvious that the string
> "#.(intern \"ABC\")" doesn't have the syntax of a symbol; I hope
> that I'm correct in characterizing that as obvious.

It's not obvious to me.  I think it's perfectly defensible to interpret the phrase "the syntax of a symbol" to mean "those strings which return symbols when passed as the first argument to READ-FROM-STRING".  Mind you I'm not *advocating* this interpretation, I'm just saying it's defensible.

The phrase "the syntax of a symbol" is inherently ambiguous in a language where the syntax can be changed by the user.  And not just by munging the readtable.  *READ-BASE* also affects which character sequences are and are not symbols.  Does "123" have the syntax of a symbol?  How about "CAFEBABE"?
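
To make the *READ-BASE* point concrete, here is a minimal sketch.  READ-WITH-BASE is a helper name I made up for illustration; everything else is standard Common Lisp.  It reads the same characters under standard syntax with only *READ-BASE* changed:

```lisp
;; READ-WITH-BASE is a hypothetical helper, not part of any implementation.
(defun read-with-base (string base)
  "Read STRING with standard syntax, except that *READ-BASE* is bound to BASE."
  (with-standard-io-syntax
    (let ((*read-base* base))
      ;; VALUES drops READ-FROM-STRING's second value (the end position).
      (values (read-from-string string)))))

(read-with-base "CAFEBABE" 10)  ; a symbol named "CAFEBABE"
(read-with-base "CAFEBABE" 16)  ; the integer #xCAFEBABE
(read-with-base "123" 10)       ; the integer 123
```

So whether "CAFEBABE" names a symbol depends on a special variable, before the readtable even enters the picture.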

> That particular
> sequence of characters stopped having the syntax of a symbol as soon
> as the first character was determined to be a macro character and not
> a simple constituent.

That is a defensible position.  But it is also a defensible position that "the syntax of a symbol" in the context of reading uninterned symbols should be interpreted with respect to the standard readtable.

> The spec says that the symbol-name that follows #: must have the
> syntax of a symbol; that's not the same as saying that it is any
> sequence which would cause READ to return a symbol.  (Fortunately, all
> existing implementations agree on this and no implementations process
> those initial macro characters; less fortunately, some implementations
> will incorporate initial macro characters into the token they collect
> and other implementations consider the presence of macro characters
> in that context to be a violation of a "must have the syntax of a symbol"
> requirement.)

Right.  Some implementations interpret "the syntax of a symbol" to be with respect to the current readtable, and others interpret it with respect to the standard readtable.  Both are defensible positions.
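
For anyone who wants to see which way a given implementation goes, here is a rough probe.  PROBE-UNINTERNED and the :MACRO-RAN marker are names I invented for this sketch, and I'm deliberately not claiming what any particular implementation returns:

```lisp
;; PROBE-UNINTERNED is a hypothetical helper for comparing implementations.
(defun probe-uninterned (string macro-char)
  "Read STRING in a copy of the standard readtable in which MACRO-CHAR has
been made a terminating macro character.  Return the object read, or the
condition object if the reader signaled an error."
  (let ((*readtable* (copy-readtable nil)))  ; NIL means the standard readtable
    (set-macro-character macro-char
                         (lambda (stream char)
                           (declare (ignore stream char))
                           :macro-ran))      ; marker in case the macro is invoked
    (handler-case (values (read-from-string string))
      (error (c) c))))

;; Either an uninterned symbol or an error condition, depending on the
;; implementation's reading of "the syntax of a symbol".
(probe-uninterned "#:$AK" #\$)
```

Because the macro character is installed in a copy, the global readtable is left alone.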

> 
>> Whatever the value of having these character sequences produce
>> errors might be, I think those are vastly outweighed by the ability
>> to define reader macros on alphabetic characters without breaking
>> uninterned symbols.
> 
> I think that you're greatly underestimating the value of tractable,
> well-defined behavior.

Not at all.  All I'm saying is that when faced with two defensible ways to interpret an ambiguous requirement, one should choose the one that produces the more useful results.

> In the US postal system, states are denoted by a particular 2-character
> abbreviation: Alaska is denoted by AK, New York by NY, etc.  A hypothetical,
> hopefully plausible program that processed these codes might define a
> reader macro to make them easier to recognize and validate.
> 
> (set-macro-character #\$ (lambda (stream char)
>                           (declare (ignore char))
>                           (let* ((name (make-string 2)))
>                             (setf (schar name 0) (char-upcase (read-char stream))
>                                   (schar name 1) (char-upcase (read-char stream)))
>                             (or (find-symbol name "STATE-NAMES")
>                                 (error ...)))))
> 
> So, $AK reads as STATE-NAMES:ALASKA, and both
> 
> $AK something-else
> 
> and
> 
> $AKsomething-else
> 
> are equivalent: the state name code is exactly 2 characters long and not delimited
> by whitespace; whatever "something else" is, it's incredibly critical and can
> be completely ignored in other cases.

What would $ALASKA read as?

> I don't know how contrived this example is,

It's pretty contrived.

> but I have a lot of difficulty
> concluding that there's no value in signaling an error.

If there's any value in having an error signaled BY THE READER I don't see it.  You still need downstream error checking to make sure someone hasn't typed in FOO as a state name, and that check will also catch #:$AK as an invalid state name.  Relying on the reader to verify semantics seems like a bad mistake.
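
By "downstream error checking" I mean something along these lines.  This is only a sketch: *STATE-CODES* and VALID-STATE-CODE-P are made-up names, and the table holds just the two codes mentioned above rather than a real list:

```lisp
;; Hypothetical table: only the two codes mentioned in this thread.
(defparameter *state-codes* '("AK" "NY"))

(defun valid-state-code-p (object)
  "True if OBJECT is a string or symbol whose name is a known state code."
  (and (or (stringp object) (symbolp object))
       (member (string object) *state-codes* :test #'string=)
       t))

(valid-state-code-p "AK")                 ; true
(valid-state-code-p 'foo)                 ; false: FOO is not a state code
(valid-state-code-p (make-symbol "$AK"))  ; false: an uninterned $AK is caught too
```

The check rejects the uninterned symbol for the same reason it rejects FOO, without any help from the reader.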

rg