[Openmcl-devel] Error on macro character after sharpsign colon

Sun Jan 31 11:51:14 PST 2010

I hate to flog a dead horse, but I have some new data.

I have an experimental system called symbol-reader-macros.  This lets you define reader macros on symbols rather than characters, which lets you do things like turn your REPL into a decent emulation of a Unix shell (among other things).  It works by defining a regular reader macro on all alphabetic characters which dispatches to a function that calls READ recursively, checks whether the resulting form is a symbol with a symbol reader macro function defined on it, and if so calls that function.  It works, and it lets you do cool things, but when it's loaded it renders CCL incapable of reading any uninterned symbols, because CCL's #: reader insists that the first character of the name be a constituent or escape character, and every letter is now a macro character.
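A minimal sketch of the mechanism described above, in portable Common Lisp. It is not the actual symbol-reader-macros code: every name in it (*symbol-reader-macros*, *token-readtable*, define-symbol-reader-macro, letter-reader, make-symbol-macro-readtable) is hypothetical, it only covers ASCII letters, and it assumes that re-reading the token with a copy of the standard readtable is acceptable, so any other readtable customizations are ignored for that inner READ.

```lisp
;;; Hypothetical sketch of symbol reader macros; not the actual
;;; symbol-reader-macros implementation.

(defvar *symbol-reader-macros* (make-hash-table :test 'eq)
  "Maps a symbol to a function of (STREAM SYMBOL) to call when it is read.")

(defvar *token-readtable* (copy-readtable nil)
  "Copy of the standard readtable, used for the inner READ so that the
letter macros are not re-triggered (other customizations are ignored).")

(defun define-symbol-reader-macro (symbol handler)
  "Register HANDLER as the symbol reader macro for SYMBOL."
  (setf (gethash symbol *symbol-reader-macros*) handler))

(defun letter-reader (stream char)
  "Reader macro installed on every ASCII letter."
  ;; Put the letter back so the inner READ sees the whole token.
  (unread-char char stream)
  (let* ((form (let ((*readtable* *token-readtable*))
                 (read stream t nil t)))
         (handler (and (symbolp form)
                       (gethash form *symbol-reader-macros*))))
    ;; Dispatch only if the form is a symbol with a registered handler.
    (if handler
        (funcall handler stream form)
        form)))

(defun make-symbol-macro-readtable ()
  "Return a copy of the standard readtable in which every ASCII letter
is a non-terminating macro character handled by LETTER-READER."
  (let ((readtable (copy-readtable nil)))
    (loop for upper across "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
          do (set-macro-character upper #'letter-reader t readtable)
             (set-macro-character (char-downcase upper) #'letter-reader
                                  t readtable))
    readtable))
```

Usage: bind *readtable* to the result of make-symbol-macro-readtable and register a handler on a symbol. Because every letter is now a non-terminating macro character, an implementation whose #: reader requires the first character to be a constituent or escape character will reject #:foo under this readtable, which is the failure described above.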

So my sympathies in this matter have shifted, and I now believe that Terje is right and that this is a problem that needs to be fixed.

FWIW, the counter-argument that #:#foo and #:() will return possibly unintuitive results is IMHO a weak one, because these character sequences are unlikely to appear in actual code and are certainly unlikely to appear there by accident.  Whatever value there might be in having these character sequences signal errors, I think it is vastly outweighed by the ability to define reader macros on alphabetic characters without breaking uninterned symbols.
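The disagreement comes down to how the #: reader collects the characters that follow it. The sketch below contrasts the two strategies described in the quoted thread: (a) collect characters up to a delimiter, and (b) additionally require that the first character be a constituent or escape character. It is hypothetical and simplified: the function names are illustrative, it assumes the standard whitespace characters, \ and | as the only escape characters, and :UPCASE readtable case, and it approximates syntax types with GET-MACRO-CHARACTER, since portable Common Lisp has no direct way to query a character's syntax type.

```lisp
;;; Hypothetical sketch: two ways a #: reader might collect a symbol name.
;;; Assumes standard whitespace, \ and | as the only escape characters,
;;; and :UPCASE readtable case.

(defun whitespace-char-p (char)
  (member char '(#\Space #\Tab #\Newline #\Return #\Page)))

(defun macro-char-kind (char)
  "Return :TERMINATING, :NON-TERMINATING, or NIL for CHAR in *READTABLE*."
  (multiple-value-bind (fn non-terminating-p) (get-macro-character char)
    (cond ((null fn) nil)
          (non-terminating-p :non-terminating)
          (t :terminating))))

(defun collect-symbol-name (stream strategy)
  "Collect a symbol name from STREAM after #:.  STRATEGY :A reads up to a
delimiter; :B also requires a constituent or escape as the first character."
  (when (eq strategy :b)
    (let ((first (peek-char nil stream nil nil)))
      (when (or (null first)
                (whitespace-char-p first)
                (macro-char-kind first))
        (error "~S cannot begin a symbol name after #:." first))))
  (with-output-to-string (out)
    (loop with multiple-escape-p = nil
          for char = (read-char stream nil nil)
          do (cond ((null char)
                    (when multiple-escape-p
                      (error "End of file inside |...|."))
                    (loop-finish))
                   ((char= char #\\)
                    ;; Single escape: take the next character literally.
                    (write-char (read-char stream) out))
                   ((char= char #\|)
                    (setf multiple-escape-p (not multiple-escape-p)))
                   (multiple-escape-p
                    (write-char char out))
                   ((or (whitespace-char-p char)
                        (eq (macro-char-kind char) :terminating))
                    ;; Delimiter: leave it in the stream for the next READ.
                    (unread-char char stream)
                    (loop-finish))
                   ((char= char #\:)
                    ;; Unescaped package marker: not allowed after #:.
                    (error "Package prefix not allowed after #:."))
                   (t
                    ;; Constituent or non-terminating macro character.
                    (write-char (char-upcase char) out))))))

(defun read-uninterned (stream strategy)
  (make-symbol (collect-symbol-name stream strategy)))
```

In this sketch, strategy :A reads #:#abc as a symbol named "#ABC" and #:() as a symbol with an empty name (leaving () for the next READ), while :B signals an error for both, matching the SBCL and CCL behaviors reported in the thread. Under a readtable where letters are non-terminating macro characters, only :A would accept #:foo.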

rg

On Jan 27, 2010, at 5:18 AM, Gary Byers wrote:

> So, you think that
> 
> * '#:#abc
> 
> should quietly return an uninterned symbol whose name begins with a #
> character, and that
> 
> * '#:()
> 
> should be read as an uninterned symbol with a 0-length name, followed
> by an empty list?  (FWIW, SBCL has these behaviors; LispWorks accepts the
> first example and complains that the second involves a missing symbol
> name after #:; Allegro seems to treat both cases as SBCL does; CLISP
> and CCL (and MCL) complain about both of these examples.  I wouldn't
> be surprised if other cases or a sampling of other implementations
> expose other differences.)
> 
> That's indeed the behavior one would get if #: went directly to step
> 8 of the reader algorithm via something like your interpretation of
> 2.4.8.5.
> 
> When you say that:
> 
> Now I think that 2.4.8.5 -- when it talks about "must have the syntax of
> a symbol" -- that this statement can only imply that the thing after #:
> is interpreted as a token as an a priori decision.
> 
> do you think that an a priori decision has also been made that the alleged symbol name contains no package prefix?
> 
> "Must" can be interpreted in at least 2 ways:
> 
> (1) as "it is a requirement that" ("The arguments to the function
>    + must be numbers, and it's desirable that #'+ verify that to
>    be true").  CLHS usually tries to use more precise terminology
>    to describe requirements like this.
> (2) as "it is a logical consequence of previously established facts or
>    prior knowledge" ("If the sum of integers X and Y is known to be
>    odd and X is known to be even, then Y must be odd.")
> 
> It is certainly not established that the (alleged) symbol name doesn't
> contain a package prefix; every implementation that I've looked at
> treats that as something that needs to be verified at runtime ("must
> [1]").  I find it odd that some implementations would treat the first
> part as implying that something's been established (that the alleged
> symbol name HAS the syntax of a symbol and that we should enter a state
> corresponding to step 8 of the reader algorithm based on some
> nonexistent a priori knowledge of what characters actually follow #:).
> I can't see how it's reasonable to simultaneously apply two disjoint
> interpretations to a single use of the word "must".
> 
> It follows that I don't think that we have any reason to enter anything
> other than a state corresponding to step 1 of the reader algorithm: we
> don't know anything about the syntax types of the characters that we're
> about to read and certainly haven't received special dispensation to
> start in step 8; if we exit from a state corresponding to step 10
> (we have a token) we win and if we would exit in some other state we
> lose (the characters following #: didn't have the syntax of a symbol);
> if we won, we can check the additional requirement that the symbol name
> token not have a package prefix.
> 
> In order to believe that there's a basis for going directly to step 8
> (and treating initial non-terminating macro characters as constituents,
> among other things), I have to parse a single use of "must" as if it
> simultaneously means two different things.  Every time that I try to
> do that, I get a bad headache.  If the restriction on package prefixes
> weren't present, I think that I'd lean pretty far towards interpreting
> "must have the syntax of a symbol" as meaning "it is a requirement that ..."
> rather than "there is some unspecified basis for assuming that ..."; the
> additional package-prefix qualification seems pretty convincing to
> me (possibly because trying to lean in two directions at the same time
> gives me a REALLY bad headache.)
> 
> On Wed, 27 Jan 2010, Tobias C. Rittweiler wrote:
> 
>> Gary Byers <gb at clozure.com> writes:
>> 
>>> There are likely other reasons why calling the macro function either
>>> can't work or wouldn't be a good idea.  I don't think that any
>>> implementations do that or that there's any reason to think that they
>>> would.  The #: reader macros in the implementations whose source I
>>> looked at do essentially what CCL does: collect a "token" by reading a
>>> sequence of characters from the current input stream and making an
>>> uninterned symbol out of the sequence of characters that comprise that
>>> token.  The "collect-token" process may involve calling some internal
>>> function that's also called by the reader.
>>> 
>>> There are at least a couple of approaches to this token-collection
>>> process:
>>> 
>>> a) read characters and process escape characters until a delimiter
>>>    (whitespace, terminating macro, EOF) is encountered.
>>> 
>>> b) essentially the same, but insist that the first character is
>>>    a constituent or escape character (and not a non-terminating
>>>    macro.)
>>> 
>>> Some implementations follow (a); others (including CCL) follow (b).
>>> I haven't heard any argument in favor of (a) that doesn't seem to
>>> be based on a misunderstanding of what's happening here.
>> 
>> SBCL behaves like a), and I'd defend that as the "better" choice -- of
>> course, I cannot give an official position statement; it's my personal
>> opinion:
>> 
>> 
>>  2.4.8.5 Sharpsign Colon says
>> 
>> "The symbol-name must have the syntax of a symbol with no package
>>  prefix."
>> 
>> 
>> The CLHS talks about "syntax of a symbol" only to differentiate tokens
>> between numbers, potential numbers, and symbols. See for example:
>> 
>> 
>> 2.3.1 Numbers as Tokens
>> 
>> "When a token is read, it is interpreted as a number or symbol. The
>>  token is interpreted as a number if it satisfies the syntax for
>>  numbers specified in the next figure."
>> 
>> or
>> 
>> 2.2 Reader Algorithm, 2nd §.
>> 
>> "When dealing with tokens, the reader's basic function is to
>>  distinguish representations of symbols from those of numbers. When a
>>  token is accumulated, it is assumed to represent a number if it
>>  satisfies the syntax for numbers [...]. If it does not represent a
>>  number, it is then assumed to be a potential number if it satisfies
>>  the rules governing the syntax for a potential number. If a valid
>>  token is neither a representation of a number nor a potential number,
>>  it represents a symbol."
>> 
>> 
>> Now I think that 2.4.8.5 -- when it talks about "must have the syntax of
>> a symbol" -- that this statement can only imply that the thing after #:
>> is interpreted as a token as an a priori decision.
>> 
>> This means that #: must perform step 8 of the Reader Algorithm
>> (2.2) which is the only place in the standard that specifies how tokens
>> are actually read.
>> 
>> Following step 8, a non-terminating macro character is just interpreted
>> as a constituent.
>> 
>> I.e. taking the example in Terje's original posting, #:!foo must
>> (following my argument) be read as an uninterned symbol with
>> symbol-name "!FOO" (modulo readtable-case.)
>> 
>> -T.
>> 
>> _______________________________________________
>> Openmcl-devel mailing list
>> Openmcl-devel at clozure.com
>> http://clozure.com/mailman/listinfo/openmcl-devel