[Openmcl-devel] Error on macro character after sharpsign colon

Wed Jan 27 05:18:51 PST 2010

So, you think that

* '#:#abc

should quietly return an uninterned symbol whose name begins with a #
character, and that

* '#:()

should be read as an uninterned symbol with a 0-length name, followed
by an empty list?  (FWIW, SBCL has these behaviors; LW accepts the
first example and complains that the second involves a missing symbol
name after #:; Allegro seems to treat both cases as SBCL does; CLISP
and CCL (and MCL) complain about both of these examples.  I wouldn't
be surprised if other cases or a sampling of other implementations
exposed other differences.)
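
For anyone who wants to try these two examples in some other
implementation, something like the following should do.  It's only a
sketch: PROBE-SHARP-COLON is a made-up helper name, not part of any
implementation, and what it returns will differ between implementations
(which is the point).

```lisp
;; Hypothetical helper: read up to two objects from STRING and report
;; either what was read or the type of the error that was signalled.
(defun probe-sharp-colon (string)
  (handler-case
      (with-input-from-string (in string)
        (let* ((object (read in))
               ;; :EOF here means nothing followed the first object.
               (next (read in nil :eof)))
          (list :ok object next)))
    (error (c) (list :error (type-of c)))))

(probe-sharp-colon "#:#abc")
(probe-sharp-colon "#:()")
```

An implementation that reads the second example as a 0-length symbol
name followed by an empty list would return an :OK list whose third
element is NIL; one that complains would return an :ERROR list.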

That's indeed the behavior one would get if #: went directly to step
8 of the reader algorithm via something like your interpretation of
2.4.8.5.

When you say that:

  Now I think that 2.4.8.5 -- when it talks about "must have the syntax of
  a symbol" -- that this statement can only imply that the thing after #:
  is interpreted as a token as an a priori decision.

do you think that an a priori decision has also been made that the
alleged symbol name contains no package prefix?

"Must" can be interpreted in at least 2 ways:

(1) as "it is a requirement that" ("The arguments to the function
     + must be numbers, and it's desirable that #'+ verify that to
     be true").  CLHS usually tries to use more precise terminology
     to describe requirements like this.
(2) as "it is a logical consequence of previously established facts or
     prior knowledge" ("If the sum of integers X and Y is known to be
     odd and X is known to be even, then Y must be odd.")

It is certainly not established that the (alleged) symbol name doesn't
contain a package prefix; every implementation that I've looked at
treats that as something that needs to be verified at runtime ("must"
in sense (1)).  I find it odd that some implementations would treat the
first part as implying that something's been established (that the
alleged symbol name HAS the syntax of a symbol and that we should enter
a state corresponding to step 8 of the reader algorithm based on some
nonexistent a priori knowledge of what characters actually follow #:).
I can't see how it's reasonable to simultaneously apply two disjoint
interpretations to a single use of the word "must".

It follows that I don't think that we have any reason to enter anything
other than a state corresponding to step 1 of the reader algorithm: we
don't know anything about the syntax types of the characters that we're
about to read and certainly haven't received special dispensation to
start in step 8; if we exit from a state corresponding to step 10
(we have a token) we win and if we would exit in some other state we
lose (the characters following #: didn't have the syntax of a symbol);
if we won, we can check the additional requirement that the symbol name
token not have a package prefix.
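
To make the difference concrete, here is a rough sketch of the test that
each starting point applies to the first character after #:.  This is
not CCL's (or anyone's) actual reader code: the function names are
hypothetical, standard whitespace syntax is assumed because the syntax
type of a character can't be queried portably, and the example assumes
that ! has been made a non-terminating macro character, as in the #:!foo
case.

```lisp
;; Hypothetical helpers, for illustration only.
(defun sketch-whitespace-p (char)
  ;; Assumes standard syntax; a readtable could define other whitespace.
  (member char '(#\Space #\Tab #\Newline #\Return #\Page #\Linefeed)))

(defun step-1-starts-token-p (char &optional (readtable *readtable*))
  "True if CHAR, seen in step 1, begins a token: it is neither whitespace
nor a macro character of either kind (so a constituent or an escape)."
  (and (not (sketch-whitespace-p char))
       (null (get-macro-character char readtable))))

(defun step-8-continues-token-p (char &optional (readtable *readtable*))
  "True if CHAR, seen in step 8, is accumulated into the token: anything
but whitespace or a terminating macro character."
  (multiple-value-bind (function non-terminating-p)
      (get-macro-character char readtable)
    (and (not (sketch-whitespace-p char))
         (or (null function) non-terminating-p))))

(let ((rt (copy-readtable nil)))
  ;; Assumption: ! is defined as a non-terminating macro character.
  (set-macro-character #\!
                       (lambda (stream char)
                         (declare (ignore stream char))
                         (values))
                       t rt)
  (list (step-1-starts-token-p #\! rt)       ; false: ! is a macro character
        (step-8-continues-token-p #\! rt)    ; true: treated like a constituent
        (step-1-starts-token-p #\( rt)       ; false: ( is a macro character
        (step-8-continues-token-p #\( rt)))  ; false: ( terminates the token
```

Starting in step 1 rejects #:!foo and #:() on the first character;
starting in step 8 accepts the ! as part of the name and ends the token
(with nothing in it) at the open paren.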

In order to believe that there's a basis for going directly to step 8
(and treating initial non-terminating macro characters as constituents,
among other things), I have to parse a single use of "must" as if it
simultaneously means two different things.  Every time that I try to
do that, I get a bad headache.  If the restriction on package prefixes
weren't present, I think that I'd lean pretty far towards interpreting
"must have the syntax of a symbol" as meaning "it is a requirement that ..."
rather than "there is some unspecified basis for assuming that ..."; the
additional package-prefix qualification seems pretty convincing to
me (possibly because trying to lean in two directions at the same time
gives me a REALLY bad headache.)

On Wed, 27 Jan 2010, Tobias C. Rittweiler wrote:

> Gary Byers <gb at clozure.com> writes:
>
>> There are likely other reasons why calling the macro function either
>> can't work or wouldn't be a good idea.  I don't think that any
>> implementations do that or that there's any reason to think that they
>> would.  The #: reader-macro in the implementations whose source I
>> looked at do essentially what CCL does: collect a "token" by reading a
>> sequence of characters from the current input stream and making an
>> uninterned symbol out of the sequence of characters that comprise that
>> token.  The "collect-token" process may involve calling some internal
>> function that's also called by the reader.
>>
>> There are at least a couple of approaches to this token-collection
>> process:
>>
>> a) read characters and process escape characters until a delimiter
>>     (whitespace, terminating macro, EOF) is encountered.
>>
>> b) essentially the same, but insist that the first character is
>>     a constituent or escape character (and not a non-terminating
>>     macro.)
>>
>> Some implementations follow (a); others (including CCL) follow (b).
>> I haven't heard any argument in favor of (a) that doesn't seem to
>> be based on a misunderstanding of what's happening here.
>
> SBCL behaves like a), and I'd defend that as the "better" choice -- of
> course, I cannot give an official position statement, it's my personal
> opinion:
>
>
>   2.4.8.5 Sharpsign Colon says
>
>  "The symbol-name must have the syntax of a symbol with no package
>   prefix."
>
>
> The CLHS talks about "syntax of a symbol" only to differentiate tokens
> between numbers, potential numbers, and symbols. See for example:
>
>
>  2.3.1 Numbers as Tokens
>
>  "When a token is read, it is interpreted as a number or symbol. The
>   token is interpreted as a number if it satisfies the syntax for
>   numbers specified in the next figure."
>
> or
>
>  2.2 Reader Algorithm, 2nd §.
>
>  "When dealing with tokens, the reader's basic function is to
>   distinguish representations of symbols from those of numbers. When a
>   token is accumulated, it is assumed to represent a number if it
>   satisfies the syntax for numbers [...]. If it does not represent a
>   number, it is then assumed to be a potential number if it satisfies
>   the rules governing the syntax for a potential number. If a valid
>   token is neither a representation of a number nor a potential number,
>   it represents a symbol."
>
>
> Now I think that 2.4.8.5 -- when it talks about "must have the syntax of
> a symbol" -- that this statement can only imply that the thing after #:
> is interpreted as a token as an a priori decision.
>
> This means now, that #: must perform step 8 in the Reader Algorithm
> (2.2) which is the only place in the standard that specifies how tokens
> are actually read.
>
> Following step 8, a non-terminating macro character is just interpreted
> as constituent.
>
> I.e. taking the example in Terge's original posting, #:!foo must
> (following my argumentation) be read as an uninterned symbol with
> symbol-name "!FOO" (modulo readtable-case.)
>
>  -T.