[Openmcl-devel] documentation syntax (was: ccl manual)

Mon Dec 28 22:25:35 PST 2009

(Disclaimer: I'm the developer of the Scribble syntax reader.)

Executive summary:

* The scribble syntax is very much doing what you're talking about.
  It is a readtable-based extension of the (PLT) sexpr reader, and
  basically provides an alternative syntax for sexprs.

* The important point is that it is -- still -- just sexprs.  This
  makes it a perfect example of a "showcase for what makes Lisp
  different and unique".

* Yes, it's PLT in this case, but it is implemented as a readtable
  extension -- the exact same benefits that make the scribble syntax
  work out nicely would apply to an implementation in *any* Lisp.  (In
  fact, I believe that it would be a viable solution even for
  non-sexpr languages, but non-sexpr-people usually fail to see the
  benefits of a uniform syntax and a uniform syntax representation.)

* Since the syntax itself is just an alternative reader for sexprs, it
  is completely decoupled from the actual documentation system.  You
  can use it with any functions you want -- they can emit latex code,
  docbook code, plain text, or they can even be unrelated to
  documentation; the syntax itself just makes it easy to write lots of
  free form prose in code.  (Also important: free form prose that is
  rich in code.)  This makes it a fitting extension even when you
  already have your own system for producing documentation.

* A more detailed description of the syntax, the design decisions that
  led to it, and the benefits is in the scheme workshop paper on it:
  http://barzilay.org/misc/scribble-reader.pdf

And a few replies:

Ron Garret <ron at flownet.com> writes:
>
> No, my solution is not "just as complicated."  My solution has one
> very small startup cost which then pays dividends over a very long
> period of time.

I think that Brian's point is that doing a syntax for free-form text
is much more involved than just finding two unused characters.  The
way Scribble started is, very roughly speaking, similar to what you
suggest -- only I kept it all in ascii world, for obvious reasons.
Using "{...}" seemed like a good choice given that most editors treat
them as balanced, and they're very rarely used in Scheme code.
Extending the reader with these is the first step, and it's
essentially the same as what you're suggesting.

But this is really the tip of an iceberg; there's a whole bunch of
features that you need to add to make it into something practical.

* You need to accommodate newlines in text.  The simple thing to do is
  to quote the whole text as is -- but then you end up with functions
  (or macros) that strip off the indentation, and doing that is not
  easy.  Several Schemes have here-string syntax, but IMO it's quite
  unpopular because preserving the spaces verbatim means that you
  break the flow of code, making the result incomprehensible.  You
  also need a convenient syntax for breaking a source line without
  breaking the text, the same role that `%' plays in TeX sources.

* As with your suggestion, the scribble reader uses balanced tokens,
  which reduces the need for escapes -- balanced "{...}"s in text are
  taken as part of the text.  This means that you almost never need to
  have an additional escape, but since there are still cases where you
  do need to have unbalanced occurrences, you still need an escape.
  This is a tricky bit, since the escape will itself need to be
  escaped (which often makes things go down the usual backslash-hell
  path).  (And BTW, I did experiment with some non-uniform escaping
  schemes, where "\" would escape only the quotation tokens, but these
  always have drawbacks, most notably in the form of text that cannot
  be quoted.)

* A good way to avoid the hassle of special tokens (there are three of
  them now: open/close quotations, and an escape), is to have
  alternative tokens.  Several languages these days allow both "..."
  and '...', so you can choose the form that is more convenient.
  Python goes further and adds """...""" and '''...'''.  But it is
  much better to make the syntax extensible, in the same way that
  here-strings in shell scripts are extensible (just choose some
  random token that doesn't appear in the text).

* You need to have some convenient way to do string interpolation.  As
  a lisper, you'd quickly recognize that the more general feature you
  want is some way to do unquoting.  This needs to be *very*
  convenient to use, since it is *crucial* for using the syntax as a
  markup language.  I've seen many systems that make the mistake of
  specifying only open/close delimiters, thinking that this is
  everything that you need to enter free-form text...  but then you
  end up with:

    (foo «Some text » (bold «bolded text») «.»)

  Since the connection to unquotes is obvious, some systems add
  another token for unquoting code.  This doesn't work very well --
  assuming `■' is that token, you might think that you could do:

    (foo «Some text ■(bold «bolded text») .»)

  but that raises a problem: since "«...»"s are a reader macro, there
  is no way to make it produce something that will be spliced into the
  surrounding form.  And even if there was, it doesn't always make
  sense to do this splicing, since it would break in:

    (defvar «Some text ■(bold «bolded text») .»)

  Yet another point that makes this bad is that reading code is not
  similar to reading text -- quote/backquote work well for things like
  macros, where most of the code is quoted, but when you do markup,
  then each markup essentially requires jumping up to the language
  level and back down to the quoted level -- these `,`,`,... patterns
  can be confusing in code, but they're even worse in text where you
  want to focus on what you write rather than on the markup syntax.
  Even in the above toy example, did you notice the extra space before
  the period?  (And if you did -- would you expect most people to
  catch it?)
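
To make the balanced-token and escape points above concrete, here is a
rough sketch in Python.  It is not the Scribble implementation: the
name `read_braced' and the choice of a uniform backslash escape are
assumptions made only for this illustration.  Balanced inner braces
are kept as text, an unbalanced brace needs the escape, and the escape
character then has to be escaped itself.

```python
def read_braced(src, start=0):
    """Read a "{...}" quotation that begins at src[start].

    Returns (text, index_after_closing_brace).  Balanced inner braces are
    kept as ordinary text.  A backslash makes the next character literal
    (a hypothetical, uniform escape chosen for this sketch).
    """
    if start >= len(src) or src[start] != "{":
        raise ValueError("expected '{'")
    depth = 1          # how many unclosed '{' we are inside of
    out = []           # characters of the quoted text
    i = start + 1
    while i < len(src):
        ch = src[i]
        if ch == "\\":
            # The escape takes the next character verbatim, whatever it is.
            if i + 1 >= len(src):
                raise ValueError("dangling escape")
            out.append(src[i + 1])
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out), i + 1
        out.append(ch)
        i += 1
    raise ValueError("unbalanced '{'")


if __name__ == "__main__":
    print(read_braced("{a {b} c} rest"))    # balanced braces need no escape
    print(read_braced(r"{open \{ only}"))   # an unbalanced brace does
    print(read_braced(r"{a \\ b}"))         # and so does the escape itself
```

Even in this toy version, the third case shows where the backslash
pile-up starts once the quoted text itself contains escapes.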

This is a good hook to the rationale behind the scribble design.
Instead of adding quotation characters into the reader, it adds a
character that marks the beginning of a text form, and then the
textual body follows:

    ■foo«Some text.»

This is read as (foo "Some text.") -- the interpretation of `foo' is
left for the language, so it can be a function, a macro, or whatever.
The important point for unquotes is that the reader treats "■" the
same whether it appears in code *or* in text.  For example:

    ■foo«Some text ■bold«bolded text».»

I'll stop here with the examples, since many more are described in
that paper.  (BTW, scribble uses @foo{...}, but can easily be
customized with any (unicode) characters you want.)
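
The reading rule just described can be sketched in a few lines of
Python.  This is only an illustration under simplifying assumptions,
not the actual reader: it uses "@" and "{...}" as in Scribble, handles
only the @name{body} shape, has no escapes, and represents each sexpr
as a Python list.  The function names `read_at_form' and `read_body'
are made up for the example.

```python
def read_at_form(src, i=0):
    """Read "@name{body}" starting at src[i]; return (form, next_index).

    The form is a list: the name followed by the body items, where each
    item is either a string of text or a nested form.
    """
    if i >= len(src) or src[i] != "@":
        raise ValueError("expected '@'")
    j = i + 1
    while j < len(src) and (src[j].isalnum() or src[j] in "-_"):
        j += 1
    name = src[i + 1:j]
    if not name or j >= len(src) or src[j] != "{":
        raise ValueError("expected a name followed by '{'")
    items, k = read_body(src, j + 1)
    return [name] + items, k


def read_body(src, i):
    """Read body text up to the matching '}'; return (items, next_index)."""
    items, buf, depth = [], [], 0
    while i < len(src):
        ch = src[i]
        if ch == "@":
            # "@" means the same thing in text as in code: flush the text
            # collected so far, then read a nested form in place.
            if buf:
                items.append("".join(buf))
                buf = []
            form, i = read_at_form(src, i)
            items.append(form)
            continue
        if ch == "{":
            depth += 1              # a balanced brace inside the text
        elif ch == "}":
            if depth == 0:          # the brace that closes this body
                if buf:
                    items.append("".join(buf))
                return items, i + 1
            depth -= 1
        buf.append(ch)
        i += 1
    raise ValueError("unterminated body")


if __name__ == "__main__":
    form, _ = read_at_form("@foo{Some text @bold{bolded text}.}")
    print(form)  # ['foo', 'Some text ', ['bold', 'bolded text'], '.']
```

The nested form lands in the right place without any splicing step,
because the body reader produces the argument list of the enclosing
form directly.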

> Furthermore, my solution would serve as a showcase for what makes
> Lisp different and unique.

It's worth repeating the above -- I completely agree with this.  This
whole thing works out extremely well with an extensible lisp reader.
There are some things that are particular to PLT Scheme, but if I
were to quantify the overall benefit as "N", then my guess is that any
lisp system would get more than 0.9*N.  One such example that shows
how well it fits in is the fact that PLT uses syntax objects that
contain source location in them -- but even if you implement a similar
extension in some lisp that uses some internal features to track
source information (like a weak hash table from the sexprs to the
location), you'd still get error messages with proper locations.
(IOW, this is not unique to the scribble reader -- it's a by-product
of building it as a readtable extension.)

> And finally, we're talking about writing documentation here.  If we
> can't document the process of editing keybindings [...]

(Another point for the above list: a good test for whatever syntax you
come up with is to see how well it can document itself.)

> If we're not going to take advantage of Lisp's unique features to
> solve our own problems how are we ever going to convince anyone else
> that Lisp can solve their problems?  What are we even doing here?

...and if you get here, I hope that it's clear that the scribble
reader takes exactly this approach.  Having it allowed the PLT project
to finally convert a huge amount of documentation from a pile of messy
latex (which wasn't even distributed, since it was too difficult for
random hackers to try to run) to a scheme-based documentation system
(which is also called scribble, and that's what's described in the
link that Brian first posted).

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!