[Openmcl-devel] documentation syntax (was: ccl manual)
Eli Barzilay
eli at barzilay.org
Mon Dec 28 22:25:35 PST 2009
(Disclaimer: I'm the developer of the Scribble syntax reader.)
Executive summary:
* The scribble syntax is very much doing what you're talking about.
It is a readtable-based extension of the (PLT) sexpr reader, and
basically provides an alternative syntax for sexprs.
* The important point is that it is -- still -- just sexprs. This
makes it a perfect example of a "showcase for what makes Lisp
different and unique".
* Yes, it's PLT in this case, but it is implemented as a readtable
extension -- the exact same benefits that make the scribble syntax
work out nicely would apply to an implementation in *any* Lisp. (In
fact, I believe that it would be a viable solution even for
non-sexpr languages, but non-sexpr-people usually fail to see the
benefits of a uniform syntax and a uniform syntax representation.)
* Since the syntax itself is just an alternative reader for sexprs, it
is completely decoupled from the actual documentation system. You
can use it with any functions you want -- they can emit latex code,
docbook code, plain text, or they can even be unrelated to
documentation; the syntax itself just makes it easy to write lots of
free-form prose in code. (Also important: free-form prose that is
rich in code.) This makes it a fitting extension even when you
already have your own system for producing documentation.
* A more detailed description of the syntax, the design decisions that
led to it, and its benefits is in the Scheme Workshop paper on it:
http://barzilay.org/misc/scribble-reader.pdf
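To make the decoupling point concrete, here is a minimal Python sketch (not Scribble code). It assumes a reader has already turned the text into a nested list, with strings for prose and lists for markup forms, and shows two unrelated back ends consuming the same tree. The tag names "para" and "bold" and both renderer functions are hypothetical:

```python
# Sketch: one parsed document, two unrelated back ends.
# DOC stands in for what a reader might return for "Some text
# @bold{bolded text}." wrapped in a hypothetical "para" form:
# plain strings are prose, lists are [tag, *body] markup forms.
DOC = ["para", "Some text ", ["bold", "bolded text"], "."]


def to_latex(form):
    """Render a nested-list form as LaTeX (special chars are NOT escaped)."""
    if isinstance(form, str):
        return form
    tag, *body = form
    inner = "".join(to_latex(part) for part in body)
    if tag == "bold":
        return "\\textbf{" + inner + "}"
    if tag == "para":
        return inner + "\n\n"
    raise ValueError(f"unknown tag: {tag!r}")


def to_text(form):
    """Render the same form as plain text by dropping every markup tag."""
    if isinstance(form, str):
        return form
    _tag, *body = form
    return "".join(to_text(part) for part in body)
```

A DocBook or HTML back end would just be one more function over the same tree; nothing about the syntax or the reader has to change.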
And a few replies:
Ron Garret <ron at flownet.com> writes:
>
> No, my solution is not "just as complicated." My solution has one
> very small startup cost which then pays dividends over a very long
> period of time.
I think that Brian's point is that doing a syntax for free-form text
is much more involved than just finding two unused characters. The
way Scribble started is, very roughly speaking, similar to what you
suggest -- except that I kept it all in ASCII, for obvious reasons.
Using "{...}" seemed like a good choice given that most editors treat
them as balanced, and they're very rarely used in Scheme code.
Extending the reader with these is the first step, and it's
essentially the same as what you're suggesting.
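That first step, reading {...} as a string literal that may contain balanced braces, fits in a few lines. Here is a minimal sketch in Python rather than as a Lisp readtable entry. The function name read_brace_string is hypothetical:

```python
def read_brace_string(src: str, pos: int = 0) -> tuple[str, int]:
    """Read a {...} literal that starts at src[pos].

    Inner balanced braces are kept verbatim as part of the text. Returns
    the text between the outer braces and the index just past the
    closing brace.
    """
    if pos >= len(src) or src[pos] != "{":
        raise SyntaxError(f"expected '{{' at index {pos}")
    depth = 0
    for i in range(pos, len(src)):
        if src[i] == "{":
            depth += 1
        elif src[i] == "}":
            depth -= 1
            if depth == 0:  # this brace closes the outer literal
                return src[pos + 1 : i], i + 1
    raise SyntaxError("unbalanced '{' in text literal")


text, end = read_brace_string("{a set {like this} stays intact} (more code)")
```

There is no escape here, so a lone unbalanced brace cannot be written at all. That gap is exactly what the escaping point in the list below has to deal with.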
But this is really the tip of the iceberg; there's a whole bunch of
features that you need to add to make it into something practical.
* You need to accommodate newlines in text. The simple thing to do is
to quote the whole text as is -- but then you end up with functions
(or macros) that strip off the indentation, and doing that is not
easy. Several Schemes have here-string syntax, but IMO it's quite
unpopular because preserving the spaces verbatim means that you
break the flow of code, making the result incomprehensible. You
also need a convenient syntax for breaking a source line without
breaking the text, like a trailing `%' in TeX sources.
* As with your suggestion, the scribble reader uses balanced tokens,
which reduces the need for escapes -- balanced "{...}"s in text are
taken as part of the text. This means that you almost never need to
have an additional escape, but since there are still cases where you
do need to have unbalanced occurrences, you still need an escape.
This is a tricky bit, since the escape will itself need to be
escaped (which often makes things go down the usual backslash-hell
path). (And BTW, I did experiment with some non-uniform escaping
schemes, where "\" would escape only the quotation tokens, but these
always have drawbacks, most notably in the form of text that cannot
be quoted.)
* A good way to avoid the hassle of special tokens (there are three of
them now: open/close quotations, and an escape), is to have
alternative tokens. Several languages these days allow both "..."
and '...', so you can choose the form that is more convenient.
Python goes further and adds """...""" and '''...'''. But it is
much better to make the syntax extensible, in the same way that
here-strings in shell scripts are extensible (just choose some
random token that doesn't appear in the text).
* You need to have some convenient way to do string interpolation. As
a lisper, you'd quickly recognize that the more general feature you
want is some way to do unquoting. This needs to be *very*
convenient to use, since it is *crucial* for using the syntax as a
markup language. I've seen many systems that make the mistake of
specifying only open/close delimiters, thinking that this is
everything that you need to enter free-form text... but then you
end up with:
(foo «Some text » (bold «bolded text») «.»)
Since the connection to unquotes is obvious, some systems add
another token for unquoting code. This doesn't work very well.
Assuming `■' is that token, you might think that you could write:
(foo «Some text ■(bold «bolded text») .»)
but that raises a problem: since "«...»" is a reader macro, there
is no way to make it produce something that will be spliced into the
surrounding form. And even if there were, it doesn't always make
sense to do this splicing, since it would break in:
(defvar «Some text ■(bold «bolded text») .»)
Yet another point that makes this bad is that reading code is not
similar to reading text -- quote/backquote work well for things like
macros, where most of the code is quoted, but when you do markup,
then each markup essentially requires jumping up to the language
level and back down to the quoted level -- these `,`,`,... patterns
can be confusing in code, but they're even worse in text where you
want to focus on what you write rather than on the markup syntax.
Even in the above toy example, did you notice the extra space before
the period? (And if you did -- would you expect most people to
catch it?)
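The first point in the list above, turning a verbatim, code-indented block into the intended text, might look like the following Python sketch. It assumes a hypothetical continuation marker (a trailing "%", borrowed from TeX) and relies on textwrap.dedent for the shared indentation. It ignores harder cases such as mixed tabs and spaces or interpolated code, which are part of why stripping indentation is "not easy":

```python
import textwrap

CONTINUATION = "%"  # hypothetical marker, borrowed from TeX's trailing %


def clean_text_block(raw: str) -> str:
    """Turn a verbatim, code-indented text block into the intended text.

    1. Drop the newline that directly follows the opening delimiter.
    2. Remove the indentation shared by all lines (textwrap.dedent).
    3. Join a line ending in CONTINUATION with the next line, dropping
       the marker and the next line's leading whitespace.
    """
    if raw.startswith("\n"):
        raw = raw[1:]
    out: list[str] = []
    for line in textwrap.dedent(raw).split("\n"):
        if out and out[-1].endswith(CONTINUATION):
            out[-1] = out[-1][: -len(CONTINUATION)] + line.lstrip()
        else:
            out.append(line)
    return "\n".join(out)


# What a reader might hand over for a text literal written inside
# indented code:
RAW = """
    This is a long sentence that %
    continues here.
      An indented line.
"""
```

A line that really does end in "%" can no longer be written literally, which brings back the escaping problem from the second point.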
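The third point in the list above, extensible delimiters in the style of shell here-strings, amounts to choosing a terminator that does not occur in the text, so the body never needs escaping. Here is a minimal Python sketch. The "<<TOKEN" layout and the function names are hypothetical and are not Scribble's actual syntax:

```python
def choose_terminator(text: str, base: str = "EOT") -> str:
    """Pick a terminator token that does not occur anywhere in text."""
    candidate, n = base, 0
    while candidate in text:  # terminates: long enough tokens can't occur
        n += 1
        candidate = f"{base}{n}"
    return candidate


def quote_here(text: str) -> str:
    """Wrap text as '<<TOKEN', body, TOKEN; the body is never escaped."""
    tok = choose_terminator(text)
    return f"<<{tok}\n{text}\n{tok}"


def read_here(quoted: str) -> str:
    """Inverse of quote_here: drop the header line and the terminator."""
    header, _, rest = quoted.partition("\n")
    if not header.startswith("<<") or len(header) == 2:
        raise SyntaxError("missing <<TOKEN header")
    suffix = "\n" + header[2:]
    if not rest.endswith(suffix):
        raise SyntaxError(f"missing terminator {header[2:]!r}")
    return rest[: -len(suffix)]
```

Unbalanced braces and even the default token can appear in the body unchanged. Only the choice of terminator adapts to the text.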
The unquoting problem is a good lead-in to the rationale behind the scribble design.
Instead of adding quotation characters into the reader, it adds a
character that marks the beginning of a text form, and then the
textual body follows:
■foo«Some text.»
This is read as (foo "Some text.") -- the interpretation of `foo' is
left for the language, so it can be a function, a macro, or whatever.
The important point for unquotes is that the reader treats "■" the
same whether it appears in code *or* in text. For example:
■foo«Some text ■bold«bolded text».»
I'll stop here with the examples, since many more are described in
that paper. (BTW, scribble uses @foo{...}, but can easily be
customized with any (unicode) characters you want.)
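The core of this design can be sketched as a tiny recursive-descent reader in Python, using the @foo{...} characters mentioned above. It is only an illustration, not the scribble reader: it is a standalone parser rather than a readtable extension, it returns nested Python lists instead of sexprs, and it omits escapes, indentation handling and everything else from the earlier list. The function names are hypothetical. The key property is that the same read_at_form function handles "@" at the top level and inside text:

```python
NAME_CHARS = "-_"  # allowed besides alphanumerics; arbitrary for the sketch


def read_at_form(src: str, pos: int = 0):
    """Read '@name{body}' starting at src[pos].

    Returns (form, next_pos), where form is [name, part, ...] and each
    part is either a string of text or a nested form.
    """
    if pos >= len(src) or src[pos] != "@":
        raise SyntaxError(f"expected '@' at index {pos}")
    end = pos + 1
    while end < len(src) and (src[end].isalnum() or src[end] in NAME_CHARS):
        end += 1
    name = src[pos + 1 : end]
    if not name or end >= len(src) or src[end] != "{":
        raise SyntaxError(f"expected name{{...}} after '@' at index {pos}")
    parts, next_pos = read_body(src, end + 1)
    return [name, *parts], next_pos


def read_body(src: str, i: int):
    """Read text up to the '}' that closes the current body.

    Balanced {...} pairs are kept as literal text, and '@' starts a nested
    form that is read by the very same rule used at the top level.
    """
    parts, buf, depth = [], [], 0

    def flush():  # move any pending text into parts as one string
        if buf:
            parts.append("".join(buf))
            buf.clear()

    while i < len(src):
        ch = src[i]
        if ch == "@":
            flush()
            form, i = read_at_form(src, i)
            parts.append(form)
            continue
        if ch == "}" and depth == 0:  # closes this body
            flush()
            return parts, i + 1
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        buf.append(ch)
        i += 1
    raise SyntaxError("unterminated '{' in text body")
```

With this sketch, "@foo{Some text @bold{bolded text}.}" reads as ["foo", "Some text ", ["bold", "bolded text"], "."], its analogue of a sexpr with a nested form, and the period sits right after the nested form with no stray space.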
> Furthermore, my solution would serve as a showcase for what makes
> Lisp different and unique.
It's worth repeating the above -- I completely agree with this. This
whole thing works out extremely well with an extensible lisp reader.
There are some things that are particular to PLT Scheme, but if I
could quantify the overall benefit as "N", then my guess is that any
Lisp system would get more than 0.9*N. One example that shows
how well it fits in is the fact that PLT uses syntax objects that
contain source location in them -- but even if you implement a similar
extension in some lisp that uses some internal features to track
source information (like a weak hash table from the sexprs to the
location), you'd still get error messages with proper locations.
(IOW, this is not unique to the scribble reader -- it's a by-product
of building it as a readtable extension.)
> And finally, we're talking about writing documentation here. If we
> can't document the process of editing keybindings [...]
(Another point for the above list: a good test for whatever syntax you
come up with is to see how well it can document itself.)
> If we're not going to take advantage of Lisp's unique features to
> solve our own problems how are we ever going to convince anyone else
> that Lisp can solve their problems? What are we even doing here?
...and if you've read this far, I hope that it's clear that the
scribble reader takes exactly this approach. Having it allowed the PLT
project to finally convert a huge amount of documentation from a pile
of messy LaTeX (which wasn't even distributed, since it was too
difficult for random hackers to run) to a Scheme-based documentation
system (which is also called Scribble, and is what's described in the
link that Brian first posted).
--
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
http://barzilay.org/ Maze is Life!