[Openmcl-devel] documentation syntax

Tue Dec 29 12:19:13 PST 2009

I think we may be talking past each other.  Let me see if I can get us onto the same page.  Let me know if you agree with the following, and if not, where you begin to disagree:

1.  There is a distinction between S-expressions and the textual representations of S-expressions.  The READ and PRINT functions convert back and forth between S-expressions and their textual representations.

2.  The READ and PRINT functions are modifiable, so it is possible for the same S-expression to have different textual representations (and vice versa, though that would probably be bad engineering practice; i.e., we should avoid syntactic conventions that allow the same text to map onto multiple S-expressions).

3.  There is a canonical textual representation of S-expressions defined by the Common Lisp Standard.

4.  Another possible textual representation for S-expressions is one that follows the syntactic conventions of HTML/XML/SGML, to wit:

<tag attr=val ...> content ... </tag>    ==> ((tag attr val ...) content ...)

with the optional simplification to (tag content ...) in the no-attributes case.

5.  A third possible textual representation of S-expressions is Scribble syntax.

6.  Many other major and minor variations on the theme are possible.  The use of balanced quotes is a minor variation that can be optionally introduced as an element of many different surface syntaxes for S-expressions.

7.  The task we have before us is to design a system that allows collaborative development of documentation.

8.  There is general agreement that a Wiki is a good structure to support such development.

9.  Wikis traditionally consist of two major components: a revision control system, and a renderer that translates content from a syntax designed to be easy to edit into one that can be displayed by a browser.  Typically the editing syntax looks something like Markdown, and the rendered syntax is HTML, though there is no inherent reason why this should be so.  A Wiki could just as well render to PDF, or Microsoft Word, or ASCII art.

10.  Traditionally, a given wiki is strongly bound to a particular editing syntax.  Unlike the previous point, there *is* an inherent reason why this is so: the translation from editing syntax to display syntax is done in one step by a rendering engine that is part of the wiki, so pages must be edited in whatever syntax that engine supports.

11.  Some wikis, Trac in particular, have rendering engines that can be extended by the user.
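
The mapping in point 4 can be sketched in a few lines.  This is only an illustration under stated assumptions: it is written in Python rather than Lisp, nested Python lists stand in for S-expressions, tags and attribute names are plain strings, and the function names element_to_sexpr and xml_to_sexpr are made up for this example.  It also requires well-formed XML with quoted attribute values, which is stricter than the attr=val shorthand used above.

```python
import xml.etree.ElementTree as ET


def element_to_sexpr(elem):
    """Map <tag attr=val ...> content ... </tag> to ((tag attr val ...) content ...).

    When the element has no attributes, the simplified form
    (tag content ...) is used instead.
    """
    # Head of the form: the tag followed by alternating attribute names/values.
    head = [elem.tag]
    for name, value in elem.attrib.items():
        head.extend([name, value])

    # Body of the form: text and child elements in document order.
    body = []
    if elem.text:
        body.append(elem.text)
    for child in elem:
        body.append(element_to_sexpr(child))
        if child.tail:  # text that follows the child's closing tag
            body.append(child.tail)

    first = head if elem.attrib else elem.tag
    return [first] + body


def xml_to_sexpr(text):
    """Parse a well-formed XML string and convert its root element."""
    return element_to_sexpr(ET.fromstring(text))


with_attrs = xml_to_sexpr('<p class="note">Hello <b>world</b>!</p>')
# with_attrs == [['p', 'class', 'note'], 'Hello ', ['b', 'world'], '!']

no_attrs = xml_to_sexpr('<b>world</b>')
# no_attrs == ['b', 'world']
```

Text is kept verbatim, whitespace included, so any policy about collapsing or stripping whitespace is left to a later processing step.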

Are you with me so far?  If so, here is the key point:

When I say, "Let's use S-expressions" I do not mean "Let's use the canonical textual representation of S-expressions (or some variant thereof) as the editing syntax of our wiki."  What I mean is: "Let's take advantage of the distinction between S-expressions and their textual representations."  If we do that, we can use a regular Wiki for rendering and revision control, but we don't need to agree on a surface syntax.  All we need to agree on is a structure for the underlying S-expressions (and we would need to agree on something analogous to that no matter what we do if we're going to produce something coherent).

Put another way: what I'm suggesting at root is a change to the fundamental structure of a traditional wiki: make rendering a two-stage process instead of the one-stage process that it currently is.  Stage one is to go from (possibly multiple) surface syntaxes to S-expressions, and stage two is to go from S-expressions to (again, possibly multiple) target syntaxes.  The primary advantages would be 1) we don't have to agree on a single editing syntax and 2) we can use the power of Lisp to process the intermediate S-expressions to, for example, do error checking, or expand macros.
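
A minimal sketch of that two-stage shape follows, again in Python purely for illustration, with nested lists standing in for S-expressions.  Everything named here is an assumption made for the example, not part of any existing wiki: the tag vocabulary (doc, para, bold), the two front ends (read_parens, read_xml), the check pass, and the two renderers.  The paren reader handles no escapes inside strings, and the XML front end ignores attributes.

```python
import re
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical agreed-upon structure for the intermediate trees.
KNOWN_TAGS = {"doc", "para", "bold"}
HTML_NAMES = {"doc": "div", "para": "p", "bold": "b"}

# Tokens: "(", ")", a double-quoted string without escapes, or a bare symbol.
TOKEN = re.compile(r'\s*(\(|\)|"[^"]*"|[^\s()"]+)')


def read_parens(text):
    """Stage one, front end A: (para "Hi " (bold "there")) -> nested lists."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = []
            while tokens[pos] != ")":
                node.append(read())
            pos += 1  # consume the closing paren
            return node
        # Strings lose their quotes; symbols are kept as plain strings.
        return tok[1:-1] if tok.startswith('"') else tok

    return read()


def read_xml(text):
    """Stage one, front end B: XML-style markup -> the same nested lists."""
    def convert(elem):
        node = [elem.tag]
        if elem.text:
            node.append(elem.text)
        for child in elem:
            node.append(convert(child))
            if child.tail:
                node.append(child.tail)
        return node

    return convert(ET.fromstring(text))


def check(tree):
    """Intermediate pass: reject tags outside the agreed vocabulary."""
    tag, *children = tree
    if tag not in KNOWN_TAGS:
        raise ValueError(f"unknown tag: {tag}")
    for child in children:
        if isinstance(child, list):
            check(child)
    return tree


def render_html(tree):
    """Stage two, back end A: nested lists -> HTML."""
    tag, *children = tree
    inner = "".join(
        render_html(c) if isinstance(c, list) else escape(c) for c in children
    )
    name = HTML_NAMES[tag]
    return f"<{name}>{inner}</{name}>"


def render_text(tree):
    """Stage two, back end B: nested lists -> plain text, bold as *...*."""
    tag, *children = tree
    inner = "".join(render_text(c) if isinstance(c, list) else c for c in children)
    return f"*{inner}*" if tag == "bold" else inner


# Two different surface syntaxes, one intermediate tree.
a = read_parens('(para "Some text " (bold "bolded text") ".")')
b = read_xml("<para>Some text <bold>bolded text</bold>.</para>")
assert a == b

html_out = render_html(check(a))  # <p>Some text <b>bolded text</b>.</p>
text_out = render_text(check(b))  # Some text *bolded text*.
```

The point of the sketch is only that the front ends and back ends never see each other: adding a third surface syntax means writing one more reader, and error checking or macro expansion happens once, on the intermediate tree.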

The issues of balanced quotes, escaping and unescaping, and whatnot, are all issues of surface syntax design.  They are important issues, but they are dominated by the fact that if we do what I'm suggesting we don't have to reach consensus on them in order to make progress.

I could go on and make my proposal more concrete, but I think I'll stop here and make sure we don't have any major disconnects at this level before I put a lot more time into this.

rg

On Dec 29, 2009, at 2:05 AM, Eli Barzilay wrote:

> Ron Garret <ron at flownet.com> writes:
> 
>> On Dec 28, 2009, at 10:25 PM, Eli Barzilay wrote:
>> 
>>> Ron Garret <ron at flownet.com> writes:
>>>> 
>>>> No, my solution is not "just as complicated."  My solution has one
>>>> very small startup cost which then pays dividends over a very long
>>>> period of time.
>>> 
>>> I think that Brian's point is that doing a syntax for free-form
>>> text is much more involved than just finding two unused characters.
>> 
>> That's true, but it's important to keep the context in mind.  I
>> proposed balanced quotes as a solution to a very narrow problem that
>> Brian raised: he complained that he didn't like to edit sexprs
>> because you can get lost in backslash escapes.  That is a legitimate
>> concern, but it is easily remedied.  [...]
> 
> If all you care about is *only* backslashes in strings, then of course
> the problem becomes trivial.  Regardless of whether Brian's complaint
> was derived by backslashes alone, coming up with a good syntax for
> textual content is far more difficult than new string delimiters.
> 
> 
>>> The way Scribble started is, very roughly speaking, similar to what
>>> you suggest -- only I kept it all in ascii world, for obvious
>>> reasons.  Using "{...}" seemed like a good choice given that most
>>> editors treat them as balanced, and they're very rarely used in
>>> Scheme code.
>> 
>> Whether to stay in ascii is a matter of taste.  [...]
> 
> (This is irrelevant to the issue.)
> 
> 
>>> Extending the reader with these is the first step, and it's
>>> essentially the same as what you're suggesting.
>>> 
>>> But this is really the tip of an iceberg; there's a whole bunch of
>>> features that you need to add to make it into something practical.
>>> 
>>> * You need to accommodate newlines in text.  The simple thing to do
>>>  is to quote the whole text as is -- but then you end up with
>>>  functions (or macros) that strip off the indentation, and doing
>>>  that is not easy.
>> 
>> It isn't?  Why not?  Seems like a pretty elementary exercise to me.
> 
> What should these be read as:
> 
>  «foo
>  bar
>  baz»
> 
>  «foo
>     bar
>   baz»
> 
> ?  How do you deal with nested forms when you want to produce text
> with meaningful indentation?
> 
> 
>> The difficulty notwithstanding, why do you need to strip indentation
>> at all?  If it's all going to be rendered as HTML, why bother taking
>> out the indentation?  Won't the browser do that for you?
> 
> The general problem is how to allow free form text for any context,
> not just for HTML generation.  But even with HTML, whitespaces are not
> collapsed everywhere they appear.
> 
> 
>>> * As with your suggestion, the scribble reader uses balanced
>>>  tokens, which reduces the need for these -- balanced "{...}"s in
>>>  text are taken as part of the text.  This means that you almost
>>>  never need to have an additional escape, but since there are
>>>  still cases where you do need to have unbalanced occurrences, you
>>>  still need an escape.  This is a tricky bit, since the escape
>>>  will itself need to be escaped (which often makes things go down
>>>  the usual backslash-hell path).  (And BTW, I did experiment with
>>>  some non-uniform escaping schemes, where "\" would escape only
>>>  the quotation tokens, but these always have drawbacks, most
>>>  notably in the form of text that cannot be quoted.)
>> 
>> When using balanced quotes, an escape is needed only when you need
>> to insert an unbalanced literal close-quote into the text.  I'm
>> pretty sure that's going to be rare enough that balanced quotes
>> would keep us out of backslash hell.
> 
> *sigh*.
> 
> 
>>> * A good way to avoid the hassle of special tokens (there are three of
>>> them now: open/close quotations, and an escape), is to have
>>> alternative tokens.  Several languages these days allow both "..."
>>> and '...', so you can choose the form that is more convenient.
>>> Python goes further and adds """...""" and '''...'''.  But it is
>>> much better to make the syntax extensible, in the same way that
>>> here-strings in shell scripts are extensible (just choose some
>>> random token that doesn't appear in the text).
>> 
>> Python's quotes work really well until you start to quote python
>> code.  Then it gets *really* ugly.
> 
> Yes -- that's exactly why a "dynamically" extensible syntax is needed.
> (Did you look at the paper?)
> 
> 
>>> * You need to have some convenient way to do string interpolation.  As
>>> a lisper, you'd quickly recognize that the more general feature you
>>> want is some way to do unquoting.  This needs to be *very*
>>> convenient to use, since it is *crucial* for using the syntax as a
>>> markup language.  I've seen many systems that make the mistake of
>>> specifying only open/close delimiters, thinking that this is
>>> everything that you need to enter free-form text...  but then you
>>> end up with:
>>> 
>>>   (foo «Some text » (bold «bolded text») «.»)
>>> 
>>> Since the connection to unquotes is obvious, some systems add
>>> another token for unquoting code.  This doesn't work very well --
>>> assuming `■' you think that you could do:
>>> 
>>>   (foo «Some text ■(bold «bolded text») .»)
>>> 
>>> but that raises a problem: since "«...»"s are a reader macro, there
>>> is no way to make it produce something that will be spliced into the
>>> surrounding form.  And even if there was, it doesn't always make
>>> sense to do this splicing, since it would break in:
>>> 
>>>   (defvar «Some text ■(bold «bolded text») .»)
>> 
>> Huh?  I don't see the difference between this and the previous example.
> 
> If there was a way to make
> 
>  «Some text ■(bold «bolded text») .»
> 
> be read as
> 
>  "Some text " (bold "bolded text") " ."
> 
> in a way that gets spliced into the `foo' call form, then the `defvar'
> form would be meaningless:
> 
>  (defvar foo "Some text " (bold "bolded text") " .")
> 
> (I forgot the `foo' in the above.)
> 
> 
>>> Yet another point that makes this bad is that reading code is not
>>> similar to reading text -- quote/backquote work well for things like
>>> macros, where most of the code is quoted, but when you do markup,
>>> then each markup essentially requires jumping up to the language
>>> level and back down to the quoted level -- these `,`,`,... patterns
>>> can be confusing in code, but they're even worse in text where you
>>> want to focus on what you write rather than on the markup syntax.
>>> Even in the above toy example, did you notice the extra space before
>>> the period?  (And if you did -- would you expect most people to
>>> catch it?)
>>> 
>>> This is a good hook to the rationale behind the scribble design.
>>> Instead of adding quotation characters into the reader, it adds a
>>> character that marks the beginning of a text form, and then the
>>> textual body follows:
>>> 
>>>   ■foo«Some text.»
>>> 
>>> This is read as (foo "Some text.") -- the interpretation of `foo' is
>>> left for the language, so it can be a function, a macro, or whatever.
>>> The important point for unquotes is that the reader treats "■" the
>>> same whether it appears in code *or* in text.  For example:
>>> 
>>>   ■foo«Some text ■bold«bolded text».»
>> 
>> Note that this works only because FOO is assumed to take a single
>> argument,
> 
> It's not.  The above expression is read as:
> 
>  (foo "Some text " (bold "bolded text") ".")
> 
> 
>> [...]  But not all markup can be nested.  Bold tags, for example,
>> cannot be meaningfully nested.
> 
> As far as the *concrete syntax* goes, this is meaningless.  The
> concrete syntax needs to be a way to write textual content.  What
> `bold' means is completely irrelevant -- in the same way that it is
> irrelevant in the (bold 1 2 3) sexpr.
> 
> 
>> This observation suggests another possible approach: use sexprs for
>> semantically meaningful structure like chapters, sections, function
>> definitions, and so on, and use Markdown to decorate strings.
> 
> This is again limiting the scope to a very narrow domain with a few
> fixed operators that have special rules.  IMO, this (and the "rare
> enough" above) is in direct contradiction to what makes sexprs so
> great: they are completely uniform.  There's no discounts in the form
> of "well, `+' is really parsed in this different way, since it's so
> much more convenient".
> 
> 
>> I think that would be a pretty reasonable approach that would not
>> require reinventing too many wheels.
> 
> I didn't talk about any wheel that needs to be reinvented.
> 
> -- 
>          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
>                    http://barzilay.org/                   Maze is Life!
> 
> _______________________________________________
> Openmcl-devel mailing list
> Openmcl-devel at clozure.com
> http://clozure.com/mailman/listinfo/openmcl-devel