[Openmcl-devel] documentation syntax (was: ccl manual)

Tue Dec 29 01:18:39 PST 2009

On Dec 28, 2009, at 10:25 PM, Eli Barzilay wrote:

> Scribble

One of the nice things about agreeing to use Sexprs is that we *don't* have to agree on a surface syntax.  As long as we agree on the structure of the sexprs, you can edit them using your favorite surface syntax and I can use mine.  They don't have to be the same.

> Ron Garret <ron at flownet.com> writes:
>> 
>> No, my solution is not "just as complicated."  My solution has one
>> very small startup cost which then pays dividends over a very long
>> period of time.
> 
> I think that Brian's point is that doing a syntax for free-form text
> is much more involved than just finding two unused characters.

That's true, but it's important to keep the context in mind.  I proposed balanced quotes as a solution to a very narrow problem that Brian raised: he complained that he didn't like to edit sexprs because you can get lost in backslash escapes.  That is a legitimate concern, but it is easily remedied.  And the beauty of having underlying sexprs is, again, that we don't need to agree on surface syntax.

> The way Scribble started is, very roughly speaking, similar to what you
> suggest -- only I kept it all in ascii world, for obvious reasons.
> Using "{...}" seemed like a good choice given that most editors treat
> them as balanced, and they're very rarely used in Scheme code.

Whether to stay in ASCII is a matter of taste.  I personally like using Unicode because there's just too much contention for punctuation in ASCII land, and IMHO unbalanced quotes are the root of much obfuscation.  YMMV as they say.  My purpose in suggesting balanced quotes was not so much to get everyone to use them (though I do think the world would be a better place if everyone did) but merely to point out that if backslash escapes bother you, there are alternatives available.

> Extending the reader with these is the first step, and it's
> essentially the same as what you're suggesting.
> 
> But this is really the tip of an iceberg; there's a whole bunch of
> features that you need to add to make it into something practical.
> 
> * You need to accommodate newlines in text.  The simple thing to do is
>  to quote the whole text as is -- but then you end up with functions
>  (or macros) that strip off the indentation, and doing that is not
>  easy.

It isn't?  Why not?  Seems like a pretty elementary exercise to me.
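
To make "elementary" concrete, here is a minimal sketch in Python (chosen only to stay neutral about Lisp reader details).  The function name is hypothetical, and it assumes indentation is made of spaces only, with no tabs:

```python
def strip_common_indent(text: str) -> str:
    """Remove the leading spaces shared by every non-blank line.

    Assumption: indentation uses spaces only; tabs are not handled.
    """
    lines = text.split("\n")
    # Measure indentation on non-blank lines only, so that empty
    # lines do not force the common margin down to zero.
    indents = [len(line) - len(line.lstrip(" ")) for line in lines if line.strip()]
    margin = min(indents) if indents else 0
    # Whitespace-only lines are normalized to empty lines.
    return "\n".join(line[margin:] if line.strip() else "" for line in lines)
```

Python's standard library has textwrap.dedent for roughly the same job, so this is not exotic territory.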

The difficulty notwithstanding, why do you need to strip indentation at all?  If it's all going to be rendered as HTML, why bother taking out the indentation?  Won't the browser do that for you?

>  Several Schemes have here-string syntax, but IMO it's quite
>  unpopular because preserving the spaces verbatim means that you
>  break the flow of code, making the result incomprehensible.  You
>  also need some syntax to have a convenient way to break a source
>  line without breaking the text.  Same as `%' in tex sources.

Again, if the ultimate target rendering is HTML, why does any of this matter?

> * As with your suggestion, the scribble reader uses balanced tokens,
>  which reduces the need for these -- balanced "{...}"s in text are
>  taken as part of the text.  This means that you almost never need to
>  have an additional escape, but since there are still cases where you
>  do need to have unbalanced occurrences, you still need an escape.
>  This is a tricky bit, since the escape will itself need to be
>  escaped (which often makes things go down the usual backslash-hell
>  path).  (And BTW, I did experiment with some non-uniform escaping
>  schemes, where "\" would escape only the quotation tokens, but these
>  always have drawbacks, most notably in the form of text that cannot
>  be quoted.)

When using balanced quotes, an escape is needed only when you need to insert an unbalanced literal close-quote into the text.  I'm pretty sure that's going to be rare enough that balanced quotes would keep us out of backslash hell.
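
As a rough illustration of how little escaping this takes, here is a sketch of a balanced-quote reader in Python.  The names are hypothetical, and the escape rule (a backslash makes the next character literal, whatever it is) is my assumption for the sketch, not a worked-out proposal:

```python
from typing import Tuple

OPEN, CLOSE, ESC = "«", "»", "\\"

def read_balanced(src: str, start: int = 0) -> Tuple[str, int]:
    """Read one «...» literal beginning at src[start].

    Returns (contents, index just past the closing quote).  Nested
    balanced quotes are kept as part of the text.  ESC makes the next
    character literal (an assumption of this sketch), which is how an
    unbalanced quote gets into the text.
    """
    if src[start:start + 1] != OPEN:
        raise ValueError("expected an opening quote")
    depth, out, i = 1, [], start + 1
    while i < len(src):
        ch = src[i]
        if ch == ESC and i + 1 < len(src):
            out.append(src[i + 1])  # take the escaped character literally
            i += 2
            continue
        if ch == OPEN:
            depth += 1
        elif ch == CLOSE:
            depth -= 1
            if depth == 0:
                return "".join(out), i + 1
        out.append(ch)
        i += 1
    raise ValueError("unterminated quote")
```

With this, read_balanced("«a «b» c» rest") needs no escapes at all; only something like «5 \» 3» does.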

> * A good way to avoid the hassle of special tokens (there are three of
>  them now: open/close quotations, and an escape), is to have
>  alternative tokens.  Several languages these days allow both "..."
>  and '...', so you can choose the form that is more convenient.
>  Python goes further and adds """...""" and '''...'''.  But it is
>  much better to make the syntax extensible, in the same way that
>  here-strings in shell scripts are extensible (just choose some
>  random token that doesn't appear in the text).

Python's quotes work really well until you start to quote Python code.  Then it gets *really* ugly.

Unicode provides a large repertoire of balanced quotes.  Too large, in fact, because a lot of the glyphs are easily confused.  I think the best choices are «...» and “...” because they are visually distinctive.

> * You need to have some convenient way to do string interpolation.  As
>  a lisper, you'd quickly recognize that the more general feature you
>  want is some way to do unquoting.  This needs to be *very*
>  convenient to use, since it is *crucial* for using the syntax as a
>  markup language.  I've seen many systems that make the mistake of
>  specifying only open/close delimiters, thinking that this is
>  everything that you need to enter free-form text...  but then you
>  end up with:
> 
>    (foo «Some text » (bold «bolded text») «.»)
> 
>  Since the connection to unquotes is obvious, some systems add
>  another token for unquoting code.  This doesn't work very well --
>  assuming `■' you think that you could do:
> 
>    (foo «Some text ■(bold «bolded text») .»)
> 
>  but that raises a problem: since "«...»"s are a reader macro, there
>  is no way to make it produce something that will be spliced into the
>  surrounding form.  And even if there was, it doesn't always make
>  sense to do this splicing, since it would break in:
> 
>    (defvar «Some text ■(bold «bolded text») .»)

Huh?  I don't see the difference between this and the previous example.

>  Yet another point that makes this bad is that reading code is not
>  similar to reading text -- quote/backquote work well for things like
>  macros, where most of the code is quoted, but when you do markup,
>  then each markup essentially requires jumping up to the language
>  level and back down to the quoted level -- these `,`,`,... patterns
>  can be confusing in code, but they're even worse in text where you
>  want to focus on what you write rather than on the markup syntax.
>  Even in the above toy example, did you notice the extra space before
>  the period?  (And if you did -- would you expect most people to
>  catch it?)
> 
> This is a good hook to the rationale behind the scribble design.
> Instead of adding quotation characters into the reader, it adds a
> character that marks the beginning of a text form, and then the
> textual body follows:
> 
>    ■foo«Some text.»
> 
> This is read as (foo "Some text.") -- the interpretation of `foo' is
> left for the language, so it can be a function, a macro, or whatever.
> The important point for unquotes is that the reader treats "■" the
> same whether it appears in code *or* in text.  For example:
> 
>    ■foo«Some text ■bold«bolded text».»

Note that this works only because FOO is assumed to take a single argument, so its lexical scope (lexical in the parser's sense, not the usual one) can be determined by the lexical scope of the next form.  If FOO took an arbitrary number of arguments, this approach would break down.  And it's only necessary because (presumably) FOO can be nested.  But not all markup can be nested.  Bold tags, for example, cannot be meaningfully nested.

This observation suggests another possible approach: use sexprs for semantically meaningful structure like chapters, sections, function definitions, and so on, and use Markdown to decorate strings.  The above example would then become:

(foo «Some text *some bolded text*»)

I think that would be a pretty reasonable approach that would not require reinventing too many wheels.
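
A minimal sketch of the decoration step, again in Python.  The function name and the ("bold", text) node shape are made up for illustration, and it handles only the single-asterisk form used in the example above, not Markdown in general:

```python
import re

def decorate(text):
    """Split text into plain strings and ("bold", s) nodes.

    Only *...* spans are recognized; this is a deliberately tiny
    subset, not a Markdown implementation.
    """
    # With one capturing group, re.split alternates between plain
    # text (even positions) and captured span contents (odd positions).
    parts = re.split(r"\*([^*]+)\*", text)
    nodes = []
    for i, part in enumerate(parts):
        if not part:
            continue  # skip empty strings at the edges
        nodes.append(("bold", part) if i % 2 else part)
    return nodes
```

Applied to the string in the example, this yields the plain text "Some text " followed by a bold node, which a renderer could then map to whatever output format it targets.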

rg