[Openmcl-devel] documentation syntax

Eli Barzilay eli at barzilay.org
Tue Dec 29 02:05:09 PST 2009


Ron Garret <ron at flownet.com> writes:

> On Dec 28, 2009, at 10:25 PM, Eli Barzilay wrote:
>
>> Ron Garret <ron at flownet.com> writes:
>>> 
>>> No, my solution is not "just as complicated."  My solution has one
>>> very small startup cost which then pays dividends over a very long
>>> period of time.
>> 
>> I think that Brian's point is that doing a syntax for free-form
>> text is much more involved than just finding two unused characters.
>
> That's true, but it's important to keep the context in mind.  I
> proposed balanced quotes as a solution to a very narrow problem that
> Brian raised: he complained that he didn't like to edit sexprs
> because you can get lost in backslash escapes.  That is a legitimate
> concern, but it is easily remedied.  [...]

If all you care about is *only* backslashes in strings, then of course
the problem becomes trivial.  Regardless of whether Brian's complaint
was about backslashes alone, coming up with a good syntax for textual
content is far more difficult than choosing new string delimiters.


>> The way Scribble started is, very roughly speaking, similar to what
>> you suggest -- only I kept it all in ascii world, for obvious
>> reasons.  Using "{...}" seemed like a good choice given that most
>> editors treat them as balanced, and they're very rarely used in
>> Scheme code.
>
> Whether to stay in ascii is a matter of taste.  [...]

(This is irrelevant to the issue.)


>> Extending the reader with these is the first step, and it's
>> essentially the same as what you're suggesting.
>> 
>> But this is really the tip of an iceberg; there's a whole bunch of
>> features that you need to add to make it into something practical.
>> 
>> * You need to accommodate newlines in text.  The simple thing to do
>>   is to quote the whole text as is -- but then you end up with
>>   functions (or macros) that strip off the indentation, and doing
>>   that is not easy.
>
> It isn't?  Why not?  Seems like a pretty elementary exercise to me.

What should these be read as:

  «foo
  bar
  baz»

  «foo
     bar
   baz»

?  How do you deal with nested forms when you want to produce text
with meaningful indentation?
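
To make the ambiguity concrete, here is a minimal sketch in Python (not
how Scribble or any existing reader does it).  The two function names
are hypothetical, and each one is just one plausible stripping policy.
The strings `a' and `b' are the two quoted texts above, as a reader
would see them when the opening quote sits in column 0.

```python
import textwrap

def strip_all_lines(text):
    # Policy A (assumed): remove the whitespace common to *every* line,
    # including the first one, which starts right after the open quote.
    return textwrap.dedent(text)

def strip_continuation_lines(text):
    # Policy B (assumed): leave the first line alone and remove the
    # whitespace common to the remaining lines only.
    first, sep, rest = text.partition("\n")
    return first + sep + textwrap.dedent(rest)

a = "foo\n  bar\n  baz"
b = "foo\n     bar\n   baz"

for text in (a, b):
    print(repr(strip_all_lines(text)), repr(strip_continuation_lines(text)))
```

Policy A strips nothing, because the first line has no indentation to
share.  Policy B flattens the first text and keeps a relative indent in
the second.  Neither takes into account the column where the quotation
opened, or what should happen when the text is nested inside another
form.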


> The difficulty notwithstanding, why do you need to strip indentation
> at all?  If it's all going to be rendered as HTML, why bother taking
> out the indentation?  Won't the browser do that for you?

The general problem is how to allow free-form text in any context,
not just for HTML generation.  But even in HTML, whitespace is not
collapsed everywhere it appears (inside <pre>, for example).


>> * As with your suggestion, the scribble reader uses balanced
>>   tokens, which reduces the need for these -- balanced "{...}"s in
>>   text are taken as part of the text.  This means that you almost
>>   never need to have an additional escape, but since there are
>>   still cases where you do need to have unbalanced occurrences, you
>>   still need an escape.  This is a tricky bit, since the escape
>>   will itself need to be escaped (which often makes things go down
>>   the usual backslash-hell path).  (And BTW, I did experiment with
>>   some non-uniform escaping schemes, where "\" would escape only
>>   the quotation tokens, but these always have drawbacks, most
>>   notably in the form of text that cannot be quoted.)
>
> When using balanced quotes, an escape is needed only when you need
> to insert an unbalanced literal close-quote into the text.  I'm
> pretty sure that's going to be rare enough that balanced quotes
> would keep us out of backslash hell.

*sigh*.


>> * A good way to avoid the hassle of special tokens (there are three of
>>  them now: open/close quotations, and an escape), is to have
>>  alternative tokens.  Several languages these days allow both "..."
>>  and '...', so you can choose the form that is more convenient.
>>  Python goes further and adds """...""" and '''...'''.  But it is
>>  much better to make the syntax extensible, in the same way that
>>  here-strings in shell scripts are extensible (just choose some
>>  random token that doesn't appear in the text).
>
> Python's quotes work really well until you start to quote python
> code.  Then it gets *really* ugly.

Yes -- that's exactly why a "dynamically" extensible syntax is needed.
(Did you look at the paper?)
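
As a rough illustration of the here-string idea only (this is not the
Scribble mechanism), the Python sketch below picks a closing token that
does not occur in the text being quoted.  The function name, the "<<"
prefix, and the "END" base token are all made up for the example.

```python
import itertools

def quote_with_fresh_delimiter(text, base="END"):
    """Wrap text heredoc-style, using a token that does not occur in it."""
    for n in itertools.count():
        # Try END, END1, END2, ... until one is absent from the text.
        token = base if n == 0 else f"{base}{n}"
        if token not in text:
            return f"<<{token}\n{text}\n{token}"

print(quote_with_fresh_delimiter("print('''x''')"))
print(quote_with_fresh_delimiter("END of END1"))
```

Since the writer chooses the token per text, no text is unquotable, and
nothing inside the text needs an escape.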


>> * You need to have some convenient way to do string interpolation.  As
>>  a lisper, you'd quickly recognize that the more general feature you
>>  want is some way to do unquoting.  This needs to be *very*
>>  convenient to use, since it is *crucial* for using the syntax as a
>>  markup language.  I've seen many systems that make the mistake of
>>  specifying only open/close delimiters, thinking that this is
>>  everything that you need to enter free-form text...  but then you
>>  end up with:
>> 
>>    (foo «Some text » (bold «bolded text») «.»)
>> 
>>  Since the connection to unquotes is obvious, some systems add
>>  another token for unquoting code.  This doesn't work very well --
>>  assuming `■' you think that you could do:
>> 
>>    (foo «Some text ■(bold «bolded text») .»)
>> 
>>  but that raises a problem: since "«...»"s are a reader macro, there
>>  is no way to make it produce something that will be spliced into the
>>  surrounding form.  And even if there was, it doesn't always make
>>  sense to do this splicing, since it would break in:
>> 
>>    (defvar «Some text ■(bold «bolded text») .»)
>
> Huh?  I don't see the difference between this and the previous example.

If there were a way to make

  «Some text ■(bold «bolded text») .»

be read as

  "Some text " (bold "bolded text") " ."

in a way that gets spliced into the `foo' call form, then the `defvar'
form would be meaningless:

  (defvar foo "Some text " (bold "bolded text") " .")

(I forgot the `foo' in the above.)


>>  Yet another point that makes this bad is that reading code is not
>>  similar to reading text -- quote/backquote work well for things like
>>  macros, where most of the code is quoted, but when you do markup,
>>  then each markup essentially requires jumping up to the language
>>  level and back down to the quoted level -- these `,`,`,... patterns
>>  can be confusing in code, but they're even worse in text where you
>>  want to focus on what you write rather than on the markup syntax.
>>  Even in the above toy example, did you notice the extra space before
>>  the period?  (And if you did -- would you expect most people to
>>  catch it?)
>> 
>> This is a good hook to the rationale behind the scribble design.
>> Instead of adding quotation characters into the reader, it adds a
>> character that marks the beginning of a text form, and then the
>> textual body follows:
>> 
>>    ■foo«Some text.»
>> 
>> This is read as (foo "Some text.") -- the interpretation of `foo' is
>> left for the language, so it can be a function, a macro, or whatever.
>> The important point for unquotes is that the reader treats "■" the
>> same whether it appears in code *or* in text.  For example:
>> 
>>    ■foo«Some text ■bold«bolded text».»
>
> Note that this works only because FOO is assumed to take a single
> argument,

It's not.  The above expression is read as:

  (foo "Some text " (bold "bolded text") ".")
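
For readers who want to see the reading rule spelled out, here is a toy
reader in Python.  It is a sketch under simplifying assumptions, not the
Scribble reader: it only handles forms of the shape ■name«...», the
names `read_form' and `read_body' are made up, and sexprs are modelled
as nested Python lists.

```python
def read_form(src, i=0):
    """Read one ■name«...» form at src[i]; return (form, next_index)."""
    if src[i] != "■":
        raise ValueError("expected ■ at start of form")
    j = i + 1
    while j < len(src) and (src[j].isalnum() or src[j] in "-_"):
        j += 1
    head = src[i + 1:j]
    if not head or j >= len(src) or src[j] != "«":
        raise ValueError("expected a name followed by «")
    body, j = read_body(src, j + 1)
    return [head] + body, j

def read_body(src, i):
    """Read text up to the matching »; return (items, next_index)."""
    items, buf, depth = [], [], 0
    while i < len(src):
        ch = src[i]
        if ch == "■":
            # Same treatment in text as in code: a nested form starts here.
            if buf:
                items.append("".join(buf))
                buf = []
            form, i = read_form(src, i)
            items.append(form)
            continue
        if ch == "«":
            depth += 1          # balanced quotes stay part of the text
        elif ch == "»":
            if depth == 0:
                if buf:
                    items.append("".join(buf))
                return items, i + 1
            depth -= 1
        buf.append(ch)
        i += 1
    raise ValueError("unterminated text body")

form, _ = read_form("■foo«Some text ■bold«bolded text».»")
print(form)
```

The result is one `foo' form with three arguments: two strings and a
nested `bold' form.  What `foo' or `bold' mean is left to whatever
consumes the list.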


> [...]  But not all markup can be nested.  Bold tags, for example,
> cannot be meaningfully nested.

As far as the *concrete syntax* goes, this is meaningless.  The
concrete syntax needs to be a way to write textual content.  What
`bold' means is completely irrelevant -- in the same way that it is
irrelevant in the (bold 1 2 3) sexpr.


> This observation suggests another possible approach: use sexprs for
> semantically meaningful structure like chapters, sections, function
> definitions, and so on, and use Markdown to decorate strings.

This again limits the scope to a very narrow domain with a few
fixed operators that have special rules.  IMO, this (and the "rare
enough" above) is in direct contradiction to what makes sexprs so
great: they are completely uniform.  There are no exceptions of the
form "well, `+' is really parsed in this different way, since it's so
much more convenient".


> I think that would be a pretty reasonable approach that would not
> require reinventing too many wheels.

I didn't talk about any wheel that needs to be reinvented.

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!



