David A. Wheeler's Blog

Thu, 11 Oct 2007

Readable s-expressions (sweet-expressions) for Lisp-like languages

Back in 2006 I posted my basic ideas about “sweet-expressions”. Here’s a basic recap, before I discuss what’s new. Lisp-based programming languages normally represent programs as s-expressions, where an operation and its parameters are surrounded by parentheses. The operation to be performed is identified first, and each parameter afterwards is separated by whitespace. So the traditional “2+3” is written as “(+ 2 3)” instead. This is regular, but most people find this hard to read. Here’s a longer example of an s-expression - notice the many parentheses and the lack of infix operations:

 (defun factorial (n)
   (if (<= n 1)
       1
       (* n (factorial (- n 1)))))

Lisp-based systems are very good at symbol manipulation tasks, including program analysis. But many software developers avoid Lisp-based languages, even in cases where they would be a good tool to use, because most software developers find s-expressions really hard to read. I think I’ve found a better solution, which I call “sweet-expressions”. Here’s that same program be written using sweet-expressions:

 defun factorial (n)         ; Parameters can be indented, but need not be
   if (n <= 1)               ; Supports infix, prefix, & function <=(n 1)
       1                     ; This has no parameters, so it's an atom.
       n * factorial(n - 1)  ; Function(...) notation supported

Sweet-expressions add the following abilities:

Indentation. Indentation may be used instead of parentheses to start and end expressions: any indented line is a parameter of its parent, later terms on a line are parameters of the first term, lists of lists are marked with GROUP, and a function call with 0 parameters is surrounded or followed by a pair of parentheses [e.g., (pi) and pi()]. A “(” disables indentation until its matching “)”. Blank lines at the beginning of a new expression are ignored. A term that begins at the left edge and is immediately followed by newline is immediately executed, to make interactive use pleasant.
Name-ending. Terms of the form ‘NAME(x y…)’, with no whitespace before ‘(’, are interpreted as ‘(NAME x y…)’;. Parameters are space-separated inside. If its content is an infix expression, it’s considered one parameter instead (so f(2 + 3) computes 2 + 3 and passes its result, 5, to f).
Infix. Optionally, expressions are automatically interpreted as infix if their second parameter is an infix operator (by matching an “infix operator” pattern of symbols), the first parameter is not an infix operator, and it has at least three parameters. Otherwise, expressions are interpreted as normal “function first” prefix notation. To disable infix interpretation, surround the second parameter with as(…). Infix expressions must have an odd number of parameters with the even ones being the same binary infix operator. You must separate each infix operator with whitespace on both sides. Precedence is not supported; just use parens (a lot more about that in a moment). Use the “name-ending” form for unary operations, e.g., -(x) for “negate x”. Thus “2 + (y * -(x))” is a valid expression, equivalent to (+ 2 (* y (- x))). “(2 + 3 + 4)” is fine too. Infix operators must match this pattern (and in Scheme cannot be =>):
```
    [+-\*/<>=&\|\p{Sm}]{1-4}|\:
```

For more information on sweet-expressions or on making s-expressions more readable in general, see my website page at http://www.dwheeler.com/readable. For example, I provide a demo sweet-expression reader in Scheme (under the MIT license), as well as an indenting pretty-printer in Common Lisp. In particular, you can see my lengthy paper about why sweet-expressions do what they do, and some plausible alternatives. You can also download some other implementation code. I’ve also set up a SourceForge project named “readable” to discuss options in making s-expressions more readable, and to distribute open source software to implement them (unimplemented ideas don’t go far!).

Okay, but all of that was true in 2006 - what’s new? What’s new is a change of heart about precedence. I’ve been occasionally trying to figure out how to “flesh out” sweet-expressions with operator precedence, and I just kept hitting one annoying complication after another. Precedence is nearly universal among programming languages; they’re very useful, and only a few infix-supporting languages (such as Smalltalk) lack them. “Everyone” knows that 2+3*4 is 14, not 20, because of years of training in math classes that you multiply before you add. They’re also pretty easy to code (it’s an old solved problem). But I’ve discovered that in the typical use cases of a Lisp-like language’s expression reader, supporting precedence (in the general case) has some significant downsides that are irrelevant in other situations. Which is interesting, given how widespread they are elsewhere, so let’s see why that is.

First, let’s talk about a big advantage to not supporting precedence in sweet-expressions: It makes the creation of every new list obvious in the text. That’s very valuable in a list processing language; the key advantage of list processing languages is that you can process programs like data, and data like programs, in a very fluid way, so having clear markers of new lists using parentheses and indentation is very valuable.

Now let me note the downsides to supporting precedence in the specific cases of a Lisp-like language, which leads me to believe that it’s a bad idea for this particular use. Basically, adding precedence rules to a general-purpose list expression processor creates a slippery slope of complexity. There are two basic approaches to defining precedence: dynamic and static.

Dynamic: You can trivially create commands that dynamically create new infix operators, as well as express their precedence and direction. Coding this is not a problem, and in fact there are several implementations of this in Lisp already out there. But using this kind of dynamism for s-expressions themselves gets hideously complicated in many uses of a Lisp-like language. In a Lisp-like language, objects often shift between being data and being code. Often you’re working at multiple levels and meta-levels of input - and you may not want their precedences to be the same. Yet trying to handle those different levels turns out to require very complex management of the reader routine… you have to keep passing to the various readers the various incompatible precedence states (ugh). Combining code from different sources can be a nightmare if they have different precedence values, too. In short, having precedences dynamically vary by meta-level creates complexities you just don’t want to deal with.
Static: All of the above suggests that dynamic control is painful; one solution would be a completely static, pre-canned set of precedence levels. That would eliminate passing state, at least. And indeed, this is what most programming languages do; you don’t get to reset the precedence of C operators (a few languages, like Prolog, are an exception here). Fine, but someone has to predefine that set. That turns out to be really hard; the meanings of various infix operators are not cast in stone, so it’s difficult to preset the precedence levels meaningfully (since you don’t know what the operators mean). That’s particularly true if, like me, you’re trying to define a single expression syntax that would work (more or less) on arbitrary Lisp-like languages. The one obvious case is that * and / have higher precedence than binary + and - basically everywhere. That particular rule could be built-in, but if that’s all you build in, there’s little evidence that it’s worth the trouble. This kind of odd exception (there’s no precedence, except for * and / over + and -) is unlikely to help development, and it’s yet another rule to remember. And if we’re going to have such a limited support for precedence anyway, why not completely forbid mixing the operators - the advantage there is that it would make the mapping between lists and their textual representation much clearer.

It’s easier to add precedence later, if that turns out to be important after more experimentation. But after the experimentation I’ve done so far, it appears that precedence simply isn’t worth it in this case. Precedence creates complexity in this case, and it hides where the lists begin/end. It’s not hard to work without it; you can even argue that (2 + (5 * 6)) is actually clearer than (2 + 5 * 6). Precedence is great in many circumstances - I’d hate to lose it in other languages - but in this particular set of use cases, it seems to be more of a hurt than a help.

Of course, you can write code in some Lisp dialect to implement a language that includes precedence. Many programs written in Lisp, including PVS and Maxima, do just that. But when you’re implementing another language, you know what the operators are, and you’re probably implementing other syntactic sugar too, so adding precedence is a non-problem. Also, if you’re really happy with s-expressions as they are, and just want precedence in a few places in your code, a simple macro to implement them (such as infpre) works very well. But sweet-expressions are intended to be a universal representation in Lisp-like languages, just like S-expressions are, so their role is different. In that different role, precedence causes problems that don’t show up in most other uses. It think not supporting precedence turns out to be much better for this different role.

Here are some more examples, this time in Scheme (another Lisp dialect):

Sweet-expression (Ugly) S-expression

define factorial(n) if (n <= 1) 1 n * factorial(n - 1) substring("Hello" (1 + 1) string-length("Hello")) define move-n-turn(angle) tortoise-move(100) tortoise-turn(angle) if (0 <= 5 <= 10) display("True\n") display("Uh oh\n") define int-products(x y) if (x = y) x x * int-products((x + 1) y) int-products(3 5) (2 + 3 + (4 * 5) + 7.1) (2 + 3 + (4 / (5 * 6))) *(2 3 4 5)

(define (factorial n) (if (<= n 1) 1 (* n (factorial (- n 1))))) (substring "Hello" (+ 1 1) (string-length "Hello")) (define (move-n-turn angle) (tortoise-move 100) (tortoise-turn angle)) (if (<= 0 5 10) (display "True\n") (display "Uh oh\n")) (define (int-products x y) (if (= x y) x (* x (int-products (+ x 1) y)))) (int-products 3 5) (+ 2 3 (* 4 5) 7.1) (+ 2 3 (/ 4 (* 5 6))) (* 2 3 4 5)

Sweet-expression	(Ugly) S-expression
define factorial(n) if (n <= 1) 1 n * factorial(n - 1) substring("Hello" (1 + 1) string-length("Hello")) define move-n-turn(angle) tortoise-move(100) tortoise-turn(angle) if (0 <= 5 <= 10) display("True\n") display("Uh oh\n") define int-products(x y) if (x = y) x x * int-products((x + 1) y) int-products(3 5) (2 + 3 + (4 * 5) + 7.1) (2 + 3 + (4 / (5 * 6))) *(2 3 4 5)	(define (factorial n) (if (<= n 1) 1 (* n (factorial (- n 1))))) (substring "Hello" (+ 1 1) (string-length "Hello")) (define (move-n-turn angle) (tortoise-move 100) (tortoise-turn angle)) (if (<= 0 5 10) (display "True\n") (display "Uh oh\n")) (define (int-products x y) (if (= x y) x (* x (int-products (+ x 1) y)))) (int-products 3 5) (+ 2 3 (* 4 5) 7.1) (+ 2 3 (/ 4 (* 5 6))) (* 2 3 4 5)

So I’ve modified my demo program so that it supports infix operator chaining, such as (2 + 3 + 4). Since I no longer need to implement precedence, the addition of chaining means that I now have a working model of the whole idea, ready for experimentation. My demo isn’t ready for “serious use” in development yet; it has several known bugs and weaknesses. But it’s good enough for experimentation, to see if the basic idea is sensible - and I think it is. You can actually sit down and play with it, and see if it has merit. There are still some whitespace rules I’d like to fiddle with, to make both long files and interactive use as comfortable as possible, but these are at the edges of the definitions… not at its core.

I’m suggesting the use of && for “logical and”, and || for “logical or”. These are common symbols in other languages, and using the same symbols aids readability. Now, in Common Lisp and some Scheme implementations, || is “the symbol with 0-length name”. Oddly enough, this doesn’t seem to be a problem; Lisps can generally bind to the symbol with the 0-length name, and print it the same way, so it works perfectly well! In Scheme this is trivially done by running this:

define(&& and)
define(|| or)

Then you can do this:

(#t && #t)
if ((a > b) || ((a * 2) < (c + d + e))) ...

Instead of the hideous s-expressions:

(and #t #t)
(if (or (> a b) (< (* a 2) (+ c d e))) ...)

Here are some quotable quotes, by the way, showing that I’m not the only one who thinks there’s room for improvement:

I’ve used Lisp my whole programming life and I still don’t find prefix math expressions natural. - Paul Graham
I have more faith that you could convince the world to use esperanto than [to use] prefix notation. - Paul Prescod

Lisp-based languages are all over the place. There are vast number of implementations of Common Lisp and Scheme. GNU guile is a Scheme implementation embedding into many other programs, for example. GNU emacs is a widely-used text editor, built on its own dialect of Lisp. AutoCAD has its own variant under the covers, too. Programs like PVS are implemented in Lisp, and interacting with it currently requires using s-expressions. It’d be great if all of these supported an alternative, simpler syntax. With sweet-expressions, typical s-expressions are legal too. So I think this is a widely-useful idea.

So if you’re interested, take a look at http://www.dwheeler.com/readable.

path: /misc | Current Weblog | permanent link to this entry