Curly infix, Modern-expressions, and Sweet-expressions: A suite of readable formats for Lisp-like languages

by David A. Wheeler, 2006-06-17 (Revised 2009-06-02)

This page is obsolete; see http://readable.sourceforge.net instead.

Many people find Lisp s-expressions hard to read as a programming notation. This paper briefly describes a suite of three approaches I've developed: curly infix, modern-expressions, and sweet-expressions. These are not tied to any particular semantic, and can do everything regular s-expressions can do.

The Problem

Lisp-derived systems normally represent programs as s-expressions, where an operation and its parameters are surrounded by parentheses; the operation to be performed is identified first, and each parameter afterwards is separated by whitespace. So the traditional “2+3” is written as “(+ 2 3)”. This is regular, but most people find it hard to read. Even if you are used to s-expressions, this is a problem when trying to work with others.

Early Lisp was even harder to read, because it lacked abbreviations like 'x for (quote x). I believe that additional abbreviations and conventions can be created so that programs can be easily read without losing Lisp's capabilities. In particular, these are not tied to any particular semantic, so you can do metaprogramming and meta-meta-programming (and so on), all without problems.

Curly Infix, Modern-expressions, and Sweet-expressions

I've developed a 3-layer approach to making Lisp more readable. Every layer adds abbreviations to an S-expression reader; the abbreviations work with any S-expression and are not tied to any particular semantic. These layers are:

  1. Curly infix: Any expression surrounded by {...} is an abbreviation for infix, e.g., {n <= 2} becomes (<= n 2). No precedence is included, by design (see below).
  2. Modern-expressions: Includes curly infix, and adds special meanings to the prefixed grouping symbols (), [], and {}. Thus, f(1 2) maps to (f 1 2).
  3. Sweet-expressions: Includes modern-expressions, and adds indentation as meaningful (like Python, Haskell, and many other languages).

All of these can be used in any Lisp-like language (Common Lisp, Scheme, Emacs Lisp, ACL2, BitC, etc.). Curly-infix is 100% compatible with existing Lisp code; modern-expressions and sweet-expressions are 100% compatible with existing well-formatted Lisp code. Yet they add additional abbreviations to the reader that make programming much more pleasant. Since they are automatically translated into s-expressions, yet maintain all their capabilities (quasiquoting, etc.), they lose no power.

Sweet-Expression Examples

Here are two quick examples - we'll use sweet-expressions version 0.2 to represent calculating factorials and Fibonacci numbers, in both cases using Scheme:

(Ugly) S-expression:

(define (fibfast n)
  (if (< n 2)
    n
    (fibup n 2 1 0)))

Sweet-expression 0.2:

define fibfast(n)  ; Typical function notation
  if {n < 2}       ; Indentation, infix {...}
    n              ; Single expr = no new list
    fibup(n 2 1 0) ; Simple function calls

(Ugly) S-expression:

(define (fibup max count n-1 n-2)
  (if (= max count)
    (+ n-1 n-2)
    (fibup max (+ count 1) (+ n-1 n-2) n-1)))

Sweet-expression 0.2:

define fibup(max count n-1 n-2)
  if {max = count}
    {n-1 + n-2}
    fibup max {count + 1} {n-1 + n-2} n-1

(Ugly) S-expression:

(define (factorial n)
  (if (<= n 1)
    1
    (* n (factorial (- n 1)))))

Sweet-expression 0.2:

define factorial(n)
  if {n <= 1}
    1
    {n * factorial{n - 1}} ; f{...} => f({...})

Note that you can use traditional math notation for functions; fibfast(n) maps to (fibfast n). Infix processing is marked with {...}; {n <= 1} maps to (<= n 1). Indentation is significant, unless disabled by (...), [...], or {...}. This example uses variable names with embedded "-" characters; that's not a problem, because the infix operators must be surrounded by whitespace and are only used when {...} requests them.

It's actually quite common to have a function call pass one parameter, where the parameter is calculated using infix notation. Thus, there's a rule to simplify this common case (the prefix {} rule). So factorial{n - 1} maps to factorial({n - 1}) which maps to (factorial (- n 1)).

Credit where credit is due: The Fibonacci number code is loosely based on an example by Hanson Char.

Rules for Curly Infix, Modern-Expressions, and Sweet-Expressions

I've devised three levels of notation and given them each names: Curly Infix, Modern-Expressions, and Sweet-Expressions. Each builds on the previous one, so let's take them in order.

Curly Infix

"Curly infix" adds one simple rule:

{...} contains an "infix list". If the enclosed infix list has (1) an odd number of parameters, (2) at least 3 parameters, and (3) all even parameters are the same symbol, then it is mapped to "(even-parameter odd-parameters)". Otherwise, it is mapped to "(nfx list)"; you'll need to have a macro named "nfx" to use it.

This rule means that {n = 0} maps to (= n 0), {3 + 4 + 5} maps to (+ 3 4 5), {3 + {4 * 5}} maps to (+ 3 (* 4 5)), and {3 + 4 * 5} maps to (nfx 3 + 4 * 5).
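
The rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the reference implementation: Python lists stand in for s-expressions, strings stand in for symbols, and the function name curly_infix is made up for this sketch. The input is the list of items already read from inside {...}.

```python
def curly_infix(items):
    """Map the items read from inside {...} to a prefix list."""
    operators = items[1::2]          # the "even parameters" (2nd, 4th, ...)
    simple = (
        len(items) >= 3
        and len(items) % 2 == 1
        and isinstance(operators[0], str)               # must be a symbol
        and all(op == operators[0] for op in operators)  # all the same symbol
    )
    if simple:
        # (even-parameter odd-parameters)
        return [operators[0]] + items[0::2]
    # Anything else is left for a user-defined "nfx" macro.
    return ["nfx"] + items


print(curly_infix(["n", "=", 0]))                       # ['=', 'n', 0]
print(curly_infix([3, "+", 4, "+", 5]))                 # ['+', 3, 4, 5]
print(curly_infix([3, "+", curly_infix([4, "*", 5])]))  # ['+', 3, ['*', 4, 5]]
print(curly_infix([3, "+", 4, "*", 5]))                 # ['nfx', 3, '+', 4, '*', 5]
```

Nested {...} are handled simply because the inner list has already been converted by the time the outer one is examined.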

This rule may seem arbitrary, but it isn't. The first 3 conditions define a "simple infix" expression, which is exactly the set of all infix expressions that can represent a single list (an expression like (+ 3) doesn't really have an infix operator, since by definition an infix operator is between its operands). At first I considered reporting an error if a simple infix expression isn't sent, but prepending "nfx" is much more flexible.

Consistently using {...} so infix operators are always equal in a particular list has the advantage that all macros will see the usual list form - with the function in the first position. If you want operator precedence, define an nfx macro to implement the precedence rules you desire. Or, if you never want precedence, define nfx to be an error. The even parameters must be exactly the same symbol; pointer equality such as Scheme's eq? is a good way to test this. Every infix operator must be surrounded by whitespace for this rule to work as designed.

Notice that this does not include any precedence system, by design. Many people have devised infix processing systems for Lisps, and of course, they implement various mechanisms for precedence. If you have a specific semantic in mind, that's useful. But people often choose Lisp-based languages so that they can do meta-programming (and meta-meta-programming) - so soon there is no single precedence set, making precedence handling more harmful than helpful. It also causes trouble with code-sharing - not everyone agrees on a precedence level. By intentionally not building in a precedence system, we make things amazingly simple - we don't need to register functions, decide their order, or anything like it - making programming much simpler and easier. There's no need to memorize a precedence system, code transfers easily, and code is generally easy to read too (again, because you don't have to memorize a precedence system). In cases where you do want a precedence system, you can implement an "nfx" macro.
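
As an illustration of what such an "nfx" macro might do, here is a small Python sketch that applies a precedence table to a flat infix list. Everything in it is an assumption made for the example: the table PRECEDENCE, the helper binding_power, and the choice of left-associative binary operators are one possible policy, not something the notation prescribes. Python lists again stand in for s-expressions.

```python
# Hypothetical precedence table; a higher number binds tighter.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def binding_power(token):
    """Return the precedence of an operator symbol, or 0 for anything else."""
    return PRECEDENCE.get(token, 0) if isinstance(token, str) else 0


def nfx(items):
    """Turn a flat infix list such as [3, '+', 4, '*', 5] into nested prefix lists."""
    if not items:
        raise ValueError("empty infix list")

    def parse(pos, min_power):
        if pos >= len(items):
            raise ValueError("missing operand")
        left = items[pos]
        pos += 1
        while pos < len(items) and binding_power(items[pos]) >= min_power:
            operator = items[pos]
            # Left-associative: the right operand must bind strictly tighter.
            right, pos = parse(pos + 1, binding_power(operator) + 1)
            left = [operator, left, right]
        return left, pos

    tree, end = parse(0, 1)
    if end != len(items):
        raise ValueError("unknown operator: %r" % (items[end],))
    return tree


print(nfx([3, "+", 4, "*", 5]))   # ['+', 3, ['*', 4, 5]]
print(nfx([8, "-", 3, "-", 2]))   # ['-', ['-', 8, 3], 2]
```

Because the policy lives in an ordinary macro rather than in the reader, a project that wants different rules (or none) only has to swap this one definition.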

This use of {...} is highly compatible with various Lisps. I think this rule would be a great backwards-compatible addition to the standard reader of any Scheme and Common Lisp implementation. Scheme specifically reserves {...} for future use (R5RS section 2.3, R6RS section 4.2.1). Common Lisp does not define {} (see section 2.4 of the Common Lisp Hyperspec, based on ANSI Common Lisp X3.226), but notes its potential use by users. BitC spec version 0.10 (June 17, 2006) section 2.4.3 also reserves {...}.

It's important to note that inside the infix expression you can do anything you can do in normal Lisp. This is different from nearly all Lisp infix systems, which have their own incompatible language inside that can't handle arbitrary s-expressions. You can use arbitrary s-expressions with quasi-quoting, unquote-splicing, or whatever inside, and all without "registering" anything.

Surprisingly, this simple mechanism is enough to do what people actually want from an infix mechanism for Lisp. You can add things, like {x + 1}, or compare values, like {x <= 5}.

This is an unusually simple mechanism, but like much of Lisp, its power comes from its simplicity.

Modern-Expressions

Modern-expressions build on curly-infix's use of {...}. Modern-expressions also add the ability to use [...], as well as (...), to surround ordinary lists (Scheme R6RS does this too, and both Common Lisp and Scheme R5RS reserve [...] for future use).

What's more, if (...), {...}, or [...] are prefixed with a symbol or list (i.e., have no whitespace between them), they have a new meaning in modern-expressions:

  1. Prefixed (...). Syntax of the form e(...), with no whitespace between symbol or list e and the open parenthesis, is mapped to (e ...). Any parameters in "..." are space-separated. This produces another expression, so this can be repeated (left-to-right). ‣ This adds support for traditional function notation. For example, "cos(x)" maps to "(cos x)", "max(3 4)" maps to "(max 3 4)", and "f(x)(a b)" maps to "((f x) a b)". Note that this is especially convenient for certain styles of functional programming, including lambda expressions; in Scheme, lambda((x) {x + x})(4) would evaluate to 8.
  2. Prefixed {...}. A prefixed expression f{...}, where f is a symbol or list, is an abbreviation for f({...}). ‣ This rule simplifies combining function calls and infix expressions when there is only one parameter to the function call. This is a common case; for example, "not" (which is normally given only one parameter) often encloses infix "and" and "or". Thus, f{n - 1} maps to (f (- n 1)). When there is more than one function parameter, use the normal term-prefixing format instead, e.g., f({x - 1} {y - 1}) maps to (f (- x 1) (- y 1)).
  3. Prefixed [...]. Prefixed square brackets e[...], where e is a symbol or list, map to (bracketaccess e ...). ‣ Thus, "t[x]" maps to "(bracketaccess t x)". This is intended to simplify use of indexed arrays, associative arrays, and similar constructs. You could even define bracketaccess as a macro that simply returns its arguments; in this case f[5] would eventually map to (f 5).

These combine well with curly-infix forms of {...}. For example, {-(x) * y} maps to (* (- x) y).
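
The three prefix rules fit naturally into a recursive-descent reader. The Python sketch below is a simplified illustration, not the actual reader: the names read, read_expr, read_list, and atom are invented for this example, Python lists stand in for s-expressions, and it assumes input made only of symbols, integers, and the three bracket pairs (no strings, quotes, comments, or dotted pairs).

```python
CLOSERS = {"(": ")", "[": "]", "{": "}"}
DELIMITERS = set("()[]{}")


def curly_infix(items):
    """Simple infix lists become prefix lists; others are passed to nfx."""
    ops = items[1::2]
    if (len(items) >= 3 and len(items) % 2 == 1
            and isinstance(ops[0], str) and all(op == ops[0] for op in ops)):
        return [ops[0]] + items[0::2]
    return ["nfx"] + items


def atom(token):
    """Integers become ints; every other token is kept as a symbol string."""
    try:
        return int(token)
    except ValueError:
        return token


def read_list(text, pos, closer):
    """Read whitespace-separated items up to the matching closer."""
    items = []
    while True:
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            raise ValueError("missing " + closer)
        if text[pos] == closer:
            return items, pos + 1
        item, pos = read_expr(text, pos)
        items.append(item)


def read_expr(text, pos=0):
    """Read one datum starting at pos; return (datum, next position)."""
    while pos < len(text) and text[pos].isspace():
        pos += 1
    if pos >= len(text):
        raise ValueError("unexpected end of input")
    if text[pos] in CLOSERS:                      # unprefixed (...), [...], {...}
        opener = text[pos]
        items, pos = read_list(text, pos + 1, CLOSERS[opener])
        datum = curly_infix(items) if opener == "{" else items
    else:
        start = pos
        while (pos < len(text) and not text[pos].isspace()
               and text[pos] not in DELIMITERS):
            pos += 1
        if pos == start:
            raise ValueError("unexpected " + text[pos])
        datum = atom(text[start:pos])
    # Prefixed forms: an opener with no whitespace before it.
    while pos < len(text) and text[pos] in CLOSERS:
        opener = text[pos]
        items, pos = read_list(text, pos + 1, CLOSERS[opener])
        if opener == "(":
            datum = [datum] + items                    # e(...)  => (e ...)
        elif opener == "{":
            datum = [datum, curly_infix(items)]        # e{...}  => (e {...})
        else:
            datum = ["bracketaccess", datum] + items   # e[...]  => (bracketaccess e ...)
    return datum, pos


def read(text):
    return read_expr(text)[0]


print(read("f(x)(a b)"))    # [['f', 'x'], 'a', 'b']
print(read("f{n - 1}"))     # ['f', ['-', 'n', 1]]
print(read("t[x]"))         # ['bracketaccess', 't', 'x']
print(read("{-(x) * y}"))   # ['*', ['-', 'x'], 'y']
```

The only place where whitespace matters is the final loop: an opener is treated as a prefix form only when it directly follows the datum just read.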

A common extension must be supported: (. x) must mean x. This provides a simple way to escape certain constructs, such as the "." or "group" symbols that have extra meaning in sweet-expressions. It turns out that in a typical implementation of a list reader, it takes extra effort to prevent this extension, so this is an easy extension to include.

Modern-expressions are very compatible with most existing text editors for Lisp. Editors may not "understand" the code, but many work to match (...), {...}, and [...], and that is enough to be useful. After all, Scheme R6RS already requires support for [...] anyway, and Common Lisp readers are designed to allow {...} to be overridden, so many text editors are designed to support this. Modern-expressions are easy to use at the command line, too - for example, you don't need to enter a blank line to execute something.

Sweet-expressions

Sweet-expressions start with modern-expressions and add indentation as meaningful:

Indentation is meaningful; the "I-expressions" of Scheme SRFI-49 are supported, with the 2008 I-expression revisions. An indented line is a parameter of its parent, later terms on a line are parameters of the first term, and lists of lists are prefixed with the term "group". A line with exactly one datum, and no child lines, is simply that item; otherwise that line and its child lines are themselves a new list. Indentation is disabled inside the grouping pairs (), [], and {}, whether they are prefixed or not. Lines with only leading whitespace and a ;-comment are completely ignored - even their indentation is irrelevant. Empty lines, possibly with tabs and spaces, are ignored during reading of the initial line of an expression; otherwise they end an expression.

A blank line always terminates a datum, so once you've entered a complete expression, "Enter Enter" will always end it. The "blank lines at the beginning are ignored" rule eliminates a usability problem with the original I-expression spec, in which two sequential blank lines surprisingly return (). (The sample implementation did end expressions on a blank line - the problem was that the spec didn't capture this.) A function call with 0 parameters must be surrounded or immediately followed by a pair of parentheses: (pi) or pi().

Generally it's best to start each new expression on the left edge; if you choose not to do that, include a blank line between each new expression.
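
The core of the indentation rules can be sketched in Python as follows. This is a deliberately reduced illustration with several assumptions: the input is the lines of a single expression, indentation uses spaces only, terms are plain whitespace-separated atoms (no brackets or strings, so ";" always starts a comment), and atoms are kept as strings. The names read_block and parse_rows are invented for the example; Python lists stand in for s-expressions.

```python
def parse_rows(rows, i):
    """Parse the row at index i plus its more-indented child rows."""
    indent, terms = rows[i]
    i += 1
    children = []
    while i < len(rows) and rows[i][0] > indent:
        child, i = parse_rows(rows, i)
        children.append(child)
    if terms[0] == "group":
        # "group" only marks a list of lists; it is not part of the result.
        terms = terms[1:]
    elif len(terms) == 1 and not children:
        # Exactly one datum and no child lines: simply that item.
        return terms[0], i
    # Later terms and child lines are parameters of the first term.
    return terms + children, i


def read_block(lines):
    """Convert the indented lines of one expression into nested lists."""
    rows = []
    for line in lines:
        body = line.split(";", 1)[0].rstrip()   # drop ;-comments
        if body.strip():                        # skip blank and comment-only lines
            rows.append((len(body) - len(body.lstrip()), body.split()))
    if not rows:
        raise ValueError("no expression found")
    datum, end = parse_rows(rows, 0)
    if end != len(rows):
        raise ValueError("more than one expression in block")
    return datum


print(read_block([
    "let",
    "  group",
    "    x 2",
    "    y 3",
    "  * x y",
]))
# ['let', [['x', '2'], ['y', '3']], ['*', 'x', 'y']]
```

A full reader would hand each term to the modern-expression reader instead of splitting on whitespace, and would stop at a blank line rather than expect a pre-cut block.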

Comments on the Rules

Note that usual Lisp quoting rules still work, so 'a still maps to (quote a). But they work with the new capabilities, so 'f(x) maps to (quote (f x)). Same with quasiquoting and comma-lifting. A ";" still begins a comment that continues to the end of a line, and "#" still begins special processing.

Implementations may call underlying implementations when they encounter "#"; in those cases, an expression begun by "#" will not continue to support sweet-expressions. For example, in Scheme, use vector(...) instead of #(...). Many Scheme implementations have nonstandard extensions for "#", so a portable sweet-reader can't easily reimplement the functionality of a local "#". Nor can the sweet-reader easily call on the underlying implementation of "#" on some implementations, e.g., Scheme only supports a one-character peek with no unget character.

If an implementation called a "standard" s-expression reader when it encountered an open parenthesis, it would be extremely backward-compatible with essentially all existing Lisp files. However, this mode is hard to use; it would mean that you must use [...] for lists, and failure to do so would produce mysterious errors. After some experimentation, I found that it was a bad idea and dropped it.

The (. x) rule is a common extension in Scheme implementations; it's required here so that I-expression's "group" term can be easily escaped. Note that any "(" preceded by whitespace, "(", "{", or "[" is unprefixed.

Note that you have to disable indentation to use infix operators as infix operators. This doesn't seem to be a problem in practice.

With sweet-expressions, you can use the traditional Lisp read-eval-print loop as a calculator, as long as you remember to surround infix expressions with {...} and surround infix operators with whitespace. For example, "{3 + 4}" will be mapped to (+ 3 4), which when executed will produce "7". Use normal function notation for unary functions, e.g., "{-(x) / 2}" maps to "(/ (- x) 2)". Nest {...} when you need to, e.g., "{3 + {4 * 5}}" will map to "(+ 3 (* 4 5))". If you mix infix operators at the same level, you must have an "nfx" macro defined to handle precedence, and you must be careful about other macros you use.
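
To make the calculator idea concrete, here is a tiny Python evaluator for the prefix lists that the reader produces. It is a sketch under assumptions, not part of sweet-expressions: Python lists stand in for s-expressions, only the four arithmetic operators in the hypothetical OPERATORS table are supported, and "/" uses Python's float division rather than Scheme's exact rationals.

```python
import operator
from functools import reduce

# Hypothetical operator table for this sketch.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate(expr):
    """Evaluate a prefix list such as ['+', 3, ['*', 4, 5]]."""
    if not isinstance(expr, list):
        return expr                      # a number evaluates to itself
    name, *args = expr
    values = [evaluate(arg) for arg in args]
    if name == "-" and len(values) == 1:
        return -values[0]                # unary minus, as in -(x)
    return reduce(OPERATORS[name], values)


print(evaluate(["+", 3, 4]))             # {3 + 4}        => 7
print(evaluate(["+", 3, ["*", 4, 5]]))   # {3 + {4 * 5}}  => 23
print(evaluate(["/", ["-", 6], 2]))      # {-(6) / 2}     => -3.0
```

The evaluator never sees infix notation at all, which is the point: by the time evaluation starts, every expression is already an ordinary prefix list.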

Notice that since all the transforms happen in the reader, sweet-expressions are highly compatible with macros. Sweet-expressions simply define new abbreviations, just as 'x became (over time) a standard abbreviation for (quote x). As long as simple infix expressions are used (ones that don't create nfx), after reading the expressions all expressions are normal s-expressions, with the operator at the initial position. So macros defined by Common Lisp's macros, etc., will work as expected. Common Lisp has some hideously confusing terminology, though. Common Lisp has macros, but it also has a completely different capability: "macro characters", which introduce "reader macros" - i.e., hooks into the reader used during read time. The Common Lisp Hyperspec clearly states in its glossary on macro characters, "macro characters have nothing to do with macros", but I think they should have chosen a name that had nothing to do with macros as well. Obviously sweet-expressions can affect macro characters, since they implement a different reading syntax. This doesn't affect most real Common Lisp programs, which often avoid macro characters anyway. Common Lisp macro functions (e.g., defmacro and macrolet) work just fine with sweet-expressions.

I know of a possible future extension to sweet-expressions: Splicing with "\". I posted a splicing proposal on the readable mailing list. When doing indentation processing, if the first character of a form is "\" followed by whitespace:

  1. If it's the last character of the line (other than 0 or more spaces/tabs), then the newline is considered a space, and the next line's indentation is irrelevant. This continues the line. (Note that comments cannot follow, because that would be confusing.)
  2. If it's between items on a line, it's interpreted as a line break to the same indentation level.
  3. Otherwise, if it's at the beginning of a line (after 0+ spaces/tabs), it's ignored - but the first non-whitespace character's indentation level is used.
This is mainly to handle named parameters more gracefully, e.g.:
  myfunction \
    :option1 \ f(a)
    :option2 \ g(b)
could map to (myfunction :option1 (f a) :option2 (g b)). Note that f(a) or g(b) could be the beginning of a complex program using indentation, since \ does not turn off indentation.

Programming with Sweet-expressions

General Rules

Mentally, this is pretty straightforward - on each line, write an expression; everything after the first term on the line, or all child lines, are parameters of the first term. You can use grouping operators (), [], and {} to put subexpressions on the same line, if you want. Use -(...) to negate something.

Whenever you have an infix expression, just surround it with {...}. You can use the form f(...) to call a function; if it has zero parameters, express it as f(), and if it has more than one parameter, separate the parameters with spaces. The f(...) form is especially handy for creating short expressions as a parameter on a line; for long expressions, use indentation instead.

The word "group" starts lists of lists in sweet-expressions (and I-expressions). This makes it easy to create lists of lists, without having to create special syntax for each variation.

This is all implemented by modifying the "read" function, so that it recognizes all these formats and generates s-expressions. Since macros operate on s-expressions, macros work just fine. You can have infix operators in macro definitions, and you can have infix operators in the expressions processed by macros.

Interactively, you can just type 'load("filename")' or {3 + 2}, then Enter Enter.

Certain functions require groups, but once you learn what they are (and their patterns), they're pretty easy to manage.

Examples of specific constructs

Here are a few examples, using sweet-expressions.

The "cond" form is widely-used, and works beautifully. Here's an example:

define f(x)
  ; display negative, zero, or positive, and return -1, 0, or 1 respectively.
  cond
    {x < 0} display("negative") -1
    {x = 0} display("zero")      0
    #t      display("positive") +1
If the condition gets long, or you have many operations, just make the operations child lines of the condition.

The "let" forms are a case where you need "group". For example:

let
  group
    x 2
    y 3
  {x * y}

I actually don't like "let" all that much anyway, even when using traditional Lisp notation. You might find it more efficient to define a single-variable let. Here's a straight s-expression form of this, using the define-macro form supported by many Scheme implementations including guile (it's a valid sweet-expression too, of course):

(define-macro let1
  ; Simple single-variable "let"; lets "variable" to "value", then computes.
  (lambda (variable value . computations)
    `(let ((,variable ,value)) ,@computations)))

By the way, here's the same macro, shown as a sweet-expression:

define-macro let1
  ; Simple single-variable "let"; lets "variable" to "value", then computes.
  lambda (variable value . computations)
    `(let ((,variable ,value)) ,@computations)

Now we can do the same thing this way:

let1 x 2
  let1 y 3
    {x * y}
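
To see what the reader and the macro produce together, the expansion can be mimicked in Python. This is only a model for illustration: the function let1 below is a stand-in for the Scheme macro, and Python lists stand in for s-expressions.

```python
def let1(variable, value, *computations):
    """Model of the let1 macro: build (let ((variable value)) computations...)."""
    return ["let", [[variable, value]], *computations]


# The nested let1 forms above, after reading {x * y} as (* x y):
expansion = let1("x", 2, let1("y", 3, ["*", "x", "y"]))
print(expansion)
# ['let', [['x', 2]], ['let', [['y', 3]], ['*', 'x', 'y']]]
```

The result is two nested single-variable "let" forms, so y's value could refer to x, unlike the single "let" with a group.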

Here's a larger example, reformatted from the example in the book "Teach Yourself Scheme in Fixnum Days":

define solve-kalotan-puzzle
  lambda []
    let
      group
        parent1         amb('m 'f)
        parent2         amb('m 'f)
        kibi            amb('m 'f)
        kibi-self-desc  amb('m 'f)
        kibi-lied?      amb(#t #f)
      assert
       distinct?(list(parent1 parent2))
      assert
       if eqv?(kibi 'm)
           not(kibi-lied?)
      assert
       if kibi-lied?
          xor
            {eqv?(kibi-self-desc 'm) and eqv?(kibi 'f)}
            {eqv?(kibi-self-desc 'f) and eqv?(kibi 'm)}
      assert
       if not(kibi-lied?)
          xor
            {eqv?(kibi-self-desc 'm) and eqv?(kibi 'm)}
            {eqv?(kibi-self-desc 'f) and eqv?(kibi 'f)}
      assert
       if eqv?(parent1 'm)
          and
            eqv?(kibi-self-desc 'm)
            xor
             {eqv?(kibi 'f) and eqv?(kibi-lied? #f)}
             {eqv?(kibi 'm) and eqv?(kibi-lied? #t)}
      assert
       if eqv?(parent1 'f)
          {eqv?(kibi 'f) and eqv?(kibi-lied? #t)}
      list(parent1 parent2 kibi)

solve-kalotan-puzzle()

Is the World Ready for this?

I'm well aware that there are some who don't like any change in Lisp notation. Some of these people seem to believe that the current Lisp notation was handed down from on high, never to be changed. Well, you don't have to use improvements like this, or even agree that they are improvements. But most software developers have abandoned Lisp precisely because of Lisp's hideous, inadequate notation (and I say that as someone who has used Lisp for decades). Lisp notation was not handed down from on high, and it has changed over time. The "LISP 1.5 Programmer's Manual" (by John McCarthy, Paul W. Abrahams, Daniel J. Edwards, Timothy P. Hart and Michael I. Levin; The M.I.T. Press, 1962, second edition) describes the parent of all modern Lisp-based systems. (Note that even LISP's creator didn't think much of using S-expressions as a programming notation.) LISP 1.5 did not have a ' operator - you had to say (QUOTE X). It didn't have abbreviations for quasiquoting (`) or comma-lifting (,) either. Today, people would not accept a Lisp that didn't at least have the common abbreviation for QUOTE. Indeed, Tony Hasemer's book "A Beginner's Guide to Lisp" (1984) says on the second page of the Foreword, "do NOT buy a Lisp which does not allow the single-quote sign in place of the word QUOTE, unless you have absolutely no alternative". Lisp notation has been stagnant for a while; it's time to add modern conveniences as abbreviations.

Some objections don't seem to realize that this proposal is different. It's true that there have been many abandoned efforts of the past to improve on S-expressions, but I think all those efforts failed to realize that any replacement for S-expressions must be completely general, just as S-expressions are, and not tied to a particular semantic. Practically all past efforts, such as M-expressions and similar work, failed precisely because they weren't general enough. It's true that tooling support is necessary for any notation like this (e.g., in program editors), but that's why a standard format needs to be defined so tools can implement it (and not 1000 application-unique reader macros). There's no reason tools can't support sweet-expressions as well as they support s-expressions today.

I think most software developers will not agreeably use a Lisp-based language unless that language has better built-in support for an easy-to-read programming notation. Programs must be read by others, and if the programming notation is odious to read, then the language has a key flaw. Most developers think Lisp is odious to read, even after they've used it for a while. If the Lisps won't provide an easy-to-read notation, those developers will just use another language that's more user-friendly (even when it's less appropriate for their problem) - and that is precisely what they are doing. Here, we try to learn from the past, keep all of S-expression's benefits, but provide a better notation that others can read.

Closing Remarks

Sweet-expressions can take a few minutes to learn how to use, just like anything else new. But I think they won't take long for people who already know how to use s-expressions, and they are far more readable in my opinion. Something impenetrable can be written using sweet-expressions, of course, but at least the basics of the notation don't get in the way. There is a risk that the notation could deceive the reader into confusion; I think after using the notation for a little bit this is unlikely, but that's something that an experiment should test. In any case, in an era where developers must read a lot of code, thinking about ways to improve readability is important. I hope that this is, or is the beginnings of, a way to improve readability for s-expressions.


For more information, see my website page at http://www.dwheeler.com/readable. I've also set up a SourceForge project where options like sweet-expressions can be discussed, and code can be shared. If you're interested, please join!