(Syntactic) Sugar Never Tasted So Good

For the sake of experimentation, I’m writing this post in reStructuredText instead of Markdown. I’m not entirely convinced that I like it at present, but this may be just an artifact of having been exposed to Markdown a great deal more frequently. Besides, they’re both just syntactic sugar for HTML, right? All they do is make it a little easier to read and write the content of a text-based document without getting caught up in the form. Which brings me to the subject of today’s post…

One complaint I often hear leveled against the Lisp family is that the syntax is ugly to look at. While I personally like the parenthetical notation, turning that discussion into a “Yes it is!” “No it’s not!” kind of thing is entirely too reminiscent of the family-dinner-table conversations of my youth - so maybe a better solution would be to make it pretty? After all, much is often made of Lisp’s ability to modify its own syntax by defining reader macros. So I set out to build a proof-of-concept in Clojure.

As it turns out, Clojure doesn’t support defining new reader macros, and this is a design choice that is unlikely to change. There have been some fairly hackish solutions to this problem, but the long story short is that there are not going to be programmer-defined reader macros, unless the programmer in question is Rich Hickey. Well, that’s a shame! I’ve been so pleased with Clojure otherwise. So what else can we do? Well, we could use Common Lisp, of course, but I don’t want to give up on Clojure quite so easily. Perhaps I’ll start by writing the macro itself and then burn the “how to make it usable” bridge when I come to it.

So let’s say I have some Clojure code, like this.
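As a small stand-in for the kind of code in question, assume two toy functions. The names `square` and `sum-of-squares` are hypothetical and chosen only for illustration:

```clojure
;; Hypothetical sample input; any ordinary Clojure definitions would do.
(defn square
  "Multiplies x by itself."
  [x]
  (* x x))

(defn sum-of-squares
  "Adds the squares of a and b; note the nested calls."
  [a b]
  (+ (square a) (square b)))
```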

This is perfectly readable if you’re already familiar with Lispy syntax, but looks kind of like voodoo if you’re not. Wouldn’t it be easier to transition from a more block-style language if we could do something like this?

If you said yes to that question, thank you for indulging me. It turns out we can! The latter is automatically generated from the former entirely through a bunch of messy regular expressions, like so:

Definitely not the prettiest code I’ve ever written, but I hope it’s clear enough to read. On the other hand, if it’s not, I know what to do…

A left paren causes a new line and an increase in the indentation level by one; a right paren causes a new line and a decrease in the indentation level by one. This makes the structure of def and defn calls clear at a glance, which is kind of nice! And we shouldn’t be losing any meaningful information by doing this transformation - so it ought to be possible to transform back to syntactically correct Clojure. Doing this in terms of regular expressions is kind of a pain in the rear, but you get the idea.
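The rule can be sketched roughly as follows. This is one literal reading of it, not the regular-expression code described above: a single regex splits the text into tokens, and a fold tracks the indentation level. The name `parens->indent` is hypothetical, and the sketch assumes the input has no string literals or comments containing parentheses or spaces.

```clojure
(require '[clojure.string :as str])

(defn parens->indent
  "Sketch: every ( opens a new line one level deeper, every ) opens a new
  line one level shallower. Lines that end up with no tokens are dropped."
  [s]
  (let [;; A token is a single paren or a run of non-paren, non-space chars.
        tokens (re-seq #"[()]|[^()\s]+" s)
        ;; Each line is a vector: [level token token ...].
        step   (fn [{:keys [level lines]} tok]
                 (case tok
                   "(" {:level (inc level) :lines (conj lines [(inc level)])}
                   ")" {:level (dec level) :lines (conj lines [(dec level)])}
                   ;; Any other token joins the line currently being built.
                   {:level level
                    :lines (conj (pop lines) (conj (peek lines) tok))}))
        {:keys [lines]} (reduce step {:level -1 :lines [[0]]} tokens)]
    (->> lines
         (filter #(> (count %) 1))   ; keep only lines that hold tokens
         (map (fn [[level & toks]]
                (str (apply str (repeat level "\t")) (str/join " " toks))))
         (str/join "\n"))))
```

Calling `(parens->indent "(defn square [x] (* x x))")` gives two lines: `defn square [x]`, then `* x x` indented by one tab. Under this literal reading, an argument that follows a closed sub-form lands on its own line at the enclosing level, so `(+ (* 2 3) 4)` comes out as `+`, an indented `* 2 3`, and then `4`.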

So, now that we’ve done it the ugly way with regular expressions, how about a quick demonstration of why macros are awesome?

Here’s a macro that takes some Clojure code and gives it the newline-indent structure we want, with a sample input and output.

(Yes, it’s a multiple-arity macro. So sue me.)

Clearly this is a lot easier in terms of S-expressions than regular expressions. For example, the whitespace in the input doesn’t matter; you can insert tabs, spaces, newlines, and what-have-you to your heart’s content: the macro doesn’t even have a concept of “whitespace”, because the reader ignores it when creating the S-expression. And we get an S-expression back out, as well. Admittedly not an evaluable one, because “n” is not a function, but hey, baby steps…
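The whitespace point is easy to check at the REPL with `read-string` from `clojure.core`. The var names `compact` and `sprawled` below are hypothetical, used only for this demonstration:

```clojure
;; The same form, read from two strings that differ only in whitespace.
(def compact  (read-string "(defn square [x] (* x x))"))
(def sprawled (read-string "(defn   square\n\t[x]\n\n   (*  x\tx))"))

;; Both reads yield the same four-element list, so they compare equal.
(= compact sprawled)
```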

Speaking of which, our next baby step is to dispense with the S-expression and have the macro output a string. We’ll give it a slightly more descriptive name while we’re at it.
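A minimal sketch of what such a string-producing macro might look like, under some assumptions: the names `form->lines` and `sexp->whitespace` are hypothetical, only lists are unfolded onto new lines, and vectors (along with any lists inside them) are printed as they are.

```clojure
(require '[clojure.string :as str])

(defn form->lines
  "Returns the indented lines for form, whose own line sits at depth."
  [form depth]
  (let [indent (apply str (repeat depth "\t"))]
    (if (seq? form)
      ;; Runs of non-list elements share one line at this depth;
      ;; each nested list is unfolded one level deeper.
      (mapcat (fn [group]
                (if (seq? (first group))
                  (mapcat #(form->lines % (inc depth)) group)
                  [(str indent (str/join " " (map pr-str group)))]))
              (partition-by seq? form))
      [(str indent (pr-str form))])))

(defmacro sexp->whitespace
  "Expands, at macroexpansion time, to a string holding the whitespace form."
  ([form] `(sexp->whitespace ~form 0))   ; one-argument arity starts at depth 0
  ([form depth] (str/join "\n" (form->lines form depth))))
```

Because the macro receives its argument unevaluated, `(sexp->whitespace (defn square [x] (* x x)))` defines nothing; it simply expands to a string with `defn square [x]` on the first line and `* x x` one tab in on the second.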

(Note: Tabs are displaying strangely in these embedded gists, but the correct spacing is visible in the raw file and on the github page.)

So that’s pretty much done! We can transform from standard S-expression syntax to whitespace-based syntax at will. In the altered syntax, a new line indicates that the first thing on that line is a function which is being applied to the rest of the symbols on that line. An increase in indentation means that we have nested functions. And to own the truth, it is a lot easier to read at a glance than parenthetical notation, despite my inclination to say “Bah! Humbug!” about such things.

The obvious next step is to write a transformer back from the whitespace form to S-expressions, or at least to a string which Clojure can parse on its own! But this post has already grown longer than intended, so I’ll save that for part two.