Welcome to a blog series in which I share my experiences developing a programming language. While I hope that this will be informative enough to help others get started writing their own compilers, be warned: I do not know what I’m doing. There will be many mistakes, and it’s possible that fundamental bits of the language and/or compilers will change. Despite that, I hope you join me and possibly learn a thing or two.
tldr: I’m writing a functional programming language with prefix notation. The gory details are here.
I’ll be developing Ballet, a functional language focusing on prefix notation for function calls. In Ballet, (almost) everything is a value, and functions greedily consume arguments. Below is a very basic example Ballet program:
def double a int end int
  .add a a
end

.print .double 7
.print .add .double 7 8
.print .add .add .double 6 7 .double 3
On line 1, we’re defining a function named double which takes in one argument of type int and returns an int. Line 2 creates a value which is the result of adding a to itself, and line 3 ends the function. Since Ballet is value based, the last statement of a function is the value returned by that function. This means that double will return the result of .add a a.
Line 5 lets us see the results of calling our double function. We pass the result of .double 7 to .print. But wait: why doesn’t that line print the text '.double 7'? The answer is prefix notation. The first value on the line is .print, which takes one value (the leading . signifies that we’re calling a function in the global namespace), so Ballet goes to the next value, .double. double also takes one value, so Ballet then goes to the next value: 7. 7 is a “constrained” value, which means that it takes no arguments. It’s passed as the first argument to double, turning the combination .double 7 into a constrained value which holds the result of adding 7 to itself. That result is passed to print.
Line 6 is a bit more complicated since add takes two arguments. Like before, .double is the first value taken, and it takes 7 as its argument, turning it into the constrained value .double 7 which is passed as the first argument to add. The next value on the line is 8, a constrained value. This is passed as the second argument. If Ballet used parentheses to signify arguments, this line would be written as .print (.add (.double 7) (8)).
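The greedy consumption described above can be sketched in a few lines of Python. This is not Ballet’s compiler, just a toy evaluator under stated assumptions: tokens are separated by whitespace, a leading . marks a call, and the ARITY and IMPLS tables are hypothetical stand-ins for real function definitions (double is hard-coded here rather than parsed from a def).

```python
# Toy evaluator for Ballet-style prefix calls (an illustrative sketch, not the real compiler).
# Assumption: every function has a fixed, known arity; ARITY and IMPLS are hypothetical tables.
ARITY = {"add": 2, "double": 1}
IMPLS = {
    "add": lambda a, b: a + b,
    "double": lambda a: a + a,
}


def evaluate(tokens, pos=0):
    """Evaluate one value starting at tokens[pos]; return (result, next_pos)."""
    token = tokens[pos]
    pos += 1
    if token.startswith("."):
        name = token[1:]
        args = []
        # Greedily consume exactly as many values as the function takes.
        for _ in range(ARITY[name]):
            arg, pos = evaluate(tokens, pos)
            args.append(arg)
        return IMPLS[name](*args), pos
    # Anything else is treated as an integer literal, which takes no arguments.
    return int(token), pos


def run(source):
    """Evaluate a single expression and check that every token was consumed."""
    tokens = source.split()
    result, pos = evaluate(tokens)
    if pos != len(tokens):
        raise ValueError("unconsumed tokens: " + " ".join(tokens[pos:]))
    return result


print(run(".double 7"))         # 14
print(run(".add .double 7 8"))  # 22
```

Because each call knows how many values it needs, no parentheses or separators are required to tell where an argument list ends.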
Line 7 brings the ideas together into a much more complicated example. Ballet doesn’t care about whitespace or line breaks, so let’s break that line up a bit so it’s clearer what’s happening.
.print -- prints 25
  .add
    .add
      .double 6
      7
    .double 3
In the block above, I’ve rewritten line 7 from the original example so that the argument to each function is on its own line, with arguments to the same function call having the same indentation. Technically, .double 6 and .double 3 should each be split over two lines since they’re a function call and argument, but since we’ve already seen how double works, I kept them together for brevity.
Let’s break it down: line 1 holds a call to .print, which takes as its single argument the call to .add on line 2. add takes two arguments: to find the first, we look at the next value, which is another call to add. The first argument to this inner call to add is .double, which takes 6 as an argument to become the constrained value .double 6. The next value, 7, is given as the second argument to the inner .add, which finalizes the constrained value of .add .double 6 7. Ballet then moves to the next value to construct the second argument to the first .add. It finds .double, which takes the next value of 3 to be constrained to .double 3. With both of its arguments constrained, the first .add produces 25, which is passed to .print.
One of the goals I had while designing Ballet was to avoid using the Shift key altogether. This means that parentheses, curly braces, and most syntactic items are off limits. Luckily, using prefix notation allows Ballet to have a simple syntax while still being just as powerful as any other language. In fact, Ballet only needs four non-alphanumeric symbols:
There are some other neat things which prefix notation gives Ballet. The biggest one is that all whitespace can be ignored. If we wanted to, all of the above code could’ve been written on a single line. Other languages can do this too, but they often require a semicolon or some other syntactic element to delimit statements. In Ballet, statements don’t need to be delimited since every constrained value is a statement.
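To see why line breaks carry no information, here is a small Python sketch that groups a flat token stream into nested calls. It is only an illustration: the ARITY table is a hypothetical stand-in for Ballet’s function signatures, print is assumed to take one argument, and the one-line and indented sources are written out from the nested add walkthrough earlier in the post.

```python
# Sketch: whitespace-insensitive parsing of prefix calls into a nested tree.
# Assumption: ARITY is a hypothetical table of how many values each function takes.
ARITY = {"print": 1, "add": 2, "double": 1}


def parse(tokens, pos=0):
    """Parse one value starting at tokens[pos]; return (tree, next_pos)."""
    token = tokens[pos]
    pos += 1
    if not token.startswith("."):
        return token, pos  # a literal takes no arguments
    children = []
    for _ in range(ARITY[token[1:]]):
        child, pos = parse(tokens, pos)
        children.append(child)
    return (token, *children), pos


def parenthesize(tree):
    """Render a tree with explicit parentheses around every call."""
    if isinstance(tree, str):
        return tree
    head, *args = tree
    return "(" + " ".join([head] + [parenthesize(a) for a in args]) + ")"


one_line = ".print .add .add .double 6 7 .double 3"
multi_line = """
.print
  .add
    .add
      .double 6
      7
    .double 3
"""

# str.split() with no arguments splits on any run of whitespace, including newlines.
tree_a, _ = parse(one_line.split())
tree_b, _ = parse(multi_line.split())
print(tree_a == tree_b)      # True
print(parenthesize(tree_a))  # (.print (.add (.add (.double 6) 7) (.double 3)))
```

Both layouts produce the same token list, so they necessarily produce the same tree. The indentation in the multi-line form is purely for the reader.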
Unfortunately, using prefix notation like this also has some downsides: parsing Ballet code and generating an AST is much more complicated than in other languages, and Ballet code can become difficult to understand at times. Luckily, the bind keyword lets us assign values to variables, which can greatly simplify complex expressions.
This post only covered the basics of Ballet syntax. For more information, please visit the Ballet compiler’s repository which contains a description of the rest of Ballet’s features.
In the next post, we’ll start actually writing Ballet’s compiler, beginning with the lexer.