Let's build a language, Part 0

This is part of an ongoing series about my development of the Ballet Language. A description of the language’s planned features can be found here.

Introduction

Welcome to a blog series in which I share my experiences developing a programming language. While I hope this will be informative enough to help others get started writing their own compilers, be warned: I do not know what I’m doing. There will be many mistakes, and fundamental parts of the language and/or its compiler may change along the way. Despite that, I hope you’ll join me and maybe learn a thing or two.

What is Ballet?

tldr: I’m writing a functional programming language with prefix notation. The gory details are here.

I’ll be developing Ballet, a functional language focusing on prefix notation for function calls. In Ballet, (almost) everything is a value, and functions greedily consume arguments. Below is a very basic example Ballet program:

Example Ballet Program
1  def double a int end int
2    .add a a
3  end
4
5  .print .double 7 -- prints 14
6  .print .add .double 7 8 -- prints 22
7  .print .add .add .double 6 7 .double 3 -- prints 25

What is prefix notation?

On line 1, we’re defining a function named double, which takes one argument of type int and returns an int. Line 2 creates a value that is the result of adding a to itself, and line 3 ends the function. Since Ballet is value-based, the last statement of a function is the value returned by that function. This means that double will return the result of .add a a.
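For Ballet to consume arguments greedily, it has to know how many arguments double takes, and that number comes from the definition. The Python sketch below guesses at the shape of a definition from this single example: def, the function name, parameter/type pairs up to end, the return type, then body tokens up to a closing end. That grammar, the FunctionDef type, and parse_def are hypothetical illustrations, not Ballet’s documented rules, and nested blocks inside a body are not handled.

```python
from dataclasses import dataclass


@dataclass
class FunctionDef:
    name: str
    params: list[tuple[str, str]]  # (parameter name, type name) pairs
    return_type: str
    body: list[str]  # raw body tokens; the last statement is the return value


def parse_def(tokens: list[str]) -> FunctionDef:
    """Parse an assumed form: def NAME (PARAM TYPE)* end RETURN_TYPE BODY... end."""
    if tokens[0] != "def":
        raise ValueError("expected 'def'")
    name = tokens[1]
    pos = 2
    params: list[tuple[str, str]] = []
    while tokens[pos] != "end":  # parameter/type pairs until the first `end`
        params.append((tokens[pos], tokens[pos + 1]))
        pos += 2
    return_type = tokens[pos + 1]
    body_start = pos + 2
    # Assumption: the body contains no nested blocks, so the next `end` closes it.
    body_end = tokens.index("end", body_start)
    return FunctionDef(name, params, return_type, tokens[body_start:body_end])


fn = parse_def("def double a int end int .add a a end".split())
print(fn.name, len(fn.params), fn.return_type, fn.body)
# double 1 int ['.add', 'a', 'a']
```

Here len(fn.params) is the argument count a greedy parser would consult whenever it meets .double.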

Line 5 lets us see the result of calling our double function. We pass the result of .double into .print, and we pass 7 into .double. But wait: why doesn’t that line print '.double 7'? The answer is prefix notation. print is a function that takes one value and outputs it to the terminal. Ballet looks at the value directly following .print and sees that it’s a call to double (the . signifies that we’re calling a function in the global namespace). double also takes one value, so Ballet moves on to the next value: 7. 7 is a “constrained” value, which means that it takes no arguments. It’s passed as the first argument to double, turning the combination .double 7 into a constrained value that holds the result of adding 7 to itself. That result, 14, is passed to print.
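The walk-through above is essentially a recursive procedure: read one value, and if it’s a function call, read as many further values as that function takes. Here is a minimal Python sketch of the idea. It is not the compiler this series builds; it assumes a hard-coded table of hypothetical built-ins (add with two arguments, double with one), treats every other token as an integer literal, and assumes `--` comments run to the end of the line.

```python
from collections.abc import Callable, Iterator

# Hypothetical table of built-ins: name -> (number of arguments, implementation).
# In Ballet, double is user-defined with `def`; it's hard-coded here for brevity.
BUILTINS: dict[str, tuple[int, Callable[..., int]]] = {
    "add": (2, lambda a, b: a + b),
    "double": (1, lambda a: a + a),
}


def evaluate(tokens: Iterator[str]) -> int:
    """Consume exactly one constrained value from the token stream and return its value."""
    token = next(tokens)
    if token.startswith("."):
        # A function call: look up how many arguments it needs, then
        # greedily evaluate that many constrained values that follow it.
        arity, fn = BUILTINS[token[1:]]
        args = [evaluate(tokens) for _ in range(arity)]
        return fn(*args)
    # Anything else is assumed to be an integer literal, which takes no arguments.
    return int(token)


def run_print(line: str) -> int:
    """Return the value a single `.print ...` line would output."""
    code = line.split("--", 1)[0]  # drop a trailing `--` comment
    tokens = iter(code.split())
    if next(tokens) != ".print":
        raise ValueError("expected the line to start with .print")
    return evaluate(tokens)  # print takes exactly one value


for line in [
    ".print .double 7 -- prints 14",
    ".print .add .double 7 8 -- prints 22",
    ".print .add .add .double 6 7 .double 3 -- prints 25",
]:
    print(run_print(line))  # 14, then 22, then 25
```

Note that evaluate never looks ahead or backtracks: the argument count alone decides how much of the stream each call swallows.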

Line 6 is a bit more complicated since add takes two arguments. As before, .double is the first value taken, and it takes 7 as its argument, becoming the constrained value .double 7, which is passed as the first argument to add. The next value on the line is 8, a constrained value, which is passed as the second argument. If Ballet used parentheses to signify arguments, this line would be written as .print (.add (.double 7) 8).
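Going the other way, a small parser can recover a parenthesized form from the prefix tokens. The Python sketch below builds a tree using an assumed argument-count table (print and double take one argument, add takes two) and wraps each nested call in parentheses while leaving literals bare. The names ARITY, Node, parse, and parenthesize are illustrative, not part of Ballet.

```python
from dataclasses import dataclass, field

# Hypothetical argument counts for the functions used in this post.
ARITY = {"print": 1, "add": 2, "double": 1}


@dataclass
class Node:
    name: str  # a call such as ".add", or a literal such as "7"
    args: list["Node"] = field(default_factory=list)


def parse(tokens: list[str], pos: int = 0) -> tuple[Node, int]:
    """Parse one constrained value starting at tokens[pos]; return it and the next position."""
    token = tokens[pos]
    pos += 1
    node = Node(token)
    if token.startswith("."):
        # Greedily parse exactly as many arguments as the function takes.
        for _ in range(ARITY[token[1:]]):
            child, pos = parse(tokens, pos)
            node.args.append(child)
    return node, pos


def parenthesize(node: Node) -> str:
    """Render a call with every nested call wrapped in parentheses."""
    parts = [node.name]
    for arg in node.args:
        text = parenthesize(arg)
        parts.append(f"({text})" if arg.args else text)
    return " ".join(parts)


tree, _ = parse(".print .add .double 7 8".split())
print(parenthesize(tree))  # .print (.add (.double 7) 8)
```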

Line 7 brings the ideas together into a much more complicated example. Ballet doesn’t care about whitespace or line breaks, so let’s break that line up a bit so it’s clearer what’s happening.

Restructured Ballet Example
1  .print                 -- prints 25
2    .add                 -- 19 + 6 = 25
3      .add               -- 12 + 7 = 19
4        .double 6        -- 6 + 6 = 12
5        7                -- 7
6      .double 3          -- 3 + 3 = 6

In the block above, I’ve rewritten line 7 from the original example so that each argument sits on its own line, with arguments to the same function call sharing the same indentation. Technically, .double 6 and .double 3 should each be split over two lines, since each is a function call plus its argument, but because we’ve already seen how double works, I kept them together for brevity.

Let’s break it down: line 1 holds a call to print. The first (and only) argument to print is the .add on line 2. add takes two arguments; to find the first, we look at the next value, which is another call to add (line 3). The first argument to this inner add is .double, which takes 6 as its argument to become the constrained value .double 6 (line 4). The next value, 7 (line 5), is given as the second argument to the inner .add, which completes the constrained value .add .double 6 7. Ballet then moves on to the next value to construct the second argument to the outer .add. It finds .double, which takes the next value, 3, to become the constrained value .double 3 (line 6). With both of its arguments found, the outer .add is itself a constrained value, and its result, 25, is passed to print.
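The same greedy procedure can produce an indented layout mechanically. This Python sketch, again assuming a hard-coded argument-count table, records each token at its nesting depth and indents two spaces per level. Unlike the hand-written block, it also splits .double 6 and .double 3 across two lines, since it has no notion of calls we’ve already seen. The function name layout is hypothetical.

```python
# Hypothetical argument counts for the three functions in this example.
ARITY = {"print": 1, "add": 2, "double": 1}


def layout(tokens: list[str]) -> list[str]:
    """Return one line per token, indented two spaces per level of nesting."""
    lines: list[str] = []
    pos = 0

    def consume(depth: int) -> None:
        nonlocal pos
        token = tokens[pos]
        pos += 1
        lines.append("  " * depth + token)
        if token.startswith("."):
            # Each argument of a call sits one level deeper than the call itself.
            for _ in range(ARITY[token[1:]]):
                consume(depth + 1)

    consume(0)
    if pos != len(tokens):
        raise ValueError(f"unused tokens after the expression: {tokens[pos:]}")
    return lines


print("\n".join(layout(".print .add .add .double 6 7 .double 3".split())))
# .print
#   .add
#     .add
#       .double
#         6
#       7
#     .double
#       3
```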

Why use prefix notation?

One of the goals I had while designing Ballet was to avoid using the Shift key altogether. This rules out parentheses, curly braces, and most other syntactic symbols. Luckily, prefix notation allows Ballet to have a simple syntax while still being just as powerful as any other language. In fact, Ballet needs only four non-alphanumeric symbols: /, ', ., and -.

Prefix notation gives Ballet some other neat properties. The biggest is that whitespace carries no meaning beyond separating tokens: if we wanted to, all of the code above could have been written on a single line. Other languages can do this too, but they often require a semicolon or some other syntactic element to delimit statements. In Ballet, statements don’t need delimiters, since every constrained value is a statement, and the number of arguments each function takes tells Ballet exactly where that value ends.
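Because every function’s argument count is known, a parser can find where each statement ends without any delimiter. The Python sketch below assumes a fixed argument-count table and `--` comments running to the end of the line, flattens all whitespace into one token stream, and then slices off one constrained value at a time. The helper names tokenize, end_of_value, and split_statements are hypothetical. A one-line and a multi-line version of the same program split into identical statements.

```python
# Hypothetical argument counts; real Ballet would learn these from built-ins and `def`s.
ARITY = {"print": 1, "add": 2, "double": 1}


def tokenize(source: str) -> list[str]:
    """Strip `--` comments (assumed to run to end of line) and split on any whitespace."""
    tokens: list[str] = []
    for line in source.splitlines():
        tokens.extend(line.split("--", 1)[0].split())
    return tokens


def end_of_value(tokens: list[str], pos: int) -> int:
    """Return the index just past the constrained value that starts at tokens[pos]."""
    token = tokens[pos]
    pos += 1
    if token.startswith("."):
        for _ in range(ARITY[token[1:]]):
            pos = end_of_value(tokens, pos)
    return pos


def split_statements(source: str) -> list[list[str]]:
    """Cut the token stream into top-level statements with no delimiters."""
    tokens = tokenize(source)
    statements: list[list[str]] = []
    pos = 0
    while pos < len(tokens):
        end = end_of_value(tokens, pos)
        statements.append(tokens[pos:end])
        pos = end
    return statements


one_line = ".print .double 7 .print .add .double 7 8"
many_lines = """
.print
  .double 7     -- prints 14
.print .add
  .double 7
  8             -- prints 22
"""

assert split_statements(one_line) == split_statements(many_lines)
for statement in split_statements(one_line):
    print(" ".join(statement))
# .print .double 7
# .print .add .double 7 8
```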

Unfortunately, prefix notation also has some downsides: parsing Ballet code and generating an AST is much more complicated than for other languages, and Ballet code can become difficult to read at times. Thankfully, the bind keyword lets us assign values to variables, which can greatly simplify complex expressions.

Closing

This post only covered the basics of Ballet’s syntax. For more information, please visit the Ballet compiler’s repository, which describes the rest of Ballet’s features.

In the next post, we’ll start writing Ballet’s compiler, beginning with a lexer.