owlisp interpreter, compiler wannabe

I have recently been doing a bit of work on owlisp to implement an interpreter. This is meant as a first step towards building the compiler (LLVM frontend) that I am aiming at.

But before going into details of my recent work:

Using the interpreter

The interpreter has come to a state that it allows you to start a REPL in which the following elements can be evaluated:

  • lambda
  • let
  • if
  • symbols (variables)
  • quoted expressions
  • literals (strings & numbers)
  • function application
  • builtin functions: + * – / print car cdr

You can start the REPL by (I will update the instructions in README.md in the github project over the next days):

(owlisp/evaluator:owlisp-toplevel)

and then you should be able to evaluate for example the following:

owlisp> ((lambda (a b)
            (let ((c 3))
               (+ a b c)))
          1 2)
...
6

Note: You will see a lot of debugging information, which will of course be removed (or toggable) in the future.

Recent development

I have been doing a lot of reading (especially in the books mentioned in this post) on the topic of how to compile from a high-level language into something like machine language (like LLVM-IR).

This research however led me to a rework of the owlisp interpretation process and the environments that are involved in this step:

Separation of interpretation phase

According to a chapter in “Structure and Interpretation of Computer Programs”, I divided the interpretation into two phases:

  1. analysis phase
  2. evaluation phase

So, instead of (possibly) running through the entire evaluation process (including parsing) for one expression multiple times, the analysis phase does all steps that can be done at “compiletime”. Basically, the analysis phase replaces each expression with a procedure (closure) that will be called at runtime with the current environment, containing variable bindings.

This separation of analysis and evaluation phases not only makes the interpreter faster (since the parsing does not have to be done more than once when for example calling a function multiple times), but it is also a necessary step towards building a compiler, because the analysis phase is the equivalent to the compile phase.

Separation of environment

As a second step (and according to “Lisp in Small Pieces”), the environment (that up to this point contained a mapping from symbol-names to values) has been split into two components:

  1. compiletime environment
  2. runtime environment

The compiletime environment tracks only symbol names and is being built up and maintained during compiletime (hence the name, obviously 😉 ). Each and every symbol gets a fixed address in that environment.

Now, during runtime, we build up the “runtime” environment which keeps track of only the values that get bound to the symbols. The crucial thing here is that the address of a value in the runtime environment equals the address of the corresponding symbol in the compiletime environment.

What’s the advantage of that? Well, this construct enables us to convert each symbol to a fixed address during compiletime, so that later at runtime we only need to look up the value at that address in the runtime environment.

This means we don’t have to mess with symbol names during runtime anymore at all. Looking up values with a fixed address is obviously much faster than comparing given symbol names (=strings) to the symbol names stored in the environment, since the former can virtually be reduced to a constant time lookup in an array.

Example

I’ll try to illustrate an (simplified) example here:

Let’s first define a convention:

For simplicity, I will write down addresses of elements inside environments as <addr:N> .

<addr:3> for example describes the element with index 3 inside an environment.

Now, let’s say we have the following expression to be evaluated:

(let ((a 11) (b 22)) (+ b a))

and we also have a compiletime environment, that is initially empty:

C: #()

During compiletime, the symbols “a” and “b” are being recognized as such inside the init-form of the let, and are thus inserted into the compiletime environment. This happens in the order of their appearance, thus resulting in index 0 for “a” and index 1 for “b”:

C: #(a b)

You can think of this step as declaration of a and b.

In the same phase, while analyzing (compiling) the body of the let, the references to b and a are being “converted” to their corresponding addresses inside the compiletime environment C, so (+ b a) internally becomes something like

(+ <addr:1> <addr:0>)

So, after compiletime, we end up with something like this:

(let ((a 11) (b 22)) (+ <addr:1> <addr:0>))

C: #(a b)

After all that, the interpreter will evaluate the let expression to its resulting value. This is where the runtime phase begins. We start with the “compiled” expression above and an empty runtime environment:

R: #()

Now, the init-form of the let tells us that the value at address 0 shall be defined as 11 and the value at address 1 shall be defined as 22. Note that the symbol names are being ignored here – the addresses in the environment once again result only from the values’ order of appearance in the init-form of the let:

R: #(11 22)

Finally, the addresses in (+ <addr:1> <addr:0>) are being looked up in the runtime environment, so the resulting form looks like:

(+ 22 11)

which is just what one would expect.

I hope this example is somewhat clear. Please note that I left out some details, for example that environments are actually two dimensional, due to call frames (e.g. nested lets etc.).