Updates to Altos Lisp

I wrote a few days ago about a tiny lisp interpreter I wrote for AltOS.

Really, it's almost "done" now; I just wanted to make a few improvements.

Incremental Collection

I was on a walk on Wednesday when I figured out that I didn't need to do a full collection every time; a partial collection that only scanned the upper portion of memory would often find plenty of free space to keep working for a while.

To recap, the heap is in two pieces: the ROM piece and the RAM piece. The ROM piece is generated during the build process and never changes afterwards (hence the name), so the only piece which is collected is the RAM piece. Collection works like:

chunk_low = heap base
new_top = heap base

For all of the heap
    Find the first 64 live objects above chunk_low
    Compact them all to new_top
    Rewrite references in the whole heap for them
    Set new_top above the new locations
    Set chunk_low above the old locations

top = new_top

The trick is to realize that there's really no need to start at the bottom of the heap; you can start anywhere you like and compact stuff, possibly leaving holes below that location in the heap. As the heap tends to have long-lived objects slowly sift down to the beginning, it's useful to compact objects higher than that, skipping the compaction process for the more stable area in memory.

Each time the whole heap is scanned, the top location is recorded. After that, incremental collects happen starting at that location, and when that doesn't produce enough free space, a full collect is done.
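
The incremental-then-full policy described above can be sketched as follows. This is only a toy model, not the AltOS code: the heap is an array of fixed-size cells with a live flag, reference rewriting and the 64-object chunking are left out, and every name here (heap_live, last_full_top, compact_from, collect) is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HEAP_CELLS 16

/* Toy heap: each cell is either live or garbage. Real objects vary in
 * size and hold references that must be rewritten; both are omitted. */
static bool   heap_live[HEAP_CELLS];
static int    heap_value[HEAP_CELLS];
static size_t heap_top;        /* first unused cell */
static size_t last_full_top;   /* heap_top recorded by the last full collect */

/* Slide live cells at or above 'start' down so they pack from 'start'.
 * Cells below 'start' are not touched, so holes there remain. */
static void compact_from(size_t start)
{
    size_t new_top = start;

    assert(start <= heap_top);
    for (size_t i = start; i < heap_top; i++) {
        if (!heap_live[i])
            continue;
        if (new_top != i) {
            heap_value[new_top] = heap_value[i];
            heap_live[new_top] = true;
            heap_live[i] = false;
        }
        new_top++;
    }
    heap_top = new_top;
}

/* Returns true if at least 'needed' cells are free after collecting. */
static bool collect(size_t needed)
{
    /* Incremental: only compact above where the last full collect ended. */
    compact_from(last_full_top);
    if (HEAP_CELLS - heap_top >= needed)
        return true;

    /* Not enough space: do a full collect and record the new top. */
    compact_from(0);
    last_full_top = heap_top;
    return HEAP_CELLS - heap_top >= needed;
}
```

With a full heap, last_full_top at 8 and garbage in cells 2, 10 and 12, collect(2) succeeds incrementally and leaves the hole at cell 2 alone; a following collect(3) has to fall back to the full pass to reclaim it.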

The collector runs a bunch faster on average now.

Binary Searches

I had stuck some linear searches in a few places in the code, and I've now replaced them with binary searches. The first was in the collector, when looking to see where an object had moved to. As there are 64 entries, the search is reduced from 32 to 6 compares on average. The second place was in the frame objects, which hold the list of atom/value bindings for each lexical scope (including the global scope). These aren't terribly large, but a binary search is still a fine plan. I wanted to write down here the basic pattern I'm using for binary searches these days, which avoids some of the boundary conditions I've managed to generate in the past:

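/* haystack[] is sorted in ascending order and holds count entries.
 * Returns the index of the first entry not less than needle. */
int find(int needle) {
    int l = 0;
    int r = count - 1;
    while (l <= r) {
        int m = (l + r) >> 1;
        if (haystack[m] < needle)
            l = m + 1;
        else
            r = m - 1;
    }
    return l;
}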

With this version, the caller can then check to see if there's an exact match, and if not, then the returned value is the location in the array where the value should be inserted. If the needle is known to not be in the haystack, and if the haystack is large enough to accept the new value:

void insert(int needle) {
    int l = find(needle);

    /* Shift the tail up one slot to open a gap at l */
    memmove(&haystack[l+1],
        &haystack[l],
        (count - l) * sizeof (haystack[0]));

    haystack[l] = needle;
    count++;
}

Similarly, if the caller just wants to know if the value is in the array:

bool exists(int needle) {
    int l = find(needle);

    /* l may be one past the end when needle is larger than every entry */
    return (l < count && haystack[l] == needle);
}
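
Applied to the frame objects mentioned earlier, the same pattern might look like the sketch below. The layout is an assumption for illustration only: atoms and values are plain ints, a frame has a fixed capacity of FRAME_MAX bindings kept sorted by atom, and the names frame_find, frame_ref and frame_bind are hypothetical rather than taken from the interpreter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FRAME_MAX 8

/* Hypothetical binding layout: atoms are compared as small integers. */
struct binding {
    int atom;
    int value;
};

struct frame {
    int count;                          /* bindings in use */
    struct binding bindings[FRAME_MAX]; /* sorted by atom */
};

/* Same pattern as find(): index of the first binding whose atom is
 * not less than the one requested. */
static int frame_find(const struct frame *f, int atom)
{
    int l = 0;
    int r = f->count - 1;

    while (l <= r) {
        int m = (l + r) >> 1;
        if (f->bindings[m].atom < atom)
            l = m + 1;
        else
            r = m - 1;
    }
    return l;
}

/* Pointer to the bound value, or NULL if the atom is not in this frame. */
static int *frame_ref(struct frame *f, int atom)
{
    int l = frame_find(f, atom);

    if (l < f->count && f->bindings[l].atom == atom)
        return &f->bindings[l].value;
    return NULL;
}

/* Update an existing binding, or insert a new one in sorted position. */
static void frame_bind(struct frame *f, int atom, int value)
{
    int l = frame_find(f, atom);

    if (l < f->count && f->bindings[l].atom == atom) {
        f->bindings[l].value = value;
        return;
    }
    assert(f->count < FRAME_MAX);
    memmove(&f->bindings[l + 1],
        &f->bindings[l],
        (f->count - l) * sizeof (f->bindings[0]));
    f->bindings[l].atom = atom;
    f->bindings[l].value = value;
    f->count++;
}
```

Because frame_find returns the insertion point, frame_bind needs only one search to decide between updating and inserting.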

Call with Current Continuation

Because the execution stack is managed on the heap, it's completely trivial to provide the Scheme-like call with current continuation, which constructs an object which can be 'called' to transfer control to a saved location:

> (+ "hello " (call/cc (lambda (return) (setq boo return) (return "foo "))) "world")
"hello foo world"
> (boo "bar ")
"hello bar world"
> (boo "yikes ")
"hello yikes world"

One thing I'd done previously is dump the entire state of the interpreter on any error, and that included a full stack trace. I adapted that code for printing of these continuation objects:

boo
[
    expr:   (call/cc (lambda (return) (set (quote boo) return) (return "foo ")))
    state:  val
    values: (call/cc
             [recurse...]
             )
    sexprs: ()
    frame:  {}
]
[
    expr:   (+ "hello " (call/cc (lambda (return) (set (quote boo) return) (return "foo "))) "world")
    state:  formal
    values: (+
             "hello "
             )
    sexprs: ((call/cc (lambda (return) (set (quote boo) return) (return "foo ")))
             "world"
             )
    frame:  {}
]

The top stack frame is about to return from the call/cc spot with a value; supply a value to 'boo' and that's where you start. The next frame is in the middle of computing formals for the + s-expression. It has found the + function and the "hello " string, and has yet to get the value from call/cc or the value of the "world" string. Once the call/cc "returns", that value will get moved to the values list and the sexprs list will move forward one spot to compute the "world" value.

Implementing this whole mechanism took only a few dozen lines of code as the existing stack contexts were already a continuation in effect. The hardest piece was figuring out that I needed to copy the entire stack each time the continuation was created or executed as it is effectively destroyed in the process of evaluation.
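
The stack copy is the interesting part, so here is a rough sketch of the idea. It is not the interpreter's code: the struct is cut down to a link and a state field (the real contexts also carry expr, values, sexprs and frame, as in the dump above), malloc stands in for the lisp heap allocator, and stack_copy, continuation_create and continuation_call are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Cut-down stack context; 'prev' points at the next-older context. */
struct stack {
    struct stack *prev;
    int state;
};

struct continuation {
    struct stack *saved;   /* private snapshot of the stack */
};

static void stack_free(struct stack *s)
{
    while (s) {
        struct stack *prev = s->prev;
        free(s);
        s = prev;
    }
}

/* Duplicate a whole chain of contexts so that evaluation can consume one
 * copy while the other stays intact. Returns NULL if allocation fails
 * (or if the chain is empty). */
static struct stack *stack_copy(const struct stack *s)
{
    struct stack *head = NULL;
    struct stack **tail = &head;

    for (; s; s = s->prev) {
        struct stack *n = malloc(sizeof *n);
        if (!n) {
            stack_free(head);
            return NULL;
        }
        *n = *s;
        n->prev = NULL;
        *tail = n;
        tail = &n->prev;
    }
    return head;
}

/* call/cc: snapshot the current stack into the continuation. */
static void continuation_create(struct continuation *k,
                                const struct stack *current)
{
    assert(k != NULL);
    k->saved = stack_copy(current);
}

/* Calling the continuation: hand the evaluator a fresh copy, so the
 * snapshot survives and the continuation can be called again. */
static struct stack *continuation_call(const struct continuation *k)
{
    assert(k != NULL);
    return stack_copy(k->saved);
}
```

Copying on both creation and call is what lets 'boo' in the transcript above be invoked more than once: each call runs on its own copy while the saved snapshot is left alone.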

I haven't implemented dynamic-wind yet; when I did that for nickle, it was a bit of a pain threading execution through the unwind paths.

Re-using Frames

I decided to try and re-use frames (those objects which hold atom/value bindings for each lexical scope). It wasn't that hard; the only trick was to mark frames which have been referenced from elsewhere as not-for-reuse and then avoid sticking those in the re-use queue. This reduced allocations even further so that for simple looping or tail-calling code, the allocator may never end up being called.
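
A minimal sketch of that scheme, under some loud assumptions: a single re-use queue for all frames, malloc in place of the heap allocator, and hypothetical names throughout (frame_new, frame_mark_referenced, frame_release, alloc_calls). The real frames vary in size and hold bindings, which this ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct frame {
    struct frame *next_free;  /* link in the re-use queue */
    bool referenced;          /* something else points at this frame */
    int num;                  /* stand-in for the frame's contents */
};

static struct frame *free_frames;  /* head of the re-use queue */
static unsigned alloc_calls;       /* how often the allocator was needed */

/* Take a frame from the re-use queue if possible, else allocate one. */
static struct frame *frame_new(int num)
{
    struct frame *f = free_frames;

    if (f) {
        free_frames = f->next_free;
    } else {
        f = malloc(sizeof *f);
        if (!f)
            return NULL;
        alloc_calls++;
    }
    f->next_free = NULL;
    f->referenced = false;
    f->num = num;
    return f;
}

/* Called when, say, a lambda or continuation captures the frame. */
static void frame_mark_referenced(struct frame *f)
{
    assert(f != NULL);
    f->referenced = true;
}

/* Called when a scope exits. Referenced frames are not queued for
 * re-use; they are left for the collector to deal with. */
static void frame_release(struct frame *f)
{
    assert(f != NULL);
    if (f->referenced)
        return;
    f->next_free = free_frames;
    free_frames = f;
}
```

In a simple loop, each iteration releases its frame and the next one picks it straight back up, so alloc_calls stays at one.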

How Big Is It?

I've managed to squeeze the interpreter and all of the rest of the AltOS system into 25kB of Cortex-M0 code. That leaves space for the 4kB boot loader and 3kB of flash to save/restore the 3kB heap across resets.

Adding builtins to control timers and GPIOs would make this a reasonable software load for an Arduino, offering a rather different programming model for those with a taste for adventure. Modern ARM-based Arduino boards have plenty of flash and RAM for this. It might be interesting to get this running on the Arduino Zero; there's no real reason to replace the OS either, as porting the lisp interpreter into the bare Arduino environment wouldn't take long.