Lessons learned from implementing minimal Scheme four six times

blog
Published

November 29, 2022

We were in the middle of a global pandemic. Tormented by fear, locked in our homes, everyone was coping in their own way. Some people started baking bread, writing poetry, learning to play guitar, or doing home gardening, and I… went re-implementing Scheme lisps. Creating minimal Scheme interpreters became my favorite programming kata. By “minimal” I mean bare-bones language but feature-rich enough that can run all the examples from the classic The Little Schemer book by Daniel P. Friedman and Matthias Felleisen.

Do It, Do It Again, and Again, and Again …
  — The Little Schemer by Friedmann and Felleisen

I implemented it already in Go, OCaml, Erlang, and Racket (itself a flavor of Scheme). My main purpose was to learn new programming languages and learn more about programming language theory (in practice). To verify my code, as an integration test, I used the examples from The Little Schemer and accompanying unit tests from the repository by bmitc. I was also benchmarking my code against MIT Scheme.

Lisp cycles XKCD #297: “Those are your father’s parentheses. Elegant weapons for a more… civilized age.”

(source https://xkcd.com/297/)

Why implementing a lisp is such a great kata?

Implementing lisp interpreters is my favorite programming kata. It is a non-trivial problem, but small enough to finish it in a limited time. It is also a chance to touch most of the functionalities of the programming language you use for the problem:

  • You need to build a simple parser. This is an interesting programming exercise by itself. It also lets you familiarize yourself with strings.

  • Lisp uses cons linked lists for all purposes. Some languages support linked lists natively (OCaml, Racket), some don’t, and in some, it is particularly hard (Rust). This is how you’ll score a point in a leetcode-style whiteboard interview when they’ll ask to implement a linked list.

  • It is a chance to learn about types in the programming language you use. Some languages have advanced type systems (e.g. OCaml, Rust), but some don’t support custom types (e.g. erlang, Lua), so you need to simulate them.

  • To store the variables, you need an environment. There is a global environment, but also local ones (see closures and scopes). For this, you’d likely need tree-like data structures, references, and hash maps.

  • For lambdas, you’ll need to learn more about anonymous functions.

  • Lisps extensively use recursion, so the implementation needs to be tail-call optimized, otherwise you’ll quickly see stack overflow errors. The best way to unit-test it is to use a tail-call optimized implementation of Fibonacci sequence generator and evaluate it for a large value of the argument (>100).

    (define impl (lambda (it second first)
       (if (= it 0) first
          (impl (- it 1) (+ first second) second))))
    
    (define fibo (lambda (n) (impl n 1 0)))
  • To build REPL, you’ll need to learn how to interact with standard input and output.

  • When using REPL, it doesn’t panic on each error, but rather prints the error message. If you want this behavior, you need to explore how errors are handled in the programming language you use.

  • Finally, you would learn to build a command-line interface.

Prologue

I don’t have a computer science background, and one day I decided to learn more about programming language theory. I started reading the dragon book, though I never finished it. I needed something more practical and hands-on, and this is how I discovered the Build Your Own Lisp book. It was nice, but it had code examples in C and focused too much on C for me. Hopefully, I also found the great Writing An Interpreter In Go book that used Go language to illustrate building an interpreter for a (non-lisp) programming language. Another great inspiration and source of help were the make a lisp repository with an end-to-end tutorial and learning materials for freaks like me. Around the same time, I was reading the classic programming books: The Little Schemer and Structure and Interpretation of Computer Programs, which used Scheme, so I was curious to write my Scheme, to get a better “behind the scenes” understanding. Among other resources, The Scheme Programming Language reference book by R. Kent Dybvig was helpful as well.

Go

I was curious about Go. The Writing An Interpreter In Go book motivated me, even more, to try implementing a lisp in this language. Go was very easy to learn, pleasant to work with, and has great documentation. To familiarize myself better with the language, I’ve read the Learning Go book, which I can recommend. Implementing a Scheme interpreter in Go was not straightforward, because the two languages are very different. Listing all the differences would be pointless, but the biggest one was that Go is statically typed, while Scheme is dynamically typed. To implement dynamic typing in Go, one needs to use interface{} type and cast it to desired types each time it’s needed. It resulted in a lot of boilerplate code. Scheme’s lists are just linked lists, that differ significantly from Go’s native arrays and slices, so I implemented it as a custom linked list myself. I described the design in greater detail in the readme of my repository. It was a great learning experience. I not only learned a lot about Scheme, but also the consequences of static vs dynamic typing, passing by values vs references, linked lists, and many other things.

OCaml

I knew the basics of OCaml before deciding to write my second implementation of Scheme in it. The two books that were a great introduction for me: OCaml from the Very Beginning and Real World OCaml (freely available online). While working on it, I found this blog where the author also implemented a lisp in OCaml. Same as Scheme, OCaml uses lists as a basic data structure and recursion as the default working mode, which made implementing it fairly straightforward. Moreover, OCaml has great pattern-matching utilities that made code much simpler and more compact than the Go implementation (~500 lines vs ~2000 lines). OCaml’s strong, but much more flexible than Go’s (which was bothersome), typing was also of great help to prevent type inconsistencies and warn me about potential problems early on. From the downsides, I didn’t find OCaml documentation that fabulous and I struggled a bit to understand how should I structure my project, handle dependencies, properly run unit tests, etc. The biggest surprise was that my implementation in OCaml was approximately five times faster than the one in Go (aka “the fast language”)!

Erlang

OK, that was a crazy one. I heard about Erlang and how it is an outlier in programming languages and wanted to learn it for some time. Implementing Scheme in it seemed to be right in the sweet spot: not trivial, complex enough, but doable in a finite amount of time. It also touched on many features of the language. I wasn’t aiming for performance (Erlang is slow), or doing it most efficiently, but rather playing around with Erlang’s features. Since Erlang was a bit scary to start with, I first read about Elixir (a language like Erlang, but with modern, Ruby-like syntax) from the Programming Elixir book and did the Exercism learning track (a good one). However, I wanted to go all the way down the rabbit hole (also “schemero” seemed to be a cool name, tbh). I started by reading Erlang’s author book Programming Erlang, and found Learn You Some Erlang for Great Good! (available freely online) very helpful. Like Scheme and OCaml, Erlang mostly works with lists and recursion. Same as OCaml, it has great pattern-matching utilities. But it has its quirks: its dynamically typed and does not support custom types (you can imitate them by using structs, e.g. {symbol, "name"} for a symbol type), moreover every object in Erlang is immutable (what has many interesting consequences), it has a strange (but likable!) syntax and stylistic conventions. The strangest of all is how Erlang treats everything as message passing. Since I wanted to learn more about it, my parser is a server that communicates with another server (that reads from a file or stdin) and returns parsed objects when available (e.g. user types something in REPL). Also, I treated environments as servers, so that they can hold all the Scheme objects and allow for mutating them (in Erlang you do mutability by playing hot potato and passing the state between the functions). The final implementation lacked garbage collector, and was not efficient, but worked and passed all the tests. And, oh boy, what a ride it was.

Racket

You may ask: why would anyone implement Scheme in Scheme?! In the end, the only thing you need to do is to run something like (eval (read ...)). Yes, but that would be too easy. I wanted to have a parser that reads textual input and parses it to Scheme-like objects, I wanted to imitate the types, environments, closures, etc. I found Racket’s documentation very helpful (though not perfect). Having already read a lot about Scheme and implemented it thrice, doing it in Racket was fairly easy. It was a chance to appreciate Scheme’s more advanced features like macros and classes (yes, it has classes, but check Structure and Interpretation of Computer Programs to learn how they are just syntactic sugar) to imitate the stateful environments. The biggest pain was when I decided to build REPL and needed to find out how should I properly read the input from stdin and stream it to the parser. For building the command line interface, beyond the official docs I found only one brief blog post on how to do it, so I just went by trial and error to figure out how to do it. The hardest part was the “simple” things.

Tasks XKCD #1425: - “When a user takes a photo the app should check whether they’re in a national park…” - “Sure, easy GIS lookup. Gimme a few hours.” - “… and check whether the photo is of a bird.” - “I’ll need a research team and five years.” Comment: In CS, it can be hard to explain the difference between the easy and the virtually impossible.

(source https://xkcd.com/1425/)

So what are the pros and cons of those languages?

  • Go is elegant in its simplicity and has great documentation and developer tools. It has a simple but strict static type system, so implementing dynamic types was quite tedious. The big upside is that Go has a large and active community, so it’s easy to find online an answer to any “how to…” question, or get it answered on StackOverflow.com (not necessarily the case for the other languages).
  • OCaml was a real pleasure to use, though it would be even better if it had better documentation.
  • Erlang… is interesting. There are many great ideas behind it. It also shows how the simplicity of a language does not make it less expressive: all the values are immutable, it has no namespaces, no custom types, etc, while being a language designed for building complex, concurrent systems. I liked how Erland uses uppercase-only for variable names, making them visually distinct (Go does a similar thing for private vs public functions).
  • Scheme, oh good old Scheme. What I was missing the most as compared to other functional languages was pattern-matching. Being able to write [Head | Tail] = List (Erlang) instead of (let ([head (car lst)] [tail (cdr lst)]) ...) is so much cleaner. A lisp with pattern-matching wouldn’t differ that much from any modern functional language. The documentation could be improved.

What’s next?

It was a great learning experience. Now, I’m struggling between resting from lisp and the compulsion to repeat it in another language. It could be a chance to learn better how Haskell handles side effects. On another hand, I already did it in functional languages. Cool kids those days learn Rust1, so who knows? Lua also sounds intriguing.2 Finally, I didn’t touch JavaScript for years and there is this TypeScript thing, right? Or maybe Crystal? Some masochistic part of me thinks of doing it in AWK or a Makefile, but not sure if this would be the pleasant kind of pain.