I have been playing with this for a few days, and it is really incredible.
Note that the majority of the 1k interpreter is actually the word list! There are only a few hundred bytes of code. The planck.xxd file is relatively straightforward to understand if you have looked at a Forth implementation before. I think the real jewels are:
- The careful selection of the basic words that are understood by the binary interpreter
- The decision to forgo even the trivial parsing of Forth in the bootstrap language, and implement effectively a bytecode interpreter but with Forth-like semantics and a Forth dictionary structure
- The buildup to something very like standard Forth in bootstrap.fs
If you have not previously looked at a Forth, I highly recommend perusing Jonesforth first. In particular, read through jonesforth.S and then at least skim jonesforth.f for an understanding of bootstrapping Forth. The planckforth bootstrap is much more low-level than the Jonesforth bootstrap, however!
Then … dive into bootstrap.fs, which is a true joy to behold. The interpreter provided by planck.xxd reads exactly one byte at a time and executes it as a word from its dictionary, crashing immediately if the word is not found. The only I/O it is capable of is likewise one byte at a time, from standard input. It cannot handle even whitespace or comments!
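To make the "one byte is one word" idea concrete, here is a rough toy model in Python. It is only a sketch of the dispatch loop described above, not planckforth's actual code: the three words in TOY_DICTIONARY and the UnknownWord exception are hypothetical names I made up for illustration (the real interpreter just crashes).

```python
class UnknownWord(Exception):
    """Stand-in for the crash the real interpreter suffers on an unknown byte."""


def run(program: bytes, dictionary: dict, stack: list) -> list:
    """Execute `program` one byte at a time; every byte is a complete word name."""
    for byte in program:
        word = dictionary.get(byte)
        if word is None:
            raise UnknownWord(f"no word for byte {byte!r} ({chr(byte)!r})")
        word(stack)
    return stack


# Hypothetical toy words, for illustration only; NOT planckforth's word list.
def push_one(stack):
    stack.append(1)


def add(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)


def dup(stack):
    stack.append(stack[-1])


TOY_DICTIONARY = {ord("1"): push_one, ord("+"): add, ord("d"): dup}

print(run(b"11+d+", TOY_DICTIONARY, []))  # [4]
```

In this model, run(b"1 1+", TOY_DICTIONARY, []) raises UnknownWord, because the space byte has no dictionary entry. There is no tokenizer to skip it, which is the situation the bootstrap code starts from.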
The bootstrap code very elegantly takes this extremely primitive environment one step at a time to a language that rapidly starts to look like Forth; first it defines no-op words for space and newline so that whitespace can be used to make the code more readable, then it defines Forth-style \ line comments, and then it starts to build up defining words. The first word definition (for newline) looks like line noise:
h@l@h@!h@C+h!k1k0-h@$k:k0-h@k1k0-+$h@C+h!ih@!h@C+h!kefh@!h@C+h!l!
However, just a few definitions in, this much more Forth-y definition (note it’s still only single-character words!) is equivalent to rp@ cell + rp!, to drop a word from the return stack:
r C + R
(NB: I’m not sure why it doesn’t use } _, which is R> DROP.)
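For anyone wondering why the two spellings should have the same effect, here is a small Python model. It is a sketch under my own assumptions: a return stack that grows downward, and a 4-byte cell (CELL is a made-up constant here, not something taken from the planckforth source). It only models the net effect on the return stack and ignores where the words are called from.

```python
CELL = 4  # ASSUMED cell size in bytes; hypothetical, depends on the target


class ReturnStack:
    """Toy return stack that grows downward in a byte-addressed memory."""

    def __init__(self, size_cells: int = 8):
        self.memory = {}              # address -> cell value
        self.rp = size_cells * CELL   # rp starts just past the top of the region

    def push(self, value: int) -> None:   # models >R
        self.rp -= CELL
        self.memory[self.rp] = value

    def pop(self) -> int:                 # models R>
        value = self.memory[self.rp]
        self.rp += CELL
        return value


def drop_by_pointer(rs: ReturnStack) -> None:
    """rp@ cell + rp! : bump the pointer by one cell, never read the value."""
    rs.rp = rs.rp + CELL


def drop_by_pop(rs: ReturnStack) -> None:
    """R> DROP : move the value to the data stack, then discard it."""
    data_stack = [rs.pop()]
    data_stack.pop()


a, b = ReturnStack(), ReturnStack()
for rs in (a, b):
    rs.push(111)
    rs.push(222)
drop_by_pointer(a)
drop_by_pop(b)
print(a.rp == b.rp, a.pop(), b.pop())  # True 111 111
```

Both routes leave the return-stack pointer in the same place with the same value on top. The pointer-arithmetic version simply never touches the data stack along the way.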
A bare few dozen lines after that, all of those single-character words are renamed to their Forth equivalents, so that the terse names can be abandoned, and definitions start looking like:
: 2rot >r >r 2swap r> r> 2swap ;
… and then we’re firmly in Forth territory!
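If the stack juggling in that 2rot definition is hard to follow, this Python sketch mirrors it step by step, using one list for the data stack and one for the return stack. The function names two_swap and two_rot are mine, and the model assumes the usual stack effect for 2swap ( a b c d -- c d a b ).

```python
def two_swap(s: list) -> None:
    """2swap ( a b c d -- c d a b ), operating on the top four items in place."""
    s[-4:] = s[-2:] + s[-4:-2]


def two_rot(data: list) -> list:
    """Mirror of: >r >r 2swap r> r> 2swap"""
    rstack = []
    rstack.append(data.pop())   # >r
    rstack.append(data.pop())   # >r
    two_swap(data)              # 2swap
    data.append(rstack.pop())   # r>
    data.append(rstack.pop())   # r>
    two_swap(data)              # 2swap
    return data


print(two_rot([1, 2, 3, 4, 5, 6]))  # [3, 4, 5, 6, 1, 2]
```

The top pair is parked on the return stack while the two lower pairs are swapped, then brought back and swapped past the pair that started at the bottom, giving ( x1 x2 x3 x4 x5 x6 -- x3 x4 x5 x6 x1 x2 ).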
After all of this slow buildup (including defining some simple assembly routines to compile things like system call invocations!), the culmination is a dictionary rewrite that drops all of the early, non-Forth words that were only used to get from here to there, and the user lands in an environment where relatively normal Forth code Just Works. It takes 2670 lines of (heavily and usefully commented) code to get there, but considering that it started from something not far from the equivalent of assembly language, it is a joy to behold.