Bootstrapping Forth from a 1K ELF

monsonite · January 20, 2021, 3:18pm

Bootstrapping any language is always an intriguing process.

Forth, for at least a decade remained a curiosty, curated by Charles H. Moore and offered to a few select radio-astronomers.

This github repository reveals one approach to the problem of bootstrapping a new language.

elb · January 21, 2021, 7:35pm

I have been playing with this for a few days, and it is really incredible.

Note that the majority of the 1k interpreter is actually the word list! There are only a few hundred bytes of code. The planck.xxd file is relatively straightforward to understand if you have looked at a Forth implementation before. I think the real jewels are:

The careful selection of the basic words that are understood by the binary interpreter
The decision to forego even the trivial parsing of Forth in the bootstrap language, and implement effectively a bytecode interpreter but with Forth-like semantics and a Forth dictionary structure
The buildup to something very like standard Forth in bootstrap.fs

If you have not previously looked at a Forth, I highly recommend perusing Jonesforth first. In particular, read through jonesforth.S and then at least skim jonesforth.f for an understanding of bootstrapping Forth. The planckforth bootstrap is much more low-level than the Jonesforth bootstrap, however!

Then … dive into bootstrap.fs, which is a true joy to behold. The interpreter provided by planck.xxd reads exactly one byte at a time and executes it as a word from its dictionary, crashing immediately if the word is not found. The only I/O it is capable of is likewise one byte at at time, from standard input. It cannot handle even whitespace or comments!

The bootstrap code very elegantly takes this extremely primitive environment one step at a time to a language that rapidly starts to look like Forth; first it defines no-op words for space and newline so that whitespace can be used to make the code more readable, then it defines Forth-style \ line comments, and then it starts to build up defining words. The first word definition (for newline) looks like line noise:

h@l@h@!h@C+h!k1k0-h@$k:k0-h@k1k0-+$h@C+h!ih@!h@C+h!kefh@!h@C+h!l!

However, just a few definitions in, this much for Forth-y definition (note it’s still only single character words!) is equivalent to rp@ cell + rp!, to drop a word from the return stack:

r C + R

(NB that I’m not sure why it doesn’t use } _, which is R> DROP.)

A bare few dozen lines after that, all of those one-word definitions are renamed to their Forth equivalents, so that those terse definitions can be abandoned, and definitions start looking like:

: 2rot  >r >r 2swap r> r> 2swap ;

… and then we’re firmly in Forth territory!

After all of this slow buildup (including defining some simple assembly routines to compile things like system call invocations!), the culmination is a dictionary rewrite that drops the words for all of the early, non-Forth words that were only used to get from here to there, and the user is dropped into an environment where relatively normal Forth code Just Works. It takes 2670 lines of (heavily and usefully commented) code to get there, but considering that it started from something not far from the equivalent of assembly language, it is a joy to behold.

elb · January 21, 2021, 7:50pm

I should have mentioned this particular jewel; the bootstrap interpreter does not know any numbers, so in order to get the binary number 1 to install a word of length 1 into the dictionary, the bootstrap reads an ASCII 1 and then an ASCII 0 and subtracts these two values, leaving the number 1 on the stack. This and many other tricks are used to reduce the size of the starting dictionary significantly. As I mentioned, the largest part of that 1k image is the dictionary itself (where each word occupies 16 bytes plus its plus its code elsewhere in the image, and most words are <= 16 bytes of code!).

B.B · January 21, 2021, 8:18pm

This reminds me of the entry for the 1992 International Obfusticated C Code Contest;
“First and Third, almost Forth”..

monsonite · January 21, 2021, 11:41pm

elb - this is fascinating, thanks for elaborating on my original post.

I have long been a proponent of single ascii character instructions that link to a block of executable code.

The key to success is to choose a set of primitive instructions (approximately 30) from which everything else can be synthesised.

eForth by Bill Muench and later C.H. Ting showed how Forth could be synthesised from a limited number of primitives.

If a primitive is reduced to a single ascii character, then this gives a huge compaction compared to coding everything in 6502 or Z80 assembly language.

Have just been studying the figForth listings for 6502 and 8080 (CP/M) Forths. About 1500 bytes of assembly language kernel and then everything else is written in Forth - a list of 16-bit runtime addresses.

Forth could be halved in size when primitives are given bytecode tokens.

oldben · January 22, 2021, 3:09am

I liked threaded code myself, seems to have more MAGIC.
Is Forth for the Forth cpu’s all written in Forth, or do they meta-comple to them? Now all we need is a $ 8 pie with 5v I/O and
1MB of ram for Forth deluxe.

elb · January 22, 2021, 3:19am

Planckforth is direct threaded.

monsonite · January 22, 2021, 12:03pm

I’ve looked at the C implementation of the interpreter planck.c

github.com

nineties/planckforth/blob/main/others/planck.c

/* planck - 
 * Copyright (C) 2021 nineties
 */
#ifndef INCLUDE_DEFS
#define INCLUDE_DEFS

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

#define COPYRIGHT "Copyright (c) 2021 Koichi Nakamura <[email protected]>"
#define STRINGIFY(s) #s
#define RUNTIME_NAME(c) STRINGIFY(c)

#define VERSION RUNTIME_NAME(COMPILER) ":" COPYRIGHT

This file has been truncated. show original

The interpreter written in C will bootstrap in 0.324 seconds - on the given 2.6GHz machine.

The interpreter is about 190 lines of code, and it defines the 36 primitive instructions.

However having to read and execute “k” everytime you want to read the next character from the input buffer, seems ideal for some mechanisation. Implementing a “Next” function that automatically reads the next instruction from the buffer and increments the instruction pointer would remove the need for a separate k to appear several hundred times in the bootstrap file.

Similarly, having to subtract ascii characters from one another in order to put numbers on the stack, is quaint, but tedious.

Having a routine that identifies an ascii numerical string, converts it to an integer and puts it on the stack, would be less than 20 lines of C or fewer than 20 instructions in assembly language.

Finally, I think that factorisation would be a powerful technique to reduce code size.

The 9 character string h@!h@C+h! is used 27 times in the listing - it would be sensible to substitute that for an interpreted single character.

h@ is also very common - about 80 occurences, easily replaced with a single function “H” for “here-at”

elb · January 22, 2021, 2:29pm

I’m not sure this is easy, because of the implications for k. As it stands now, k requires no buffering or pushback capabilities; identifying an ASCII numeral string would require (at the least) a single character of pushback, which seems like it would complicate k substantially — which is already one of the largest basic words.

I played with some factorizations early in the code (e.g., defining the word 1 to replace k0k1-, which occurs quite a few times), and it turned out to have less effect than I had hoped, when all was said and done. I think there are definitely ways to simplify some of the code and do additional factorizing, but the current definitions are quite clever and well-ordered over all. As far as (e.g.) h@!h@C+h!, that is ,, defined immediately after space, newline, and \. So while moving it into the hardware set would be helpful, I think moving it up in the bootstrap would defeat the desire to get to commented, formatted code as quickly as possible.

oldben · January 22, 2021, 7:03pm

For bootstapping, I think is better to have generic instructions,
to be most portable,than packing code or tweeking. Still trying to figue how you built the bootstrap.

monsonite · January 22, 2021, 7:28pm

Ethan,

Pleased to hear that you gave factorisation some thought. There must be a vast number of ways of creating a bootstrap like this. Perhaps a million monkeys with terminals might stumble on a method.

Whilst the first Forth implementations for microprocessors appeared at the height of the 8-bit era 6502, 6800, Z80 etc these first posed the problem that you had to create a 16-bit virtual architecture, from essentially an 8-bit device - often lacking in registers and any real 16-bit instructions.

With the emergence of the PC in 1981 the 80x86 instruction set at least made things a little easier. Personally I like the MSP430 as a 16-bit target and I have written a character based interpreter in about 300 bytes of assembly.

Now that my interest has moved on to cpu simulations, I am looking for a simple 16-bit cpu model that provides the bare essentials for an efficient interpreted language. My first model was inspired by Steve Woznaik’s SWEET16, which may be simulated in about 70 lines of C code.

Latterly I have been revisiting the PDP-8 with the “what if” scenario that the instruction word was stretched to 16-bits. Still a work in progress, but I estimate it will be around 65 lines of C to simulate it.

Where I have dabbled with single character interpreters I like the use of the comma , as the push to stack operator as it allows comma separated numeric strings to be pushed onto the stack, whilst retaining a very natural written syntax. 123,456,789 etc

elb · January 22, 2021, 7:34pm

I’m not sure if there’s some confusion here from my above comments, but I want to clarify that I did not write Planckforth! It is by Koichi Nakamura. I have just been fascinated by it and spent quite a bit of time digging into it!

oldben · January 22, 2021, 8:54pm

I started with the same IDEA but ended up with a 20 bit CPU rather than 16 bits. I think the IBM 1130 was the best of the classic 16 bit machines, having index registers pemiting Forth or
Agol of some kind. Most others just had indirection.
What ever you design, for a JSR save the return address.
in the AC. Not being able to have code in ROM was the PDP 8’s biggest fault.
Ben.

saper · January 25, 2021, 9:10pm

There is also Sectorforth:

fits into 512 bytes of the Master Boot Record - but not handwritten