Forth in the boot sector

We’ve seen quite a few posts about Forth, a language with a rich history in small and homebrew systems, in no small part due to the efforts of the Forth Interest Group to provide high-quality public domain Forth implementations for a huge array of systems. Still today, programmers are creating Forths for homebrew systems or as exercises in programming skill.

Cesar Blum has created one of the latter in Sector Forth, uploaded just days ago: a Forth interpreter in 512 bytes that fits in an x86 boot sector! It provides only the barest set of primitives (eight of them) and builds the rest up, including most arithmetic, as Forth words. It includes some targets for trying it in the QEMU emulator.

The hello world in the examples is particularly enjoyable.


Strictly speaking, the code section of a master boot record is 446 bytes; above that is the partition table. Nevertheless it is an interesting idea. Lately I’ve used CompactFlash as EPROM for my retro designs, which have a tiny bootstrap in a CPLD to load and run the master boot record, which in turn loads the rest of the application program. This small program could do it all. I wonder whether there is a Z80 version?


This is a fascinating example of minimum instruction set computing.

Squeezing it into the boot sector has obviously been quite a challenge - and it is clear that quite a few words need to be executed in order to synthesize the remainder of the useful primitives.

It reminds me a little of the PDP-8 with its minimal instruction set - where quite a lot of work was needed to create useful instruction macros, XOR for example.

I would relax the requirements and suggest that 1K of code size and 16 primitives would allow a more user-friendly machine.


Forth uses 1K blocks, so that makes sense. However, fitting it into xyz bytes only makes sense for older CPUs. A 32-bit RISC needs 4x as much space as, say, an 8086/8088. Still a good feat of work.

It is a floppy boot sector; floppies do not have a partition table on x86 machines.

Even more impressive, a FLOPPY DRIVE in this era. :slight_smile: Ben.

About the primitives: my 1990 graduation project was a 12-bit Forth chip (with a 4-bit memory bus - we had very few pins available). The instruction set was essentially this list of primitives, except I initially didn’t have +. I did have “rotate to the right”, which combined with NAND allowed me to implement addition in software.
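The NAND-plus-rotate trick is fun to reconstruct. Here is a minimal sketch in Python of how addition can be synthesized when NAND is the only logic primitive - a generic ripple-carry construction, not necessarily the routine that chip actually ran; the 12-bit width matches the design described above:

```python
# Every boolean operator derived from a single NAND primitive,
# mirroring a machine whose only logic instruction is NAND.
def nand(a, b):
    return 0 if (a and b) else 1

def not_(a):
    return nand(a, a)

def and_(a, b):
    return not_(nand(a, b))

def or_(a, b):
    return nand(not_(a), not_(b))

def xor_(a, b):
    # The classic 4-NAND XOR construction.
    t = nand(a, b)
    return nand(nand(a, t), nand(b, t))

def add12(x, y):
    """Ripple-carry addition of two 12-bit words, one bit at a time."""
    carry, result = 0, 0
    for i in range(12):
        a = (x >> i) & 1
        b = (y >> i) & 1
        p = xor_(a, b)                  # propagate bit
        result |= xor_(p, carry) << i   # sum bit
        carry = or_(and_(a, b), and_(p, carry))
    return result  # carry out of bit 11 is discarded (mod 4096)

print(add12(700, 300))  # 1000
print(add12(4095, 1))   # 0 - wraps around at 12 bits
```

Bit-serial loops like this make the 700-cycle figure quoted below quite plausible: each sum bit costs several NAND evaluations plus the carry chain.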

My benchmark was adding two vectors and storing the result in a third vector. As you might imagine, it used a lot of addition, both for addressing and for obtaining the sums. A quick test showed that the software addition was taking 700 clock cycles. So I included a new one-clock add instruction (growing the instruction set from 12 to 13) and expected a huge improvement in the benchmark. It did get better, but by less than a factor of three. That made the importance of the quantitative approach clear to me.


Have you got a link to your graduation project report please?

Interesting - did you manage to track down the likely causes of this anomaly in performance expectation?

Chuck Moore’s work in MISC Forth cpus has suggested that the optimum number of primitives lies somewhere between 16 and 32.

There is an analysis of the frequency of primitive usage in Phil Koopman’s book Stack Computers: The New Wave - Appendix C, page 214.


Sorry, I have just looked for any material from that project and can’t find any. There might be a copy of the report itself (in Portuguese, unfortunately) on some old Mac floppy, but it would be pretty hard to find even once I start going to my office again.

This was before Verilog or VHDL were widely available, so the RTL (register transfer level) simulation was done in a Pascal-like language called Lidex created by one of my colleagues.

For the actual chip design I ran into a problem: each user had a disk quota of only 1MB. But the chip synthesis tools would generate up to 6MB of temporary files. There was a temporary area that anybody could use to transfer files between the Windows PCs and the Sun workstations. People had left files there for months and nobody cared. So I did all my work there.

On the day the report was due I went to print the chip layout and found that the temporary area had been cleared. I had the simulation results already printed. So I redesigned the chip from scratch in just 3 hours. It was greatly improved by being far more testable but wasn’t quite working when I ran out of time. So I printed the schematics and layout: if the teacher could tell that the waveforms didn’t quite match those then I would gladly accept him failing me! On the next day all the files were gone again.

It was just the simplistic notion that “this benchmark uses + a lot, and I made + 700 times faster, so the whole thing should be hundreds of times faster”.

Actually doing the math shows that if everything except the + takes up 33% of the time, then the overall benchmark can only become 3 times faster.
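That observation is Amdahl’s law in miniature. A quick sketch with the figures above (assuming + consumed roughly two thirds of the runtime, matching the 33% remainder):

```python
def overall_speedup(fraction, factor):
    """Amdahl's law: 'fraction' of the runtime is sped up by
    'factor'; the remaining (1 - fraction) is unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)

# + was ~67% of the runtime; the new instruction made it ~700x faster.
print(round(overall_speedup(0.67, 700.0), 2))  # 3.02

# Even an infinitely fast + could not beat 1 / 0.33, about 3x.
print(round(1.0 / 0.33, 2))                    # 3.03
```

The 700x local speedup buys almost nothing beyond the 3x ceiling set by the part of the benchmark that never touched the adder.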

They had promised us that if the graduation projects turned out good enough they would actually be sent to be fabbed, though with serious limitations on area and number of pins. The other groups did simple test circuits with just a few logic gates. An adder takes up enough space that I tried to skip the + at first. Certainly having more primitives would have pushed it over the area limit. In the end it didn’t get made anyway, so I could have just done a far nicer design.

Yikes! That’s quite the experience. Thanks for telling the story though!

I should add that I was already on shaky ground with the teacher. His area was digital signal processing, which at that time meant racks full of huge TTL boards. So processors and software were not things he was familiar with.

He complained that our report used concepts like stack machines and register architectures without properly introducing them. It was already some 70 pages long and I refused to turn it into a full blown introduction to computer architecture. So without some pretty pictures of chip layout (actually a big black blob was the best that dot matrix printer could do) he would not have accepted the text at all.

His other complaint was that the circuit that was simulated used a latch built from a bunch of NAND gates while the actual chip used a far more compact design with pass transistors. All processors used the latter design but our simulator could only deal with regular logic gates. He wanted a formal proof that the two circuits are equivalent - just showing that the waveforms were the same wasn’t good enough.


What a wonderful story of triumph over adversity!

I did my degree about 5 years earlier, when programmable logic was just appearing in the form of PALs and GALs.

I took a course on those in the final year, given by a new, young lecturer who had a talent for bringing new ideas to what was a somewhat stuffy department full of middle aged staff. Most of them were from the pre-transistor era!

Word processors and printers were almost non-existent or unaffordable to students, so dissertations were written out by hand and given to the departmental secretary to type out - for a small sum of money. Drawings and diagrams were painfully done with a drawing pen and ink.

You were lucky that you had access to a Sun Workstation!


Thanks! But I consider it a mere inconvenience compared to some of the other stories I have promised to tell in another thread and still plan to do so.

I entered the university in 1980. The electrical engineering department had 120 students per year and in the third year they were split into 60 “electronics” and 60 “electrotechnical” (dams, electric trains, etc) engineers though only 10 wanted the latter due to the introduction of microcomputers. The problem was that more than half of the professors (mostly the older ones) were on the electrotechnical side.

I got stuck in the electrotechnical course. So I dropped out and worked on computer design until the professors retired and the course was restructured. I went back and graduated in “microelectronics” with students 6 years younger than me.

Very lucky! The last military government had messed up in the early 1980s and killed off the semiconductor industry in Brazil. In 1989 they decided to fix that by donating workstations and chip design software to the top universities. One year earlier and I would have had to design chips on paper.

But the damage was not easily reversible: my 1990 microelectronics class had 20 places but only 6 students. Only me and another guy actually chose this option - the other four had failed to get into the computer engineering course. With no local industry there was no interest in the subject and though my class was the first it was also one of the last. Attempts to fix this in the next decades have had poor results.

Damage or not, chip design never really had a marketplace.
Intel, Motorola, and the 6502 people covered 90% of the 8- and 16-bit stuff. I lump the Z80 in with Intel because it was an upgraded 8080. RISC stuff only advanced with the advent of handhelds and smartphones, and that was because Apple stole the ideas. Things like Forth CPUs have a few sales compared to the 4-bit(?) micros in your TV, microwave, and dishwasher. Ben.

And the radiation-hardened Harris/Intersil RTX2010 probably holds the distance record for microprocessors in space!


I can’t see more than a dozen sold a year - mind you, they don’t sell at 99 cents each. Did not Intersil also make the CMOS PDP-8?

Yes, I think Harris and Intersil had a few joint ventures, because they both made the 6100 CMOS PDP-8.