Virtual Machines and One Page Computing

This approach to programming has resurfaced during this year of semi-retirement, spare time and reflection.

My interest in the mechanics of computing has recently been rekindled by the relative ease with which low cost, high speed microcontrollers can be used to simulate classic CPUs and systems from the past, or to explore entirely novel architectures.

Ed and his friends over on the forum introduced me to the idea of the “One Page Computer” - which appealed to my desire to keep things simple.

Yesterday I read the paper "The Cuneiform Tablets of 2015" - mentioned in a thread some months ago.

The premise of the paper is to preserve our digital archive for the benefit of future generations. To leave a time capsule of our digital heritage for future archaeologists to unpack.

I liked the authors' approach - stating that the virtual machine description should fit on a one page document - which sounds familiar:

In summary, our “message to the future” should consist of two parts: a “one-page description” of a simple virtual machine, and a program that will run on the simple virtual machine.

Moreover - having acquired the one page document of the machine description, and a sample program, the computer archaeologists of the future should be able to recreate the machine in “a fun afternoon hack”.

Certainly, the series of OPC designs created by Ed, Hoglet and friends proved that you could rapidly evolve and innovate through a range of different architectures and instruction sets - implementing each iteration on an FPGA so that relative performance benchmarks could be run.

The Cuneiform Tablet paper went on to describe a universal virtual machine called Chifir as the basis of their concept.


Some feedback on the discussion thread suggested that this VM might be all very well to recreate the Xerox Alto or Data General Nova, but somewhat sub-optimal for almost any other past or present architecture.

An examination of the instruction set shows that the operands are from memory addressed by registers A, B and C. All operands are taken from the 21 bit memory address space and are 32 bits wide.

The instructions have the full complement of arithmetic operations, but only logical AND, which seems to be a shortfall. Given only 15 instructions, there are perhaps a few that might be more useful.

All arithmetic, logical and comparison instructions are performed on M[B] and M[C] with the result being placed in M[A]. There does not appear to be a means to get a literal value into memory or to initialise the registers to a given value.
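For illustration, here is a minimal sketch in Python of that memory-to-memory, three-address model. The opcode numbering, word masking and memory size here are my own assumptions for the sketch, not the paper's exact encoding:

```python
# Sketch of a Chifir-style three-address step: every operand lives in
# memory, addressed by the instruction fields A, B and C, and results
# always land in M[A]. Opcode numbers are illustrative only.

MEM_SIZE = 1 << 16          # illustrative; the paper uses a 21-bit space
M = [0] * MEM_SIZE          # 32-bit words

def step(opcode, A, B, C):
    """Execute one three-address instruction against memory M."""
    if opcode == 0:                      # add
        M[A] = (M[B] + M[C]) & 0xFFFFFFFF
    elif opcode == 1:                    # subtract
        M[A] = (M[B] - M[C]) & 0xFFFFFFFF
    elif opcode == 2:                    # logical AND (the only logic op)
        M[A] = M[B] & M[C]
    elif opcode == 3:                    # compare: M[A] <- 1 if M[B] < M[C]
        M[A] = 1 if M[B] < M[C] else 0
    else:
        raise ValueError("unknown opcode")
```

Note that every operand fetch and store is a memory access: there is nowhere else to put a value, which is exactly the inefficiency discussed here.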

With the limitations of the instruction set, and the inefficiency of the architecture, I can’t help thinking that there might be a better approach.

There is a further flaw that I think the Cuneiform Tablet project has - which I will try to illustrate:

Suppose, for example, we assume the roles of the future technology archaeologists and we come across a digital time capsule from 1976. On the disk there is a description of a VM which we implement using our current technology, and a series of example programs that illustrate the capabilities of the original machine, written in the VM code.

In an “afternoon hack” we implement the VM on whatever hardware we happen to have lying around and then load the example programs and experience them in their full glory on a monochrome screen.

For this to work at all, someone back in 1976 would have had to simulate a complete Apple 1 on a virtual machine: transcode the full 6502 instruction set, implement Apple 1 BASIC, and render the ASCII image itself into a VM bitmap.

However, as future archaeologists we would learn nothing about the workings of the 6502, or the Apple 1, or who the guy with the beard was. The archived message that had been put into the time capsule had become so diluted in context that its significance could easily be overlooked.

The authors recognised this limitation, and sum it up in this statement:

If the archaeologist of the future wants to modify the emulator or build a hardware replica, she will need to understand how the original platform and its emulator work. For this purpose, we include the source code of the original platform emulator and the compiler for our emulator-writing language.

That’s my take on the Cuneiform Tablet project: a lot of effort, but does it really convey an accurate portrayal of our digital archive?


Real Cuneiform Tablets are stored as holograms.
The hardest thing about emulation, in many cases, is to be bug compatible. As long as simh is around I think virtual machines will do fine.

That reminds me of “The Cult of the Bound Variable: The 9th Annual ICFP Programming Contest”, from the International Conference on Functional Programming. A similar idea, but reversed into a kind of computational archaeology contest.

… the Monroeville scrolls make reference to computing machines made of sandstone…
… Among the documents found intact in the Monroeville collection was a lengthy codex (a program), written in no known language …
… workers discovered a set of inscribed tablets that proved to be the Rosetta Stone for interpreting the Monroeville codex. The tablets precisely specify the Cult’s computing device, known to initiates as the “Universal Machine.”

The competitors needed to construct the simple (14 instruction) virtual machine in order to decrypt the codex, which yielded a program (a UNIX-like operating system) that runs on their VM and contained further programming puzzles to be completed.

(The post contest technical report.)


They tried to break it down into a series of simple steps. First there are plain English instructions on the front of the disk. They are very short and explain how the content is encoded in "1"s and "0"s that can actually be seen in a microscope.

The next step is to build a machine to read those characters since there are millions of them and it would not be practical to copy them by hand. The first few thousand bits encode a one page image that actually describes the virtual machine. This picture has the same resolution as the frame buffer of the VM but in a different format (one bit per pixel instead of one 32 bit word per pixel). Now you can program the VM, skip the bits of the documentation and load the rest into the VM’s memory.
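The bootstrap layout described above can be sketched roughly as follows. The frame-buffer resolution and the big-endian bit order are assumptions for illustration, not taken from the paper:

```python
# Sketch of the capsule layout: the first WIDTH*HEIGHT bits are a
# one-bit-per-pixel image of the documentation page; the remaining
# bits are packed into 32-bit words and loaded into the VM's memory.

WIDTH, HEIGHT = 512, 684    # hypothetical frame-buffer resolution

def load_capsule(bits):
    """bits: a string of '1'/'0' characters as read off the disk."""
    n_doc = WIDTH * HEIGHT
    doc_pixels = [int(b) for b in bits[:n_doc]]   # the one-page image
    words = []
    payload = bits[n_doc:]
    for i in range(0, len(payload) - len(payload) % 32, 32):
        words.append(int(payload[i:i+32], 2))     # 32-bit memory words
    return doc_pixels, words
```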

It doesn’t directly manipulate 16 bit values so even the Alto implementation is a little awkward. The frame buffer has the same resolution but a different bitmap. I see how the keyboard is handled but not the mouse.

A, B and C are instruction fields, not registers. So addresses are actually 32 bits though only 21 are used. I don’t see any advantage in that.

You load the 1s and 0s after the one page description into the memory. That will include any literal values you need. Like I said, there are no registers except the PC.

The idea is that when you run the VM you don’t get only a usable emulation of the particular software you are interested in but a set of tools and more documentation that will allow you to explore the artifact and its context.

A very limited example of that is the Javascript emulation of the Alto Smalltalk-72:

You can pause it, examine the Alto/Nova registers and single step through the machine code and things like that.

Today I saw a list of systems that are simple enough to bootstrap:

I think it is interesting to organize a project for a future digital archeologist into steps that are simple enough but with rewarding results. If the person has a microscope that allows them to see actual 1 and 0 characters, why not have actual text they can read? I would make the instructions in the front spiral into smaller and smaller pages around the outer edge. That way you would need nothing to start out but would need better and better microscopes to get to the end. And there would be dozens of pages, not just one. That would allow slightly more complex, but less wasteful, VMs to be described.

Building a machine to read 1 and 0 characters might be a little complicated. A human readable alternative would be to use hexadecimal code with a 7 segment font. Such a font would be a little more complicated for a simple machine than some bar code, but not too complicated.
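As a sketch of that idea: each hex digit becomes a standard seven-segment pattern, so a reading machine only has to detect which of seven fixed regions of each glyph are dark. The bit assignment (segments a-g as bits 0-6) is my own convention here:

```python
# Conventional seven-segment encodings of the sixteen hex digits,
# segments a-g mapped to bits 0-6. All sixteen patterns are distinct,
# so decoding is a simple table lookup.

SEVEN_SEG = {
    0x0: 0b0111111, 0x1: 0b0000110, 0x2: 0b1011011, 0x3: 0b1001111,
    0x4: 0b1100110, 0x5: 0b1101101, 0x6: 0b1111101, 0x7: 0b0000111,
    0x8: 0b1111111, 0x9: 0b1101111, 0xA: 0b1110111, 0xB: 0b1111100,
    0xC: 0b0111001, 0xD: 0b1011110, 0xE: 0b1111001, 0xF: 0b1110001,
}
DECODE = {pattern: digit for digit, pattern in SEVEN_SEG.items()}

def read_byte(hi_pattern, lo_pattern):
    """Recover one byte from two scanned seven-segment patterns."""
    return DECODE[hi_pattern] * 16 + DECODE[lo_pattern]
```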

My two cents, based on experiences with the PDP-1 instruction set, which is still simple (less than 32 basic instructions), but extremely versatile:

Rather than having dedicated indirect addressing instructions, provide a universal indirect addressing bit, adding another lookup-cycle to the (virtual) memory buffer register.

This would also free two instruction codes that could be repurposed for other, useful building blocks, like shifts/rotates.
This concerns instructions:

5  M[A] <- M[M[B]]
6  M[M[B]] <- M[A]

which could be easily replaced by a more general form

5  M[A] <- M[B + i]     // equiv. M[A] <- M[M[B]]
6  M[B + i] <- M[A]     // equiv. M[M[B]] <- M[A]

etc. (This would also allow for object oriented approaches out of the box.)
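A sketch of that universal indirect-addressing bit, reduced to a single level of indirection for clarity (the PDP-1 actually chains indirection as long as the fetched word's own i-bit is set):

```python
# Instead of dedicated indirect opcodes, every instruction carries an
# i-bit; when set, one extra lookup cycle resolves the operand address
# through memory. Memory size is illustrative.

M = [0] * 4096

def effective_address(addr, i_bit):
    """Resolve an operand address, with one extra memory cycle if i_bit."""
    return M[addr] if i_bit else addr

def load(dst, src, i_bit):
    # M[dst] <- M[src]  (direct)  or  M[dst] <- M[M[src]]  (indirect)
    M[dst] = M[effective_address(src, i_bit)]
```

The same `effective_address` helper serves every opcode, which is exactly why the two dedicated indirect instructions become free for other uses.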

Similarly, any special conditional instructions could be replaced by a more general conditional skip, e.g., “skip next instruction if M[A] zero” and “skip next instruction if M[A] negative”.

This could replace instructions

2   IF M[B] = 0, then PC <- M[A]
12  IF M[B] < M[C], then M[A] <- 1, else M[A] <- 0


5  skip if M[A] = 0
6  skip if M[A] < 0

(Here, the indirect addressing bit could be repurposed to negate the condition, just as on the PDP-1.)
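The generalized skip can be sketched in a few lines; the encoding of the condition and the reuse of the freed bit as a negate flag follow the proposal above:

```python
# A conditional skip: test a memory operand against zero or negative,
# optionally negate the condition (PDP-1 style), and return the new PC.
# Instruction width is one word, so "skip" means PC + 2.

def skip(pc, value, condition, negate=False):
    """condition: 'zero' or 'negative'; skips the next instruction."""
    taken = (value == 0) if condition == "zero" else (value < 0)
    if negate:
        taken = not taken
    return pc + 2 if taken else pc + 1
```

Any richer branch (equality, signed compare) is then built from subtraction followed by one of these two skips, which is why two conditions suffice.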

P.S.: Here’s a reference to the PDP-1 instruction set (the various “groups” are microcoded instructions, where the operand provides the microcode),

The original TX-0 had a very minimal instruction set since it was only designed to test that the new core memories actually worked. With 64K words of 18 bits, there were two bits for the opcode and 16 for the address. Eventually the machine was donated to MIT but with only 4KW of memory while the rest went to other projects. The opcode was expanded to 4 bits and an index bit was added, leaving 13 bits for addresses in case they ever wanted to expand to 8KW.

The PDP-1 was a cleaned up version of the modified TX-0 and the PDP-8 was just an adaptation of the same design to 12 bits. Many consider that to be a “sweet spot” for this style of computing.

The “Ugly Duckling” computer that I have mentioned in another thread (the first practical computer designed in Brazil) adapted the idea yet again with 16 bit instructions, 8 bit ALU and 12 bit addresses.

Exactly! The TX-0 was always a bit patched up and relied heavily on microcoding, blurring the lines between actual instruction codes and microcodes over several iterations. (Also, some basic operations weren’t atomic, like addition, which required an extra step for the ripple carry.) The PDP-1 was the “cleaned-up” version, which took what worked and consolidated it. (However, some operations, esp. partial ones, were missing.) The PDP-8 is a bit too basic for my personal liking, although the reduction of the instruction set is quite ingenious.

I found the PDP-1 especially welcoming and friendly to program – adding a stack and maybe 24-bit operands would make it a next to ideal “toy computer” (© Martin Graetz).
Some of the things that made it this welcoming: universal indirect addressing (everything may be a pointer!), index operations, indexing including a conditional skip on reaching zero (basic building block for loops!), barrel shifter (up to 9 bit positions/half a word – extremely versatile, compare Sophie Wilson considering it a must for ARM).

PS, historical observation:
Like many other computers of the era, the PDP-1 had a basic assembler built into the instruction set to facilitate self-modification. There were 3 deposit instructions: DAC (deposit AC), DAP (deposit in address part only), and DIP (deposit in instruction part only). – I’ve never encountered a program that used DIP, nor would I have wished for it myself. This also includes programs for other computers with a similar setup. So, all you really needed was DAC and DAP, while DIP turned out to be quite a waste of resources. (Moreover, you could emulate it with minimal overhead by combining DAC & DAP operations.)

Some things are awkward when built from conventional chips but really elegant when you can get at the transistor level. A barrel shifter is one of those and was included in the ARM processor. Another is content addressable memory (CAM), which is a large part of the memory controller chip (MEMC).

About being nice to program, the Data General Nova had its share of fans (the Xerox Alto emulated it as its default instruction set). Not counting I/O you can think of it as having two instructions: load/store and a microcoded register-to-register thing. It included most of the features you listed.

Alan Kay asked Chuck Thacker (one of the main creators of the Alto among many other things) to design a simple, but not tricky, computer that could be understood by high school students. He did a series of “tiny computer” designs. His version 3 is explicitly Nova-like while being much simpler. Except for the last version they were limited by really tiny instruction and data/register memories.


Another thing to consider: how many registers should you have, at a minimum? The PDP-1 has two, AC and an auxiliary IO register (you can shift between them seamlessly thanks to the barrel shifter). By all logic, this is really all you need, as long as you have no need for special index registers. Back in the '80s, when I first “met” the Z80, coming from the 6502, I wondered why you would need all those internal registers: clearly AC, X and Y are all you need.

Now, one of the things Spacewar! (on the PDP-1) is known for is its “outline compiler” for drawing the spaceships (procedural sprites compiled on the fly from codes describing the outline), probably one of the first JIT compilers there ever was. However, this wasn’t really about JIT code, but about registers: with just two registers available, looping over outline code, extracting slices of it, adding the respective offsets, and using the same two registers to transfer the resulting coordinates to the display involved many round trips to memory for an interpreter, which just aren’t necessary for the compiled code. Notably, it’s not about the actual instructions for producing the outlines on the scope, but about juggling all the data in just two registers (for the compiled code, there isn’t any “juggling” at all).

I assume that with, say, four registers you could well have got away with interpreted mode. If your instruction set is all about emulation, this may be worth considering.

By which we would arrive at an instruction layout like

opc | reg (0-3) | i-bit | operand/address
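Decoding that layout is a handful of shifts and masks. The field widths below are assumptions for the sketch (an 18-bit word: 4-bit opcode, 2-bit register select for four registers, one indirect bit, 11-bit operand/address):

```python
# Decode an assumed 18-bit word laid out as
#   opc (4) | reg (2) | i-bit (1) | operand/address (11)

def decode(word):
    opc  = (word >> 14) & 0xF     # bits 17-14: opcode
    reg  = (word >> 12) & 0x3     # bits 13-12: one of four registers
    ibit = (word >> 11) & 0x1     # bit 11: indirect addressing
    addr =  word        & 0x7FF   # bits 10-0: operand/address
    return opc, reg, ibit, addr
```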

Back then, core memory was considered ‘registers’.
I just finished a 36 bit design on paper. An index register is used rather than indirect addressing: 8 registers, 4 data and 4 index, and a hidden PC.

mode | size | carry | op | ac | ix | #

A simple design with a 2901 bit slice, but very complex if done in the early '60s with your register file done as fast core memory.

PDP 6 price sheet 1965

PDP 6 CPU $151,000.00
fast memory (registers) $30,000.00
16K words slow core $85,000.00
The PDP-11 was lucky to have a 16x1 RAM available for its 1968-1969 design. The big iron of the 1960s had big $$$ spent on such features.

Some of the PDPs had fast and/or auto-incrementing registers at the very lowest memory addresses. However, this is still memory, meaning you have to go via the memory buffer register and the bus, which takes at least an extra cycle. With “real” registers, an instruction internal to the CPU typically finished in a single cycle. (While designs like the PDP-8 aimed at reducing the number of those “real” registers, as these added significantly to the cost, they come virtually for free on a virtual machine.)

I once looked up what the “affordable” PDP-1 was in real money (as opposed to inflation corrected value):

Basic system incl. display: ~ USD 135,000

Main unit … 120,000
Display … 14,300
Add. block 4K memory (Type 12) … 30,000
Memory controller for Type 12 … 10,000

Prices in the USA, 1960:

Average car … $ 2,400
Average home … $ 12,000

So a basic system was 11 new family homes and a brand new car!
Adding 4K was still 3 new homes and a brand new car for each of them.
