Bootstrappable (Retro) OS?

Prompted by the recent Pi Pico thread, which talked about Fuzix, then Unix, the lack of C compilers, etc., I thought I'd just pop this here for thought… Feel free to expand on it, or go off at a tangent as you need!

I have a little retro 65c816 SBC with 256KB of RAM which I'm writing a little OS for. It's nothing special right now, just a few retro-style things on modern (if period) hardware. There are two levels of OS: the native one is written in assembler and provides an Acorn MOS-like interface to the hardware with a simple CLI. The higher-level OS is written in BCPL, with a small assembly shim to launch the BCPL system, which uses the underlying OS to load from storage. My early thoughts were to re-imagine an OS on hardware from the early 80's, although my initial aim was older-style hardware - the '816 was about in 1984, but we did have the 68K and 8086 by then, and in the late 70's I used a fine multi-tasking, command-liney OS on a 32-bit 'mini' with 256KB of core that supported its own compiler…

(My OS is almost a 32-bit version of the BCPL that ran on the BBC Micro in the early 80's)

While I did initially cross-compile the OS, CLI & utilities on my Linux desktop, it is capable of running the current BCPL compiler natively. It can't yet compile the compiler (more due to an issue I have with floating point than anything else). Compiling isn't fast though - my nano-like editor will compile in the blink of an eye on my desktop, but takes 25 seconds on the SBC…

But which direction to take the OS in? The commands are Unix-like and my aim is to add a form of redirection, pipes and multi-tasking.

But then there already exists Fuzix… So why bother?

Or do I just plod on - it's a nice little hobby, if nothing else - and do what I fancy? The issue is that just about everything resembles Unix these days (and I think that was a quote from Ken Thompson, or someone in that group of peers).

Thoughts or ideas welcome…




Comparing speeds, one must compare disc and console I/O as well as raw computing power, and compare relative speeds. I still type at the same speed, on a desktop or a CP/M machine: HUNT H HUNT E PECK L HUNT L PECK K ^H L PECK O ^ENTER HELLLO. I have a 10-year-old machine where it takes 5 to 20 seconds for a menu window to pop up, so I suspect DOS, while slower (~1990, with compile times of 10 minutes or so), gave a better response time than today's bloated OSes like Windows 10.
CP/M on a Z80 or DOS on an x86, being single-user, while slow, were still, I suspect, faster than Unix running 50 users at one time.

As a side note, was the 68000 the only common 16-bit CPU with true* 32-bit addressing in the late 1970's and early 1980's? (* no segments or banked memory.)


Maybe take a look at this : GNO/ME - UNIX for the Apple IIgs
multi-tasking on an Apple IIgs.

Are you taking advantage of the '816’s 16 bit machine stack pointer and direct page anywhere in bank 0? Is your OS loaded and running in other than bank 0?

As is, you have a form of multi-processing going on, so that's a plus.



It’s great to see a very-nearly-bootstrappable retro OS, and indeed to see an independent line of investigation.

That’s absolutely fine! Build times of 20 mins are not uncommon at places I’ve worked - and that’s after optimising them.

I wouldn’t take that line myself! Diversity, amusement, learning - all sorts of reasons for following your own story. Indeed, you might have a cleaner codebase, having less history.


Hi Andy,

Yes, I'm fully aware of the Apple //gs and its software - at the time they were stupidly expensive in the UK, so I just could not get one. Also, by that time I was into other things, so while I've admired it from a distance, I've never really gotten one for real hands-on use.

However, no, I'm not using a 16-bit stack pointer. I'm not using any stack pointer, which may sound somewhat odd, but…

To run BCPL there are a few ways to do it - one is to write your own compiler that targets the local hardware. I have not done this.

Another is to take an existing (the existing) compiler and ask it to output a machine-independent code (called SIAL) which you then translate into native code. I have not done this.

The 3rd way is to use the existing compiler to generate what Martin Richards considers the "ideal" machine code for BCPL (or at least a fairly close resemblance to what it might be), called CINTCODE. This is a bytecode. The opcodes are all 1 byte long and operands are from 0 to 4 bytes long. Then you (me) write an interpreter for this CINTCODE. This is what I have done.
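To make the shape of such an interpreter concrete, here is a minimal sketch in C of a dispatch loop in the same spirit: 1-byte opcodes, variable-length operands, and the VM keeping its own evaluation stack. The opcode names and encodings below are invented for illustration - they are not the real CINTCODE instruction set.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy bytecode VM: 1-byte opcodes, 0-or-more operand bytes.
 * Invented opcodes, NOT the real CINTCODE set. */
enum {
    OP_HALT = 0,   /* stop, return top of stack      */
    OP_L8   = 1,   /* push a 1-byte signed literal   */
    OP_ADD  = 2,   /* pop two values, push their sum */
    OP_JMP8 = 3    /* 1-byte signed relative jump    */
};

int32_t run(const uint8_t *code)
{
    int32_t stack[64];           /* the VM's own stack, not the CPU's */
    int sp = 0;                  /* VM stack pointer   */
    size_t pc = 0;               /* VM program counter */

    for (;;) {
        uint8_t op = code[pc++]; /* fetch the 1-byte opcode */
        switch (op) {
        case OP_L8:
            stack[sp++] = (int8_t)code[pc++];
            break;
        case OP_ADD:
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_JMP8: {
            int8_t off = (int8_t)code[pc++]; /* operand byte */
            pc += off;           /* relative to the next opcode */
            break;
        }
        default: /* OP_HALT */
            return stack[sp - 1];
        }
    }
}
```

For example, the byte sequence `{ OP_L8, 2, OP_L8, 3, OP_ADD, OP_HALT }` evaluates 2 + 3 and returns 5.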

I'm not using the '816 stack because I don't do any JSRs, nor do I store BCPL data on the machine stack - mostly because CINTCODE runs its own stack inside its virtual machine.

Another issue I face is the '816's banked memory system - I want to be able to declare an array that's > 64KB long and have my BCPL program iterate over it without even knowing there is a banked memory system underneath.
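To illustrate the idea (this is a sketch, not the actual Ruby816 code), a flat 32-bit address as seen by the BCPL program can be split into a 64KB bank number and a 16-bit in-bank offset, so the interpreter can step across bank boundaries transparently:

```c
#include <stdint.h>

/* Illustrative only: split a flat 32-bit byte address into the
 * '816's bank byte plus a 16-bit offset within that bank. */
typedef struct {
    uint8_t  bank;    /* which 64KB bank on the '816 */
    uint16_t offset;  /* position within that bank   */
} phys_addr;

phys_addr translate(uint32_t flat)
{
    phys_addr p;
    p.bank   = (uint8_t)(flat >> 16);      /* high byte selects the bank */
    p.offset = (uint16_t)(flat & 0xFFFFu); /* low 16 bits index into it  */
    return p;
}
```

Incrementing the flat address past 0xFFFF simply rolls over into the next bank when re-translated, which is what lets a BCPL loop walk an array bigger than one bank.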

So it's not been easy - however, I am actually pleasantly surprised by the overall speed of it. For what it's doing, it's not slow. Yes, it takes time to compile programs, but it's not slower than some systems (CP/M, BBC Micro) that I used back in the early 80's. It actually feels better than the university's overloaded PR1ME system that they had back then, too…

A few other details - all natively executable code lives in Bank 0. It might be possible to run '816 code in another bank, but I've not made provision for it - mostly because I feel there is no need for this system. The BCPL system is 32-bit, so the stack and globals registers are also 32-bit, however I keep the BCPL stack(s) and Globals vector(s) in Bank 0, so I get away with just a 16-bit pointer for these. That's an optimisation that imposes a limitation on stack size, but I can live with that.
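A sketch of that optimisation, with invented names: because all the 32-bit BCPL stack cells live in bank 0, a plain 16-bit value is enough to address the top of stack, at the cost of a hard limit on total stack size.

```c
#include <stdint.h>

/* Sketch only (names invented, not the actual code): the BCPL stack
 * holds 32-bit cells, but since the whole stack lives in bank 0 of
 * the '816, a 16-bit stack pointer suffices. */
static int32_t bank0_stack[0x4000]; /* 64KB of bank 0 = 16K 32-bit cells */
static uint16_t bcpl_sp = 0;        /* 16-bit pointer is enough */

void bcpl_push(int32_t v) { bank0_stack[bcpl_sp++] = v; }
int32_t bcpl_pop(void)    { return bank0_stack[--bcpl_sp]; }
```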

The multi-processing works like the Transputer's. When a CINTCODE jump instruction happens, a check is made of a flag that's set from an interrupt routine, and if the flag is set then a thread switch happens. It's not as nice as the Transputer's because I have to save/restore registers, which you don't need to do on the Transputer - there, the compilers are told that registers will not be preserved over a jump instruction…
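A minimal sketch of that scheme (all names invented, not the actual Ruby816 code): the timer interrupt only sets a flag, and the interpreter tests it when executing a jump, so a context switch can never land mid-expression.

```c
#include <stddef.h>
#include <stdint.h>

/* Transputer-style pre-emption at jump points, sketched. */
static volatile int reschedule = 0; /* set from the timer interrupt */

void timer_isr(void)
{
    reschedule = 1;                 /* ask for a switch at the next jump */
}

/* called by the interpreter for every CINTCODE jump instruction */
size_t do_jump(size_t pc, int8_t offset)
{
    if (reschedule) {
        reschedule = 0;
        /* save this thread's VM registers, pick the next runnable
         * thread and restore its registers - elided in this sketch */
    }
    return pc + offset;             /* then take the jump as normal */
}
```

Checking only at jumps keeps the fast path of the interpreter free of scheduling overhead, since every loop iteration necessarily passes through a jump.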

I also have a slight advantage over e.g. the //gs in that I don’t have native graphics, so I’m not losing RAM for video buffering - the graphics currently run over a serial line to a “smart” terminal which emulates a superset of the BBC Micro VDU commands…

I am happy with it though - it is both frustrating and exciting - I’ve had uptimes of many days which has included editing, compiling and running little test programs on it, although any multi-tasking system without memory protection is going to have a degree of fragility about it…




I’m thinking that the NS32016 might have been earlier, but a quick look at Wikipedia on it suggests that it only had a 24-bit program counter - however at that time it was probably considered more than good enough.

It was also quite buggy and never did perform well.


Hi Gordon,

I have some familiarity with byte-code VMs via HyperC, which is not BCPL but a variant of K&R C. While it was developed for the 6502, I modified it for an installed 65802, and it eventually worked first with a software stack/ZP and then a 16-bit hardware stack/ZP; the software tools worked the same, being byte code. Code was denser and faster (except for switching to and from emulation mode for 8-bit firmware).

Can you provide a link to a decent description of CINTCODE and the VM that BCPL uses? It would help in my understanding.


It’s all here:



Are you looking to run CINTPOS on your Ruby816?

No. I’m quite happy to plod along and do my own thing here - for now, anyway.

(If you want to chat more do feel free to email if it’s not appropriate here)



The original 68000 also didn't have a physical 32-bit address bus - only 24 address lines were brought out, so it could not address all 32 bits of memory, though the registers were 32 bits wide. Many early 68K operating systems "took advantage" of this fact to store tag and other bits in the upper byte of pointers, which the machine would ignore. This became a problem later, when the full address space became available!
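A sketch of the trick (illustrative names): since the top 8 bits of a 32-bit pointer never reached the 68000's bus, software could stash tag bits up there and mask them off when needed.

```c
#include <stdint.h>

#define ADDR_MASK 0x00FFFFFFu  /* the 24 bits the 68000 actually decoded */

/* hide 8 tag bits in the byte the hardware ignored */
uint32_t tag_ptr(uint32_t addr, uint8_t tag)
{
    return (addr & ADDR_MASK) | ((uint32_t)tag << 24);
}

/* recover what the 68000's bus effectively "saw" */
uint32_t strip_tag(uint32_t tagged)
{
    return tagged & ADDR_MASK;
}
```

On the original 68000 the masking happened for free in hardware; on later parts with full 32-bit buses, any pointer that hadn't been explicitly stripped suddenly addressed the wrong place - hence the breakage.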

I have to imagine that, had it hung around long enough, the NS32k would have eventually grown 32 full bits as well.

Oh, it did! From the 32332, in '86 or so:

The NS32332 CPU was the second generation of the Series 32000 family and appeared around 1986. This chip had for the first time in the Series 32000 family a full 32-bit address path. This enables access to 4 Gbytes which was quite a lot for the middle of the 1980’s.


I seem to remember that in my uni in the 1990s there was a small group of hackers who ordered some boards with the 32-bit NS32k on them, because the Mach research operating system supported that chip. Some of my colleagues participated; I was too busy trying to get some studying done.


So why did better OSes and CPUs get forgotten, leaving Windows and the 386?


It’s a very good question - along the same lines as why VHS rather than Betamax…

My guess is that the answer is most probably "no one got fired for buying IBM" - and by default IBM hardware ran MS-DOS, and by the time of the clone wars, when IBM no longer dominated the "PC" hardware arena, MS-DOS and Windows were pretty well established…

Meanwhile I was using Unix in various guises (Sun, HP and then Linux) but there are just too many Unixes while there is only one “Windows”.

Want user-friendly Unix? buy a Tesla… (Linux)


PDP-11 Unix is the ONLY true Unix in my view. Going to the VAX or some other host, "Unix II" might have been better naming, given the switch from 16-bit pointers to 32-bit pointers…
I was thinking more of GEM, CP/M (2? 3?), Pascal, even Microsoft's Unix, whatever they called it.
Like VHS vs Beta, cheapness was another factor, I expect.

I checked with one of those guys and it turns out I sort of remembered the neighborhood but not the house: NS yes, Mach yes, but the board was "homebrew", the PC532: PC532 - Wikipedia - NS32532 CPU and NS32381 FPU. Apparently NS was known for being buggy as heck, but the NS32532 was surprisingly fine. And the Mach part is that these guys apparently did the porting themselves, not that Mach already supported the chip.

I think this might be a tangent, but “bootstrappable” has a website all of its own, as an idea, to bootstrap a full OS from high level source with the minimum of machine-specific binary support: even to bootstrap at boot time, which is very ambitious. There are many very interesting links down that rabbit hole, including discussion of the “Trusting Trust” attack, and the 1974 review of Multics security and the 30-year retrospective of it.

And, in a sort of intersection of ideas, I see MicroPython can be run on the Pi Pico, and on other microcontrollers, and might be in scope for a 512k '816 machine (or possibly a smaller one.)


I think for bootstrapping one often forgets that one needs a subset of the language, and that subset is often not to be found, as the first self-compiling code is often lost. Hindsight shows that simple array indexing and structures are what's needed, not fancy things like *, ++/-- or switch statements. This makes compilers like Small-C useless for any major bootstrapping. C's #define macros also often make it impossible to know just what is being compiled. The memory model of compilers has changed over the years, too: today's compilers assume you have 2^63 bytes of memory free to use, whereas in the early 1980's you had 2^15 bytes of free memory, and 2^12 before that.
A C compiler that generates a simple C subset as output might be a good first step (this might be split into two parts - A: macro and syntax/error checking, B: token processing). A second compiler could generate BCPL-like code, with size adjustments for 8/16/32/64-bit processors and byte/word addressing - I am not fond of the BCPL syntax. A third compiles to BCPL.
Then you also have the problem of hardware like PALs, FPGAs and PROMs not having portable software and/or hardware bits - I want bootstrappable to be complete. I just upgraded to a new machine and discovered I can't get the free web Quartus 9.1 for the Cyclone II any more, so I have to keep the old machine around for the Altera Bit Blaster and USB-to-serial dongle.
Rather than saying you have 512K of memory, it's better to say you have data/program segments with 16-bit offsets. Machine code could be limited by a simple model, RISC-n or CISC-n, where n is the model type: 0 might have basic primitives, 1 bytes, 2 multiply and divide, 3 complex or large addressing modes.
Back to the 1980's: remember, most things were compiled on a mainframe that had ample file space and 32-bit integers, not self-compiled on the host machine with 360KB floppies, 32KB of memory and a TV video display of 16 lines by 32 columns (Dragon/CoCo II).

The two questions I ask myself when it comes to projects are: 1) Will I learn something? Is there something interesting in it which I haven't really spent a lot of time with before? And 2) Will it help me do something? Does it solve a problem I have, or make it easier to do something I do often?

I don't think the existence of another system should ever stop you from doing your own thing. It's more about whether or not there's still something you can gain by continuing.