A loader loads a loader that loads a loader... (Interdata mainframes) paper tape bootstrap

I’m still interested in an EMAS emulation, but there’s still no ICL 2980 emulator. There are at least some 1900 emulators, and the MOUSES emulator for Interdata (SIMH). Although Interdata and ICL have different hardware, it’s interesting for comparison.

The Interdata 70 user’s manual has many details. The loader descriptions (chapter 7.9) are especially interesting.

There are 6 different binary object paper tape formats (plus some more for media other than paper tape):
M08 Relocatable zoned loader format
M09 Absolute zoned loader
M10 Bootstrap (self-loading)
M14 Eight-bit core image
M16 Relocatable non-zoned loader
M17 Absolute non-zoned loader

Interesting is the 50 Sequence Bootstrap loader, starting at X’50’ and also having a device definition table. Booting proceeds as follows (3 segments on the same paper tape):

  1. The eight-bit Loader at 50 reads another loader into 80 to XC and transfers to 80
  2. The program at 80 reads … which includes a Fast Format Loader
  3. (checksum test)
  4. The Fast format Loader loads a special REL Boot Loader …
  5. … REL Loader placed 400 from memory top, General Loader 600 from top
  6. The REL Boot Loader reads the Loader program, … relocation
  7. (checksum)
  8. finished reading

It’s split; memory locations 80-2AA and DF0-FF6 are used.
Later, depending on the OS, there are some more loaders, including an OS Library Loader, Stand-Alone Loaders and BOSS resident Loader.
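The general shape of such a chain, where each tape segment carries a length, a load address and a checksum, and a bad checksum stops the boot, can be sketched in Python. The block layout and the simple additive checksum here are my own invention for illustration, not the actual Interdata tape format:

```python
def read_block(tape, pos):
    """Read one loader stage: [length, load_addr, data..., checksum]."""
    length, load_addr = tape[pos], tape[pos + 1]
    data = tape[pos + 2 : pos + 2 + length]
    checksum = tape[pos + 2 + length]
    if sum(data) % 256 != checksum:
        raise ValueError("checksum error - reposition tape and retry")
    return load_addr, data, pos + 3 + length

def boot(tape):
    """Load successive stages; each stage overlays memory at its own address."""
    memory, pos, stages = {}, 0, []
    while pos < len(tape):
        load_addr, data, pos = read_block(tape, pos)
        for i, byte in enumerate(data):
            memory[load_addr + i] = byte
        stages.append(load_addr)   # on real hardware: transfer control here
    return memory, stages

# Two chained stages, loosely echoing "a loader loads a loader":
tape = [3, 0x80, 1, 2, 3, 6,       # stage 1: 3 bytes at 0x80, checksum 6
        2, 0xD0, 9, 9, 18]         # stage 2: 2 bytes at 0xD0, checksum 18
mem, stages = boot(tape)
```

Each stage in the real sequence is more capable than the model suggests (the Fast Format Loader, the REL Boot Loader with relocation), but the load/verify/jump skeleton is the same.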

OMG, why so many loaders? And why does booting start at 50?
Some memory contents below 50 (dedicated low core) are mentioned (pointers and PSW), but only starting at 22. I haven’t found what is stored below that.
The Model 74 had an automatic loader.

Another bitsavers document has a commented bootstrap loader (29-376 32bitLoader).
The MOUSES and EMAS sources have some binaries within the IMP files like the GPC microprogram in gdc03s.imp (chopsupe).
Some have different versions running at different memory locations. So without detailed hardware knowledge (and programming skills), and enough time to read all the sources, it’s very hard, if not impossible, to write an emulator. The sources are still interesting, though.

2 Likes

I’m fond of multi-stage bootloaders, for some reason. I think sometimes the multi-stage approach is used because at each stage there are some restrictions: perhaps as to length, or location. For integrity you’d want checksums, but there might not be room for that code until a slightly later stage.

Of course another possible reason is simple history: that’s the order that things were written in, and at first very simple facilities were enough, and each time there’s an idea for something more sophisticated, that’s another stage to the bootloader.

1 Like

Paper tape tended to be the I/O medium for smaller systems, punched cards for mid-sized ones, and magnetic media for the biggest systems. Things became more interesting when linking loaders were invented, but there were no operating systems around yet. This is where emulators are useful in showing how the very early computers ran, where computers expected data to be received and sent, and what format input data was punched in.
Ben.

I don’t know the specific answer to this question because I don’t know the architecture, but in general:

It is common for the first-stage loader to load near location 0, and it is common for a system with limited RAM to locate that RAM starting near location 0. This allows the first stage loader to set requirements and assertions such as “you must have at least 128 bytes of RAM to load, but if you have at least 128 bytes of RAM you can load a stage 2 loader as long as it fits in your RAM”.

It is also common for the first stage loader to be dead simple and only a few instructions long, because (in particular for minis and very early micros) it might have to be hand-entered by an operator using panel switches every time the machine is booted. This often necessitates (depending on platform) that it is not relocatable; again, leading to a specific address requirement, frequently at a very low address.

The second stage loader is frequently several times larger than the first stage loader, because it is expected to be loaded from tape or disk, and it may or may not be relocatable; later stage loaders after the second are frequently relocatable, and load to a location that is convenient for the OS being subsequently loaded, possibly after the available memory size has been probed. Some loaders are even capable of relocating themselves after the load location for the next stage has been identified.
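Relocation in a later-stage loader is mechanically simple: alongside the image, the tape carries a list of offsets whose contents must be adjusted by the actual load address. A toy Python sketch (the fixup-list format here is invented for illustration, not any real loader format):

```python
def load_relocatable(image, fixups, base):
    """Copy an image to an arbitrary base address, then patch each
    address-bearing word by adding the base (classic relocation)."""
    memory = {base + i: word for i, word in enumerate(image)}
    for offset in fixups:          # offsets of words that hold addresses
        memory[base + offset] += base
    return memory

# A 4-word image whose word at offset 1 holds an internal address (3).
image  = [0x10, 3, 0x20, 0x30]
fixups = [1]
mem = load_relocatable(image, fixups, base=0x600)
# The patched word now points at base+3, wherever the image was loaded.
```

A self-relocating loader is the same idea applied to its own image: copy yourself, patch your own address words, then jump into the copy.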

Two to three stages of loader is what I would consider normal, from the time of the early minis to the modern PC. A typical ROM-less mini load would often look something like:

  1. Key in a first stage paper tape loader of 10-20 words at the front panel, capable of reading a binary image off paper tape into a fixed location in RAM
  2. Feed in a paper tape of a second stage loader at a convenient Teletype or dedicated paper tape reader that knows how to read a disk block or three into RAM
  3. Load a third stage from disk, which locates the OS itself and loads that

On some particularly expressive machines with high-level peripherals (such as the PDP-11), stage 1 might load a second stage directly from disk that then loads the OS itself. On machines with a ROM bootstrap, stage 1 might be in ROM, which then loads from disk, which loads the OS, leading us to something dangerously close to a modern boot process.

On a modern PC, the BIOS in ROM loads a small “first stage” loader from disk (512 bytes for a traditional PC BIOS, though it can be significantly larger under EFI), which then typically loads a second stage capable of presenting the user with a relatively sophisticated user interface; this second stage loads the operating system itself. This is “three stages”, with the BIOS being a kind of “stage zero”.
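For the traditional-BIOS case the 512-byte constraint is visible right in the on-disk format: the BIOS loads sector 0 and, before jumping to it, checks only that the last two bytes are the 0x55 0xAA boot signature. A minimal Python check of that rule:

```python
def is_bootable_mbr(sector: bytes) -> bool:
    """A legacy BIOS accepts a boot sector if it is 512 bytes long and ends
    with the 0x55 0xAA signature; everything before that is code and data."""
    return len(sector) == 512 and sector[510] == 0x55 and sector[511] == 0xAA

blank = bytes(512)                    # zeroed sector: not bootable
mbr   = bytes(510) + b"\x55\xaa"      # empty code, valid signature
```

That 510 bytes of code (less the partition table) is why the first stage typically does nothing but fetch a bigger second stage.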

The essential heart of all of this complexity is that, when the computer powers up, you don’t know anything about it in the general case. You don’t know what peripherals it might have, how much memory it has, how fast it is, etc. The first stage loader does just enough to be able to boot something slightly more complicated on an arbitrarily simple configuration, which does a little bit more on a slightly more complicated configuration, and then repeats until a piece of software of sufficient complexity to load an operating system has been established. For a modern (where modern is “since the early 1970s”) system, this typically involves parsing some sort of filesystem off a disk device, including filenames and paths.

I used to wonder about all those loaders, esp. on minis, myself. But it became somewhat logical as I investigated the architecture of the PDP-1. Here, we have a special RIM (Read In Memory) mode, reading in arbitrary addresses followed by deposit commands from paper tape, until a jump instruction is encountered. This is quite cheap to implement in hardware (as it repurposes existing circuitry) and is general purpose. But it is also slow. So the next step was to create a binary loader. In this case, it’s a rather short program residing at the very top of standard memory and, since we’re dealing with core memory, it is resident once loaded by RIM mode. The binary loader reads arbitrary-length blocks of sequential values with checksums from paper tape and jumps to a given location at the end, which is both faster and safer (as we have checksums). The format of this loader is the output of the Macro assembler, and this goes back to the TX-0 assembler (and maybe to Whirlwind before this, but I don’t know any specifics about that.)
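The RIM scheme can be modelled in a few lines: the tape alternates address/deposit pairs until a jump word ends read-in mode. This toy Python decoder simplifies heavily (a real PDP-1 RIM tape carries 18-bit `dio addr` / data-word pairs terminated by a `jmp`; the tuples here just stand in for those):

```python
def rim_load(tape):
    """Read-In-Memory loop: deposit words until a jump word appears."""
    memory, start = {}, None
    for entry in tape:
        if entry[0] == "dio":        # deposit: store the following word
            _, addr, value = entry
            memory[addr] = value
        elif entry[0] == "jmp":      # jump terminates RIM mode
            start = entry[1]
            break
    return memory, start

# A two-word program deposited near the top of core, then started.
tape = [("dio", 0o7751, 0o123), ("dio", 0o7752, 0o456), ("jmp", 0o7751)]
mem, start = rim_load(tape)
```

The binary loader then improves on exactly this loop by reading whole checksummed blocks instead of one address/word pair at a time.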

Generally, hardware RIM mode fell out of favor soon and was replaced by simple toggle-in sequences on the machines that followed. But at this point the scheme was already established and the binary loader was also part of the output of the Macro assembler. (So a tape would start with the binary loader in RIM format, which would execute when the end of the RIM sequence was encountered and load the rest by this.) The important takeaway is that the format of binary loaders was a result of the development tools used. (E.g., the block format was just a consequence of using in-memory assembly and the restricted amount of working memory available.) There was no idea of standardization and nothing suggested that another tool should use the same format.
Once schemes like these had been established, if you wanted to add some new capability, like relocatable routines (requiring some modification of the code with respect to a base address while loading), it was quite logical to have a new loader that would go side by side with the established scheme. So things started to pile up.

Low addresses were often special addresses. E.g., the PDP-1 uses addresses 0-3 to dump its registers on a sequence break (interrupt), the Univac uses address 0 as a branch vector on illegal divisions, and other machines, like the PDP-8, have self-incrementing registers at the very bottom of memory or share these addresses in bank-switching schemes. So it probably became a natural instinct to stay away from these. Conversely, if you were designing a new architecture, low addresses became the natural location for any special-purpose registers.
(Also, some things hint at that low addresses were generally expected to be used for scratch memory. E.g., the PDP-1 (well, again) has a special jump instruction to go to address 0100, which hints at this being expected as a general entry point. However, there are no special addresses beyond the first 3 and so 4 became the general start address. So, if there was an idea of using this space for scratch memory, this would have been another incentive to stay away from these locations with anything that you may want to keep resident in core memory. – A distant echo of this may be the zero page of the 6502 and the jump vectors at the very top end of memory.)

I guess many of these things carried over from platform to platform, especially when software tools (and their associated memory schemes) were ported, and eventually accumulated.
(Mind that many essential tools, like ed, were originally developed on the PDP-1, so it would be small wonder if ports reused some of the memory schemes. The same is true for assemblers, which were often ported from other platforms and adjusted for the given instruction set.)

Complicated bootstraps are alive and well in current retro designs. One such example is ZRCC, a minimal CP/M-ready Z80 SBC without dedicated flash or ROM that boots off the master boot block of its CF disk. Complexity arises when starting off with a blank CF disk: there is then a complicated bootstrap process to install the necessary software on the new disk. It begins with a small (64-byte) dual-boot ROM in the CPLD. First it boots in serial mode, which loads and runs a 256-byte program received via the serial port; this, in turn, serially loads a monitor that formats the blank CF disk, copies the CF bootstrap into the CF’s master boot block, copies the CP/M binary into reserved sectors on the CF, then loads an XMODEM image into RAM, boots CP/M, saves the XMODEM image as the first file on the CF disk, and then uses XMODEM to load in the rest of the CP/M files. The whole process can be automated via a TeraTerm script and installs CP/M to a blank CF disk in about a minute.
https://www.retrobrewcomputers.org/doku.php?id=builderpages:plasmo:zrcc:zrcc1:cf_installation
Bill

1 Like

That’s splendid! We’ve discussed this kind of thing on the 6502 forum before now - one possibility being diode ROMs. As in your case, you don’t need a great deal of ROM to get started - and it might be less hardware than a workable front panel.

Thanks for all your answers. I heard of the PDP-1 RIM loaders before.

I first thought that the loaders were to boot the Interdata or an OS, but I think I’ve read they are just to load regular software from paper tape. I’m not sure how many software tapes had this bootstrap autoload format. Other paper tape formats involved fewer loaders.
Later, when magnetic tapes, drums etc. were used, or when a computer was running 24/7, it was probably easier.

Not often, however; take a look at booting a PDP-7, or a later model like the PDP-15.
http://simh.trailing-edge.com/papers.html
“Unearthing the PDP-15 Operating Systems”, by Bob Supnik.

If interested, a more detailed description is found here: https://masswerk.at/spacewar/inside/insidespacewar-spacewar2bsa5.html#loader

On bootstrapping a new OS… This extract from an article on Mouses has always amused me:

PSR vividly remembers working out how to start bootstrapping the system. The first, primitive supervisor was written as a single, stand-alone IMP program. CW’s framework would load this and start it running under OS/32. The problem now was to gain full control of the machine, as the fledgling system was running in a protected virtual address space. Trying to use OS/32 to load a real-mode program seemed like advanced masochism, so we didn’t bother. The solution was to use the floating-point registers (another idea taken from EMAS). The 7/32 had firmware floating point and the floating-point registers were just the first 16 32-bit words of real memory. So the system loaded a tiny bootstrap into these registers and went into an infinite loop. Manually stopping the processor and restarting it at location 0 executed the code in the floating-point registers (sorry, bootstrap) in unprotected, real mode. The bootstrap mapped the address of the user code from virtual to real, by reading the segment registers, and then jumped there. The supervisor, now in real mode and privileged, relocated itself down to real address 0 and we were off.

Today we’d probably call this a “root exploit” or something similar but the net effect seems to be to get a program running in an existing “protected” operating system to take over complete control of the hardware. I bet the designers (of the Interdata) never expected that one…

-Gordon

2 Likes

Marvellous - worth a read - see here: “Nibble a happy warthog”

2 Likes

Relevant to me as I was one of the schoolchildren mentioned in that document who used to visit Moray House to use Mouses back then… I think I learned more about computing, structured programming and general computing “stuff” in those brief evening hours than I did at school or university…

-Gordon

2 Likes

The golden age of programming, before the IBM PC. I never did get to study computer science or use a large computer like the VAX, just the hardware side, like the PDP-8.
All you retired IMP programmers, I have this home brew CPU that needs a compiler. :slight_smile:
Or a good set of links for the important docs and BOOTSTRAP code.
Ben.
PS: Can I get a version of IMP that exchanges percent % and full stop .? That is
my only gripe with it.

I still think BCPL might be a way to actually get going…

Here is a little story - sort of relevant to the loader that loads a loader that …

When I developed my 65c816 SBC I had ideas of porting something like a Pascal compiler to it, but having experience of BCPL on my previous 65c02 SBC and also some (almost) 40 years ago I opted for BCPL.

It wasn’t without its issues; however, eventually it worked well.

One of the issues, “The Bootstrap Paradox”, I describe here:

https://projects.drogon.net/ruby816-bcpl/

The BCPL I chose compiles to a bytecode called Cintcode, so I started by writing the 32-bit Cintcode interpreter/VM in '816 assembly. It’s currently 16KB in size, partly because of the inefficiencies of the '816 and partly because, to make it as fast as possible, there are lots and lots of macros that get expanded out.

Part of the inefficiency is due to the 64K banks of RAM that are part of the '816 architecture, so these need to be worked around.

So that was good, and after I solved the above ‘paradox’ I boot the '816 from SD into a native operating system, use that to load the Cintcode interpreter which then loads the BCPL run-time and finally the CLI.

(Technically, there is an on-board ATmega MCU that boots from internal flash that then pokes a 1st stage bootstrap into the '816 then lets the '816 run, that 1st stage then requests the 2nd stage from the ATmega including the native CLI)

Fast forward a bit and what I really wanted was some native, but retro style 32-bit system to move my BCPL OS onto… I chose RISC-V, which, while not exactly retro does have its roots back in the early 1980s (but that’s not important here).

So the next step, as I didn’t have RISC-V hardware was to write an emulator for it… In BCPL. Maybe you can guess where this is going…

Using my emulator written in BCPL, compiled to Cintcode, interpreted by my '816 assembly program on my 16MHz hardware, I was able to (cross-)compile and run small C programs on it at a rate of about 2000 RV instructions/sec.

The next step was to write a Cintcode interpreter in RISC-V assembler - which I did and it comes out at about 12.8KB of 32-bit RISC-V instructions. (Again, lots of macros, etc.)

Then I used this to bootstrap my existing BCPL OS in binary form.

So I now have an '816 CPU at 16MHz running a Cintcode interpreter running a compiled BCPL operating system, running a BCPL program emulating a RISC-V CPU running a Cintcode interpreter running a compiled BCPL operating system…

(Turtles all the way at this point!)

It’s not fast, but it works and when I did get real RISC-V hardware I was able to transplant the whole thing with very little effort. (Where it now runs blazingly fast!)

So back to Ben’s issue - All you have to do is

(a) write a nice macro assembler for your CPU,
(b) use that to write the BCPL Cintcode VM,
(c) use that to load a rudimentary BCPL program…
(d) …
(e) profit!

The alternative is writing a back-end for another compiler - be that C, Imp-77, Pascal or whatever. I think going down the BCPL route might be easier to get something working in the first instance.

Cheers,

-Gordon

2 Likes

Computer Science is all wrong, you need MOUSES and TURTLES to bootstrap with.
Byte code is not something I can write with the hardware I have, since short and long data must be word aligned. BCPL is nice, but I’d like [ ]’s for indexing and structured data types.
I do have a hacked-together assembler and a C-subset-like compiler: no ++, --, op= or structures. I was planning to bootstrap with tiny C, but that will not work with this hardware. Bare-bones firmware in the bootstrap ROM (6KB) covers things like strcmp, multiply, divide, console I/O and SD card block read/write. Floating point is planned for later.
The end goal here is software and hardware like the PDP-8/e or the VT78: 2 RK05s, serial port, glass TTY, assembler and floating point package, 1 compiled language, and a BIG FRONT PANEL with LIGHTS and switches. SD cards (standard) make nice RK05 disks, and are cheap at $1 each. About 64K+ of RAM for programs and 32K of RAM for the OS, all using S100-sized cards, bus and box.
Bootstrapping is simple on this machine since I have a large ROM and a non-banked address space up to 512K. A 1200-baud bootstrap loader is used for testing, as well as the front panel for paper tape bootstrap and disk bootstrap.
Bootstrapping the FPGA just needs the serial PROM burned, as video memory and ROMs all use internal memory.
Ben.

BCPL used to provide pack and unpack functions, but even so, you can get round this in your bytecode interpreter - it’s just a multi-stage operation: load the word from RAM, shift the byte to the right offset, OR it in, then write it back. Similar for reading a byte. I presume you are doing this already for your C backend…
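That read-modify-write sequence can be sketched in Python for a hypothetical word machine with 16-bit words and two bytes packed big-endian per word (both of those are assumptions for illustration, not any particular BCPL port):

```python
BYTES_PER_WORD = 2   # assumed: 16-bit words, big-endian byte order

def getbyte(words, i):
    """Fetch byte i from a vector of words: load the word, shift, mask."""
    word = words[i // BYTES_PER_WORD]
    shift = (BYTES_PER_WORD - 1 - i % BYTES_PER_WORD) * 8
    return (word >> shift) & 0xFF

def putbyte(words, i, value):
    """Store byte i: load the word, clear the slot, OR the byte in, write back."""
    shift = (BYTES_PER_WORD - 1 - i % BYTES_PER_WORD) * 8
    w = i // BYTES_PER_WORD
    words[w] = (words[w] & ~(0xFF << shift)) | ((value & 0xFF) << shift)

words = [0x4142, 0x4344]   # "ABCD" packed two bytes per word
putbyte(words, 3, 0x5A)    # replace 'D' with 'Z' via read-modify-write
```

The same two routines, done in the VM's native assembly, are all a byte-oblivious machine needs to support Cintcode's byte operations.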

I have to do similar on the '816, as its stupid 16-bit mode can only read/write 16-bit data to/from RAM, and while I can switch down to 8-bit mode, that adds more cycles than doing the 16-bit fetch then AND…

+1 for the big panel with lights and switches though :slight_smile:

-Gordon

1 Like

Coming to this late, but you set BCPL up so that the pack or unpack operations suit your machine. If you have a 16-bit character, that’s fine. BCPL is happy.

Another useful thing with BCPL, and 8-bit machines in particular, is to turn the word-machine Cintcode into something easier to execute. Actual BCPL compilers replaced the back end proper with a code generator, so you got the parse tree, but just mashing Cintcode a bit can help a lot with speed.

This last version of my CPU has 8-bit unsigned bytes, 16-bit signed shorts and 48-bit longs with 48-bit decimal math (excess-3), plus a 16-bit opcode and a 16-bit bus. BCPL was developed on the IBM 360, thus packing 4 bytes per 32-bit word; I pack 6 bytes per word. Portable seems to imply portable only with the same word length and addressing modes. Once I get a BCD floating point package done, I will move to BCPL or an ALGOL-like language.

BCPL works on all sorts of virtual machine word sizes. You just edit the top of blib.b and adjust the byte-specific stuff for your platform. In the Fuzix case, because it’s mostly 8/16-bit machines, I run the system with 16-bit words and 2 characters per word for packing. This gets you a working BCPL system with a 16-bit word, for example.

LET WRITES(S) BE FOR I = 1 TO GETBYTE(S, 0) DO WRCH(GETBYTE(S, I))

AND UNPACKSTRING(S, V) BE
  FOR I = 0 TO GETBYTE(S, 0) DO V!I := GETBYTE(S, I)

AND PACKSTRING(V, S) = VALOF
$( LET N = V!0 & 255
   LET I = N/2
   FOR P = 0 TO N DO PUTBYTE(S, P, V!P)
   SWITCHON N&1 INTO
   $( CASE 0: PUTBYTE(S, N+1, 0)
      CASE 1: $)
   RESULTIS I $)

// THE DEFINITIONS THAT FOLLOW ARE MACHINE INDEPENDENT

(GETBYTE/PUTBYTE are virtual machine ops)

1 Like