The Texas Instruments TMS 9900 microprocessor

The TMS 9900 processor has been mentioned once or twice or even more times on this forum. It is most well-known for appearing in the ill-fated TI 99/4 and 99/4A line of computers, which arrived a little too late and at a little too steep a price tag to compete in the market, and had a rather unusual and unfortunate architectural design, but nevertheless sold by the millions (and is therefore readily available on the second hand market for very reasonable prices). However, this machine arguably does not do the TMS 9900 justice, came along rather late in the TMS 9900’s life, and doesn’t really represent what it is or what it is capable of.

The TMS 9900 was introduced in 1976 with a 16-bit word at a clock speed of 3 MHz, making it a reasonably good performer and beating most of the first crop of 16-bit processors (with the notable exception of the DEC LSI-11) to market by a couple of years. The 8086 would follow in 1978 and the Motorola 68000 in 1979, in particular. However, unlike these other machines, the TMS 9900 was limited to a 16-bit (really, 15-bit; more on this later) external address bus and, like the 68000, had no built-in memory management. This limited it to 64 kB of addressable RAM.

Architecturally, the TMS 9900 is a true 16-bit design in the CISC tradition with an expressive set of addressing modes a la the PDP-11 (including an auto-increment, but no auto-decrement, and without the double-indirect modes of the PDP-11). In fact, if you squint a little bit at the instruction set, it kind of looks like the PDP-11 instruction set reimagined as a hexadecimal, rather than octal, system — there are four bits of registers (R0 through R15), rather than three, and two bits of addressing mode, rather than three. This is probably unsurprising, as the TMS 9900 has its roots in a minicomputer contemporary with the PDP-11, the TI-990. It has 16-bit registers and 16-bit external address and data buses. The instruction set includes hardware multiply and divide routines as well as an unusually robust array of bitwise operations. It does not have either floating point operations or wide register support for storing floating point representations.


(TI-990 operator’s panel, photograph by Davepitts on English Wikipedia.)

The mainframe and minicomputer world has enjoyed a great diversity of architectural decisions compared to the microcomputer market, and the TMS 9900 architecture diverges from other, familiar microprocessors rather rapidly due to its minicomputer provenance. For example, the TMS 9900 CPU itself has only three user-visible registers: the program counter, a “workspace pointer”, and a status register. There are no on-chip general-purpose registers, no accumulator, no index registers, etc.! Instead, the workspace pointer (WP) contains a memory address that represents the location of the general purpose register file. R0 is located at the address stored in the WP, R1 is at WP + 1 word, etc. The architecture provides sixteen general purpose registers, which are located in main memory and can be placed anywhere in the address space. The processor documentation plays this up as a boon for multitasking, as the entire processor context can be changed as easily as loading a new workspace pointer. Indeed, a pair of instructions is provided to simultaneously jump to a new location (or return from an old location) and update the processor’s workspace pointer.
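To make the workspace arithmetic concrete, here is a minimal Python sketch of where each register ends up in memory. The helper name and the example workspace pointer are made up for illustration; addresses are byte addresses, so consecutive registers sit two bytes apart.

```python
def register_address(wp: int, n: int) -> int:
    """Return the byte address of workspace register Rn (hypothetical helper)."""
    if not 0 <= n <= 15:
        raise ValueError("registers run from R0 through R15")
    # Each register is one 16-bit word, i.e. two byte addresses.
    return (wp + 2 * n) & 0xFFFF


wp = 0x0500  # example workspace pointer, chosen arbitrarily
for n in (0, 1, 12, 15):
    print(f"R{n:<2} lives at {register_address(wp, n):04X}")
```

Changing `wp` moves all sixteen registers at once, which is the whole trick behind the fast context switch.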

The disadvantage of this register storage in main system memory is that many instructions require a comparatively large number of memory accesses compared to other register-rich processors, and in this respect it behaves somewhat more like an accumulator-based architecture. The simple register increment instruction INC, for example, requires three memory accesses: instruction fetch, register fetch, and register store. As expected, most instructions take a comparatively large number of clock cycles to complete due to this and other architectural decisions; the fastest instructions (such as branch unconditional and many of the single-operand instructions) take 8-10 clock cycles, or longer if more complicated addressing modes are in use. The slowest non-division instructions can take well over 60 clock cycles, and division may require as many as 124 clock cycles plus overhead for addressing modes!

Another oddity for a microprocessor is that it is a true word-addressed machine. Whereas most early 16-bit microprocessors (and indeed many minicomputers, such as the PDP-11) could directly address memory on a byte level, the TMS 9900 cannot. It has only a 15-bit external address bus, and unlike other true 16-bit processors like the LSI-11 or Motorola 68000, it has no external signal to indicate byte operations or to communicate even or odd addresses; all address accesses are to even addresses.

This is probably due in part to the next interesting departure from other architectures of our time: the TMS 9900 neither has an I/O space on its address bus to allow I/O transfers over the same data lines as memory (like the 8086) nor uses exclusively memory-mapped I/O (like the LSI-11). Instead, it has a strange (by modern standards) serial I/O bus in addition to the parallel memory bus. This I/O bus is bit-addressed, and exposes 4096 bits of I/O space which can be transferred in fields of anywhere from 1 to 16 bits. Since I/O can be performed on any arbitrary bit boundary, memory bus addresses have no need of bytes! An internal “Communications Register Unit” (CRU) manages this serial communication, and TI marketed a family of CRU I/O devices for the 9900 family of processors, including the TMS 9901 parallel I/O and interrupt controller, the TMS 9902 asynchronous serial communication interface, and the TMS 9903 synchronous serial communication interface.

The processor documentation spends some time discussing the CRU and its usage for I/O, including providing several helpful diagrams for creating CRU-attached I/O interfaces out of discrete logic, such as this multi-bit parallel I/O device:

[Image: cruio, a multi-bit parallel I/O device attached to the CRU]

An oddity of the documentation for this system, and indeed for many of TI’s designs, is that the bits are numbered “backward” from most other systems. A0 and D0 are the most-significant address and data bits, respectively, while A14 and D15 are the least-significant bits. This poses a real hazard for the system designer, as frequently pin names will mismatch between chips that are being interfaced from other manufacturers, with D0 on the TI connecting to D15 on an external device! This becomes even more confusing when the serial communication documentation for CRU devices comes into the mix, particularly if one is used to more traditionally numbered systems.
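When wiring a TI part to somebody else's chip, I find it helps to convert the pin numbers explicitly. The sketch below is just that conversion; the function name is hypothetical and it assumes a 16-bit field unless told otherwise.

```python
def flip_bit_number(bit: int, width: int = 16) -> int:
    """Convert between TI numbering (0 = most significant bit) and the
    conventional numbering (0 = least significant bit).

    The mapping is its own inverse, so the same function works both ways.
    """
    if not 0 <= bit < width:
        raise ValueError("bit number out of range for this field width")
    return width - 1 - bit


ti_d0 = flip_bit_number(0)    # TI D0 is the conventional D15
ti_d15 = flip_bit_number(15)  # TI D15 is the conventional D0
ti_a14 = flip_bit_number(14)  # TI A14 is bit 1 (the 2's bit) of a byte address
```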

The chip’s minicomputer roots and its designers’ vision for multitasking also come into focus in its support for interrupts and system traps. It exposes 16 hardware interrupts and 16 software traps, each associated with a code entry point and a workspace pointer to store context. Note that where another machine might associate an interrupt with a code location and a stack pointer or processor status word, the 9900’s memory-backed register architecture allows it to associate it with a fresh set of registers!

The electrical care and feeding of the original 9900 is simultaneously complicated and difficult, and pleasant and straightforward, depending on which part of the design you are looking at. It uses a large 64-pin package, which means that no bus multiplexing is necessary, and the address and data bus lines are neatly arranged in sequential order down opposite sides of the chip, with none of the scrambled pin layouts that often plague smaller packages. However, being an NMOS part with a complicated external clock, it requires three operating voltages (5 V logic, a low-current -5 V for the NMOS substrate bias, and 12 V for (presumably) clocking circuits) and a four-phase non-overlapping clock signal at 12 V nominal with 75 mA of drive current. TI helpfully provides a companion clock generator chip which takes an input clock (or crystal oscillator) at 4x the 9900 clock frequency and provides the requisite signals, while drawing about 500 mA!

Later chips in the TMS 9900 family, such as the TMS 9980 and TMS 9995, did away with many of these electrical complexities (such as requiring only a TTL-level single-phase clock or a simple crystal, a single voltage, and a 40-pin footprint) in return for other difficulties such as multiplexed pins and an external 8-bit data bus.

The TMS 9900 gets a bad rap due to the market failure of the TI 99/4A and the relative unpopularity of its minicomputer cousins compared to the DEC, Data General, and other contemporary offerings, as well as relatively poor performance compared to other 16-bit CPUs. However, considering its relatively early release date of 1976 and the uniqueness of the architecture, it seems to be an interesting sidebar in processor development worthy of the study of a retrocomputing aficionado.

Bitsavers has documentation on the TI-990 series of minicomputers as well as the TMS 9900 and support chips. The data sheets for the 9900 family of chips are somewhat confusing and incomplete without reference to the other user manuals and TI-990 documentation.


Fascinating exposé. Thanks for sharing.


If you squint it does indeed look very PDP-11-like. You also get something very similar with a TI MSP430 (you can imagine SPI interfaces as the modern CRU).

The first generations were actually bipolar (I2L, integrated injection logic) rather than MOS.

The 9900 was part of a first generation of 16 bit microprocessors which included the National Semiconductor PACE and the General Instrument CP1600 among others. There had been some multi-chip 16 bit processors before that.

A very nice computer with the TMS9900 was the 1980 Fluke 1720A. It had a touch screen and bubble memory besides floppy disks. Sold as an instrument controller, it used the IEEE 488 bus for the peripherals. With a full 16 bit main memory, it has a very fast Basic interpreter.


Agreed, I think the instruction set of the MSP430 resembles the 9900 in many ways. For example, autoincrement is available but autodecrement is not, and immediate is its own addressing mode because the PC is not a named register.

The TMS 9900 CRU interface

As discussed in the first post in this thread, the TMS 9900 contains an I/O interface called the Communications Register Unit, or CRU. The 9900 documentation plays up the CRU considerably as a real power feature of the platform; from a modern standpoint this is a somewhat confusing position, as CRU accesses are much slower than memory bus accesses (by a factor of as much as 20 or so!) and do not proceed asynchronously from instruction dispatch; a long CRU operation ties up the processor for the duration. (This is somewhat unavoidable due to the memory-to-memory configuration of the 9900, meaning that the address lines which are shared by the CRU prevent either instruction or register fetch while the CRU is active.) Modern sensibilities are very comfortable with memory mapped I/O, even on retro memory-constrained platforms; the C64, Apple II, BBC Micro, and most CP/M machines use almost exclusively memory-mapped I/O. The documentation does mention a real advantage of the CRU interface, which is the pin count of CRU-attached devices (for example, the 9902 UART has only 18 pins, versus 40 pins for an 8250 or 24 pins for a 6850). However, the TMS 9900 itself has 64 pins, so at some point this feels moot.

CRU communication consists of single-bit transfers on a clocked interface. The CRU controller sets an output line high or low and strobes a clock line for output, and latches from an input line when strobing the clock for an input. The instruction set on the TMS 9900 allows for CRU transfers of between one and 16 bits, but multi-bit transfers are accomplished by simply chaining a number of single-bit transfers.

In some ways the CRU interface is very similar to the more modern SPI interface, with a dedicated serial input and output line and a controller-sourced clock. However, while SPI transactions are normally in words of some multi-bit length, all CRU transactions are effectively one bit long.

Every bit on the CRU interface is assigned an address out of the 12-bit CRU address space, for a total of 4096 addressable bits. The dedicated CRU communication ICs supplied by Texas Instruments all have multi-bit interfaces; for example, the TMS 9902 UART exposes a 32-bit register as 32 consecutive bit addresses in the CRU address space. It therefore takes in a chip select line and a 5-bit address; contrast this to a SPI (or SPI-ish) chip with a 32-bit interface, which would have a single chip select line and receive a 32-bit word directly. To write multiple bits to this register, the programmer selects the “base address” of the chip, then sends (say) an 8-bit word to be transmitted. The CRU inside the CPU will automatically send the first bit of this word at the base address of the UART, the second bit at that address plus one, the third bit at that address plus two, etc., setting the processor address lines appropriately so that the UART receives each bit in the location that it expects.
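Here is a rough Python model of how one multi-bit output breaks down into single-bit transfers. It assumes the least significant bit goes out first at the base address, which is how I read the TI documentation for LDCR; the function name and the example base address are made up.

```python
def ldcr_bits(base: int, value: int, count: int):
    """Yield (cru_address, bit) pairs for one multi-bit output transfer."""
    if not 1 <= count <= 16:
        raise ValueError("CRU transfers are 1 to 16 bits long")
    for i in range(count):
        # Each bit is an independent single-bit transfer at the next
        # consecutive address in the 12-bit CRU address space.
        yield ((base + i) & 0x0FFF, (value >> i) & 1)


# Send the 8-bit value A5 hex to a hypothetical device at CRU address 040 hex.
transfers = list(ldcr_bits(base=0x040, value=0xA5, count=8))
for address, bit in transfers:
    print(f"CRU bit {address:03X} <- {bit}")
```

The device only ever sees one bit and one address at a time, which is why it needs address pins rather than a shift register.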

At the board design level, this means that each CRU-attached device must have an address decoder similar to that required for a memory-mapped I/O device. Since the CRU device typically only accepts as many address lines as it recognizes discrete bit locations, the remainder of the address bus is decoded into a chip enable signal. Like memory-mapped I/O devices, if the CRU address space is not completely utilized, this decoding can be “sloppy”; for example, if only two CRU devices are attached, it may be convenient to discriminate between them using only the high-order CRU address bit.

There are several strange idiosyncrasies of the platform that must be taken into account when programming CRU devices. The first is a one-bit offset between the CRU and internal CPU address spaces. Because the TMS 9900 has no 1’s bit address line but the CRU can address consecutive bits, the 2’s bit memory address line is used as the 1’s bit CRU address line, and all other address lines are shifted over accordingly. Rather than handling this internally, for some reason Texas Instruments chose to make it visible to the programmer. Therefore, in order to address the CRU device at base CRU address 0x100, the value 0x200 must be supplied by the programmer. When this value is placed on the CRU address bus, the register’s 1’s bit disappears, leaving the register’s 2’s bit as the 1’s bit of the CRU address bus and placing 0x100 on the bus.
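The one-bit offset is easy to get wrong by hand, so here is a small Python sketch of the conversion in both directions. Both helper names are hypothetical, and the 12-bit mask simply follows the 4096-bit CRU address space described earlier.

```python
def cru_bit_address(software_value: int) -> int:
    """CRU bit address that appears on the bus for a programmer-supplied base."""
    # The 1's bit is dropped and the remaining bits shift down by one.
    return (software_value >> 1) & 0x0FFF


def software_base(cru_address: int) -> int:
    """Value the programmer must supply to reach a given CRU bit address."""
    return (cru_address & 0x0FFF) << 1


base = software_base(0x040)     # a device decoded at CRU bit address 040 hex
on_bus = cru_bit_address(base)  # what the device actually sees
```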

The second is that the CRU appears to access both processor registers and memory addresses as memory offsets, and the TMS 9900 is big endian. This becomes important because the CRU performs byte operations for 8 or fewer bits, and word operations for 9 or more bits. Since the processor is big endian, and registers reside in memory, this means that the byte address of a register is its high-order byte, not its low-order byte. Thus, if the programmer stores 0xff in R2, then transmits 8 bits of R2 through the CRU, the CRU will transmit 0x00 (as the high-order byte of 0x00ff)! However, if the programmer transferred 12 bits of R2, the CRU would send 0x0ff, as expected. This is intuitive behavior for memory addresses on a 16-bit system, but not intuitive for register accesses; most modern architectures paper over endianness for internal register accesses, and the modern programmer would expect a byte transfer to or from a named register to transfer the low-order byte. This is mitigated somewhat by the fact that the MOVB byte move instruction also loads the high-order byte of a register, presumably for the same reason — that means that a typical loop to transmit bytes over a serial port works as expected:

        LI      R1,MESG Load address of NUL-terminated string into R1
        SBO     16      Turn on transmitter
XMTLP   TB      22      Check for available TX buffer space
        JNE     XMTLP   Loop if buffer full
        MOVB    *R1+,R2 Fetch the next byte
        JEQ     DONE    Jump to DONE if NUL
        LDCR    R2,8    Transmit 8 bits from R2
        JMP     XMTLP   Loop until done
DONE    SBZ     16      Turn off transmitter

This snippet of TI assembler code uses the SBO (Set Bit One), SBZ (Set Bit Zero), TB (Test Bit), and LDCR (LoaD Communication Register) CRU instructions. Note that, despite the strange practice of transmitting the high byte of a register when sending 8 or fewer bits (as in the line LDCR R2,8), the MOVB and LDCR instructions balance properly so that the programmer need not worry about this unusual circumstance.
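For the curious, the byte-versus-word rule can be modeled in a few lines of Python. This is only a sketch of the behavior described above, with a made-up function name; it treats the register as a 16-bit value and picks the high byte for transfers of 8 or fewer bits.

```python
def ldcr_source_value(register: int, count: int) -> int:
    """Value actually shifted out when sending `count` bits of a register."""
    if not 1 <= count <= 16:
        raise ValueError("CRU transfers are 1 to 16 bits long")
    if count <= 8:
        # Byte operation: the byte at the register's own address is used,
        # which is the high-order byte on this big-endian machine.
        source = (register >> 8) & 0xFF
    else:
        # Word operation: the whole 16-bit register is the source.
        source = register & 0xFFFF
    return source & ((1 << count) - 1)


surprise = ldcr_source_value(0x00FF, 8)   # sends 00, not FF
expected = ldcr_source_value(0x00FF, 12)  # sends 0FF
balanced = ldcr_source_value(0x4100, 8)   # a MOVB'd byte 41 hex goes out intact
```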

Individual multi-bit CRU transfers are about 1.5 Mbps, which seems fast, but due to the TMS 9900 architecture the minimum delay between such multi-bit transfers is large. For example, the TMS 9902 requires that the minimum time between a reset command (a single bit set) and the first configuration word (an 8 bit transfer) is eleven (3 MHz) clock cycles. Nonetheless, the two-instruction sequence of SBZ to reset followed immediately by LDCR to load the configuration meets this requirement, as it is about 30 clock cycles to set up the LDCR instruction before the first bit is transferred, if the configuration word is stored in memory. (It would be somewhat faster if it were stored in a register, but still adequate.) Streaming 16-bit words from RAM into the CRU will spend approximately as much time setting up the transfer of each word as actually transferring the word, making the effective CRU transfer rate more like 750 kbps. (These numbers are quite round, and depend heavily on the addressing modes in use, etc.).
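The round numbers above can be reproduced with a little arithmetic. The Python sketch below assumes two clocks per transferred bit (which is what 1.5 Mbps at 3 MHz implies) and a setup cost of about 32 clocks per 16-bit word, a figure picked only to match "approximately as much time setting up as transferring".

```python
CLOCK_HZ = 3_000_000
CLOCKS_PER_BIT = 2  # implied by the ~1.5 Mbps burst figure at 3 MHz


def effective_rate(bits: int, setup_clocks: int) -> float:
    """Bits per second for one transfer, including per-transfer setup time."""
    transfer_clocks = bits * CLOCKS_PER_BIT
    return bits * CLOCK_HZ / (setup_clocks + transfer_clocks)


burst = effective_rate(16, setup_clocks=0)       # ignoring setup entirely
streaming = effective_rate(16, setup_clocks=32)  # assumed setup, see above
print(f"burst {burst / 1e6:.2f} Mbps, streaming {streaming / 1e3:.0f} kbps")
```

Real figures depend on the addressing mode and memory wait states, so treat this as a back-of-the-envelope check only.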

An interesting property of this CRU interface is that it matches readily with latching 74-series logic chips such as the 74x251, 74x259, etc. Recall this example from the TMS 9900 data sheet:

The CRUIN and CRUOUT lines can be used to communicate a single bit of data from such interfaces, while CRUCLK activates the latch on output devices. In this manner each bit of the CRU address space can become a GPIO input, output, or both. (The TMS 9901 CRU interface chip can also be used to provide GPIO functionality, as well as interrupt control and timing.)


The TMS 9900 Register File

One of the most unique aspects of the TMS 9900 is its (lack of) register file. The CPU itself has only three programmer-visible registers: the program counter, a status register, and the “workspace pointer”. The workspace pointer (WP) is a programmer-accessible register, which can be loaded with an immediate value or indirect through a register, that stores the base address of the CPU named register file. R0 is stored at the word address stored in WP, R1 is stored at the subsequent word address, etc., on through R15, for 16 words of register storage in main memory.

The advantage of storing registers in main memory is that the number of available registers is limited only by the amount of RAM in the system. The disadvantage is that every register access requires an access to main memory, which takes a minimum of two clock cycles on the TMS 9900. The advantage is reflected both in the size of the register file (sixteen named registers per workspace) and in the number of “register files” available.

Named Registers

Of the sixteen named registers, a full ten of them are truly general purpose, usable in all addressing modes in any instruction that accepts a general purpose register and unused for any other purpose. Five (R11-R15) are general purpose and can be used anywhere the other ten can be used, but are used by the processor for specific purposes in certain instructions (and may be written to, under program control, when those instructions are invoked). The final register, R0, cannot be used as an offset for register indexed operations, and must be used for specific purposes in certain instructions.

Assuming that the register file is stored at address 0500 hex, the Model 990 Computer / TMS 9900 Microprocessor Assembly Language Programmer’s Guide depicts the register file thus:

This map shows the special-purpose functions of the six registers with special status in the CPU.

  • R0: For certain instructions with an immediate shift count encoded as part of the opcode, the special value 0 indicates that the shift count is stored in the lowest 4 bits of R0.
  • R11: The branch with link operation stores the PC of the next instruction after the branch in R11 for use with the RT return instruction (which is an alias for B *R11, or branch to the address stored in R11). The XOP “extended operation” instruction stores its argument in R11.
  • R12: CRU operations (see the section on the CRU interface, above) use R12 as the base CRU address for the device being manipulated.
  • R13-R15: These registers are used to store the contents of the on-processor registers when certain operations (BLWP, XOP, interrupts, etc.) change the register workspace. This facilitates returning to the previous workspace in a manner similar to subroutine returns with a stack.

Register Properties

The TMS 9900 is big endian and byte-addressed, but only word-accessible; this means that every memory fetch (including register fetches) is for an entire word located at an even address, and that even addresses contain the high order bits of each word. A peculiarity of the 9900 architecture is that the endianness of words stored in a register is not invisible to the programmer. In particular, byte operations operate on the address of the register (which is an even word address), not the low order bits of the register! Consider a workspace pointer of hexadecimal 0080; R0 would in this example be stored at 0080, with its high order byte at 0080 and its low order byte at 0081. Loading the decimal value 32769 (hexadecimal 8001) into R0 would result in the byte 80 hex stored at 0080 and the byte 01 hex stored at 0081. However, assuming that R0 stored 0000, moving the byte 55 hex (85 decimal) into R0 with the MOVB instruction would result in 55 stored at address 0080 and the byte at 0081 unchanged, for a word value of 5500 hex (or 21760).
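The example above can be replayed with a toy memory model. This Python sketch is not an emulator; the class and method names are invented, and it only models a word load and a byte move into a register.

```python
class Workspace:
    """Toy model of a register file living in big-endian main memory."""

    def __init__(self, wp: int):
        self.wp = wp
        self.mem = bytearray(0x10000)  # 64 kB of byte-addressed memory

    def load_word(self, n: int, value: int) -> None:
        """Store a 16-bit value in Rn, high byte at the even address."""
        addr = self.wp + 2 * n
        self.mem[addr] = (value >> 8) & 0xFF
        self.mem[addr + 1] = value & 0xFF

    def move_byte(self, n: int, value: int) -> None:
        """Store a byte at Rn's own address, as a byte move into Rn does."""
        self.mem[self.wp + 2 * n] = value & 0xFF

    def read_word(self, n: int) -> int:
        addr = self.wp + 2 * n
        return (self.mem[addr] << 8) | self.mem[addr + 1]


ws = Workspace(wp=0x0080)
ws.load_word(0, 0x8001)
high_low = (ws.mem[0x0080], ws.mem[0x0081])
ws.load_word(0, 0x0000)
ws.move_byte(0, 0x55)
after_movb = ws.read_word(0)
```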

If this seems confusing, it is! Most modern architectures would move a byte into the low-order byte of a register regardless of the register endianness and its underlying mapping to the memory bus.

Changing Workspaces

In order to effectively use an in-memory register file, the registers must be located somewhere in RAM. Some architectures (such as the DEC PDP-11, Microchip PIC, Motorola 6800, and MOS 6502) have fixed memory locations for register-like structures. As previously mentioned, the TMS 9900 uses a workspace pointer, loadable by the programmer, to locate its registers. This workspace pointer can be loaded with any memory word address (although addresses larger than FFE0 don’t leave enough room for a full register file). In particular, it need not be 16-word aligned.

At reset, the WP register is loaded with the address stored at 0000 hex, and the program counter is loaded with the address stored at 0002 hex. Each interrupt vector entry (located at fixed addresses in memory) also provides an associated WP that is loaded on interrupt, as does each extended operation vector (more on those below). The WP can be loaded with an immediate value via the LWPI instruction. Finally, the BLWP instruction can be used to load the WP and PC registers from two consecutive memory words anywhere in the address space, and RTWP will return to the previous WP and PC values after a hardware interrupt, XOP, or BLWP.

The fundamental operation of an interrupt, BLWP, or XOP is the same; they differ only in where they find their WP and PC values. For all three, the WP and PC are loaded from two memory words and then the old values for WP, PC, and Status are saved into the new register file’s R13, R14, and R15, respectively. Interrupt vectors are located at fixed addresses (with Interrupt 0, hardware reset, at 0000 and 0002 as mentioned in the previous paragraph, Interrupts 1-15 in the next 30 words, and the special LOAD vector at addresses FFFC and FFFE hex), as are extended operations invoked by the XOP instruction (XOP 0 is at memory location 0040 hex, and they proceed upward through XOP 15 at 007C in a similar fashion to interrupts). The BLWP instruction, on the other hand, accepts a source location which can be encoded in any of the processor’s normal addressing modes (including stored in registers) and loads WP and PC from that location.
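The vector layout is regular enough to compute. As a quick sketch (the function names are mine, not TI's), each vector is a WP word followed by a PC word, so vectors are four bytes apart:

```python
LOAD_VECTOR = (0xFFFC, 0xFFFE)  # (WP address, PC address) for LOAD


def interrupt_vector(level: int) -> tuple:
    """Addresses of the WP and PC words for interrupt levels 0 through 15."""
    if not 0 <= level <= 15:
        raise ValueError("interrupt levels run from 0 to 15")
    return (4 * level, 4 * level + 2)


def xop_vector(n: int) -> tuple:
    """Addresses of the WP and PC words for XOP 0 through XOP 15."""
    if not 0 <= n <= 15:
        raise ValueError("XOP numbers run from 0 to 15")
    base = 0x0040 + 4 * n
    return (base, base + 2)


for level in (0, 1, 15):
    wp_addr, pc_addr = interrupt_vector(level)
    print(f"interrupt {level:2}: WP at {wp_addr:04X}, PC at {pc_addr:04X}")
```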

Only one instruction can load the workspace pointer without also branching to a new location and storing information about the old workspace in the new workspace, and that is LWPI, or Load Workspace Pointer Immediate. This allows the programmer to immediately load a new set of registers without regard to continuity with the old set of registers. This might be used, for example, to create a workspace for a new process. Note, however, that the argument to this instruction must be immediate (that is, occur in memory immediately after the LWPI instruction itself), so creating a new workspace pointer at a run time programmed location is not possible without self-modifying code.

The XOP instruction is intended to encode essentially a software interrupt. On implementations of the architecture such as the TI 990/10 that have privileged execution and memory management units, an XOP sets supervisor privilege in the same manner as a hardware interrupt. The TMS 9900 does not have execution levels or memory management, so these operations are essentially just well-known services on the 9900.

The BLWP instruction is the most general of all of the WP manipulation instructions, allowing both the WP and PC to be set from a program-defined location that can be determined at run time. It otherwise behaves like an XOP or interrupt (on the TMS 9900; it does not enter privileged execution on other implementations). The TMS 9900 documentation recommends using BLWP both for fast and easy context switches and for providing subroutines with their own register space for computation. This latter use is the most unique of the options on the TMS 9900; most of the other uses of the register workspace are not unfamiliar to modern processor users (although they are typically accomplished via stack operations, their basic behavior is familiar). The idea of providing a single subroutine not with a stack frame, but instead with a completely new set of registers, is unusual. Register window architectures (such as Sun SPARC or Intel Itanium) provide some flavor of this functionality, but even they are not quite as audacious as the TMS 9900!

Each of these WP loads (except for LWPI) leaves enough information in the high registers of the new workspace to return to the old workspace as if the processor had never left. The RTWP instruction does exactly this, and is used to return from interrupts, XOPs, or any subroutine invoked via BLWP. It restores the old WP, PC, and Status registers, and continues execution where the calling context left off. In addition, the calling context itself is readily available through these registers; for example, R13 contains the address of the calling procedure’s WP, and the calling procedure’s register set is easily indexed from this value. Likewise, the calling procedure’s program counter is stored in R14, and can be interrogated for additional information (such as simulated “immediate arguments” to a procedure) or even modified to return to a new location on RTWP.
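The save-and-restore dance is easier to see in code. Below is a minimal Python model of the context switch and its return, under the assumption that memory is a dictionary of 16-bit words keyed by even byte address; the class, method names, and all addresses are invented for the example, and nothing else about the CPU is modeled.

```python
class Cpu:
    """Toy model of the three on-chip registers plus word-wide memory."""

    def __init__(self):
        self.mem = {}  # even byte address -> 16-bit word
        self.wp = self.pc = self.st = 0

    def reg_addr(self, n: int) -> int:
        return (self.wp + 2 * n) & 0xFFFF

    def blwp(self, vector: int) -> None:
        """Switch to the workspace and entry point stored at `vector`."""
        old = (self.wp, self.pc, self.st)
        new_wp, new_pc = self.mem[vector], self.mem[vector + 2]
        self.wp = new_wp
        # Old WP, PC and status land in the NEW workspace's R13-R15.
        for n, value in zip((13, 14, 15), old):
            self.mem[self.reg_addr(n)] = value
        self.pc = new_pc

    def rtwp(self) -> None:
        """Return to the context saved in R13-R15 of the current workspace."""
        old_wp = self.mem[self.reg_addr(13)]
        self.pc = self.mem[self.reg_addr(14)]
        self.st = self.mem[self.reg_addr(15)]
        self.wp = old_wp


cpu = Cpu()
cpu.wp, cpu.pc, cpu.st = 0x8300, 0x6010, 0x2000  # caller's context
cpu.mem[0x8302] = 0x1234                         # caller's R1
cpu.mem[0x7000], cpu.mem[0x7002] = 0x8320, 0x6100  # vector: new WP, new PC
cpu.blwp(0x7000)
caller_wp = cpu.mem[cpu.reg_addr(13)]
caller_r1 = cpu.mem[caller_wp + 2 * 1]  # reach into the caller's registers
in_callee = (cpu.wp, cpu.pc)
cpu.rtwp()
```

Reading `caller_r1` through the saved WP is the same trick a BLWP subroutine uses to fetch its arguments from the caller's registers.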

Workspace Pointers and Reentrancy

The astute reader may have noticed a reentrancy problem with features such as XOP and hardware interrupts. While the context loaded via the vector WP may be unique from the point of view of the calling or interrupted context, it is the same for all invocations of the interrupt handler or XOP itself. This means that recursive interrupts or XOPs, or XOPs that may be used by more than one process in a multiprogrammed system, present a challenge. The rather elegantly minimal solution to this on the TMS 9900 is to inhibit interrupts for exactly one instruction following a hardware interrupt, XOP, or BLWP instruction. If the receiving context anticipates that it may be used reentrantly, it can disable interrupts (or otherwise manipulate the interrupt mask, which will be discussed more in a later post) as its first instruction, then take whatever steps are necessary to preserve its reentrancy (or simply operate with interrupts disabled). Because the status register, which is saved in R15, contains the interrupt mask of the calling context and is restored on RTWP, interrupts will be automatically re-enabled when the interrupt, XOP, or subroutine completes. (For the simple case of nested interrupts, an interrupt automatically inhibits additional interrupts of its own priority or logically lower priorities, without programmer intervention.)


These are great writeups for a fascinatingly odd MPU - thanks @elb!


Honestly, this doesn’t surprise me at all.

The machine is a 16b machine, with 16b data. So sticking the byte at the high address seems natural to me. The byte operations on the registers aren’t designed for manipulating 16b values in 8bit chunks, it’s just treating the 16b register as an 8b register. So, from a programming point of view, it doesn’t really matter which byte they use. It’s good to know, for “tricks” and such, but for operations, it doesn’t matter.

Where you get into trouble is when you mix 16b and 8b operations on the same register. That’s a recipe for disaster if you don’t know what’s in the other byte. So, you know, “don’t do that”.

And it’s not a limitation for moving bytes in to memory, it’s simply how the registers are working in their dual mode.

Simply, this is an aspect to embrace rather than try to fight against.

I mostly agree; certainly if you don’t mix byte and word operations and you’re using an assembler it doesn’t matter. (I ran across this while hand-assembling, where it became more obvious.) However, there’s a catch — many instructions have no byte equivalent on this CPU. This includes all of the immediate operations and most of the bitwise operations. That means, for example, that this sequence compares as unequal:

LI      R0,>0D          Load Immediate; > is the 9900 prefix for hexadecimal
CB      R0,@CR          Compare byte; CR is assumed to label a BYTE >0D constant

This, on the other hand, compares as equal:

LI      R0,>0D00
CB      R0,@CR          CR is assumed to label a BYTE >0D constant

Again, not surprising once you understand what is going on and why, but certainly not like any other machine I’ve used (and I’ve used quite a few). I’m not going to claim it’s wholly unique, but for me it qualifies as surprising.

I’m not sure I understand this statement, exactly, but if you’re saying that it’s because a register is not free to be an abstract order-less quantity like it is in an architecture that doesn’t map its registers onto main memory, I agree. On the other hand, I assert that a processor designer with modern sensibilities would have simply operated on the low-order byte for direct register byte access.

The place where that modern sensibility would get you into trouble is when accessing a register workspace via a saved workspace pointer (at which point it is no longer “registers”, but rather simply 16 consecutive words). The treatment that the TMS 9900 designers gave operates the same on a register offset through a workspace pointer as it does on a direct register access.

I think it was the right design decision, but I continue to find it surprising at first glance, myself.

One advantage of putting the bytes in the high 8 bits of the registers is that you just use the normal circuits for detecting carry, overflow, negative and so on.

About the workspace pointer being unique, the Transputer was very similar. The idea was also used in the MIT Monsoon dataflow machine. The IBM 5100 had registers in memory so it could switch between banks quickly but it didn’t have an arbitrary workspace pointer.


Just like the 16-bit Norsk Data minicomputers (which differ in every other respect, but this was done the same way: 16 hardware interrupts which simply switched you to a different register set).
