The PDP-8/S was a close cousin of the original PDP-8, but was designed as an exercise in cost reduction, so that a PDP-8 compatible machine could be sold for under $10,000.
Achieving this meant cutting the transistor count considerably, from about 1500 to about 1000 transistors, which in turn required the machine to have a bit serial architecture.
Bit serial was a design technique used just after WWII, notably in the EDSAC. Because the EDSAC's memory was based on ultrasonic delay lines, in which the bits were stored as a series of ultrasonic pulses travelling through a mercury-filled tube, it made absolute sense for the EDSAC ALU to process this data one bit at a time.
The bit serial approach works well for logical operations, because each bit of the operands can be operated on individually.
The arithmetic operations of addition and multiplication can also be readily broken down into bit serial processes, provided that the carry from each bitwise addition is held in a flip-flop and fed back to the carry input of the full adder for the next bit addition.
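To make the carry flip-flop idea concrete, here is a minimal Python sketch of a bit serial adder. It is an illustration only, not a model of the EDSAC or PDP-8/S circuitry; the function names and the 12-bit default width are my own assumptions.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder: returns (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out


def serial_add(x, y, width=12):
    """Add two words one bit per clock, least significant bit first.

    A single full adder is reused for every bit; `carry` plays the
    role of the carry flip-flop between clocks.
    """
    carry = 0                      # flip-flop cleared before the first bit
    result = 0
    for clock in range(width):
        a = (x >> clock) & 1       # bit shifted out of the first operand
        b = (y >> clock) & 1       # bit shifted out of the second operand
        sum_bit, carry = full_adder(a, b, carry)
        result |= sum_bit << clock # bit shifted into the result register
    return result, carry           # final carry is the overflow out of the word


print(serial_add(5, 3))            # (8, 0)
print(serial_add(0o7777, 1))       # (0, 1): a 12-bit word wraps around
```

The cost is visible in the loop: one full adder does all the work, but a 12-bit addition takes 12 clocks instead of one.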
The bit serial approach allowed the EDSAC to be constructed using about 3000 vacuum tubes, and provided the only realistic means, at that time, of supporting a long word size of 71 bits in the accumulator.
The downside of the bit-serial approach was that the EDSAC machine was slow, returning only about 600 operations per second.
Some 20 years after EDSAC, the PDP-8/S was developed to reduce transistor count, at the expense of execution speed. The PDP-8/S had a 1.33MHz clock, but it took from 48 to 104 clock cycles (36µs to 78µs) to complete an instruction.
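The timing figures follow directly from the clock rate and the cycle counts. The short Python check below uses only the numbers quoted above; the function names are my own.

```python
CLOCK_HZ = 1.33e6  # PDP-8/S clock rate quoted above


def instruction_time_us(cycles, clock_hz=CLOCK_HZ):
    """Time to complete one instruction, in microseconds."""
    return cycles / clock_hz * 1e6


def instructions_per_second(cycles, clock_hz=CLOCK_HZ):
    """Instruction rate if every instruction took `cycles` clocks."""
    return clock_hz / cycles


for cycles in (48, 104):
    print(f"{cycles} cycles: {instruction_time_us(cycles):.1f} us, "
          f"{instructions_per_second(cycles):,.0f} instructions/s")
```

That works out to roughly 12,800 to 27,700 instructions per second, depending on the instruction mix.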
As a result, the PDP-8/S was around 20 to 30 times slower than the original PDP-8. However, it could run most PDP-8 software, including the 4K FOCAL language.
Thanks to the bit serial ALU and the shift registers used for the principal registers, the PDP-8/S was reported to be implemented in just 519 logic gates.
The reduction in transistor count in the PDP-8/S meant a large reduction in the number of “flip-chip” modules required to implement its logic. Consequently, the PDP-8/S could be housed in a 19" desktop or rack-mounted case, rather than the refrigerator-sized enclosure of the original PDP-8. The PDP-8/S sold for less than $10,000, compared to $18,500 for the original “Straight-Eight”.
The PDP-8/S was not a big seller, with between 1000 and 1500 machines sold. It was discontinued after 4 years, because the TTL integrated circuits used in the PDP-8/I and the PDP-8/L allowed a full-spec machine to be sold at a sub-$10,000 price using the lower-cost technology.
Is Bit Serial Still Relevant Today?
The bit serial architecture still makes absolute sense if you want to minimise hardware resources and complexity. One modern example is SERV, a 32-bit RISC-V (RV32) core implemented using minimal resources as a soft core (or as several cores) on an FPGA. A link to a descriptive video is below:
With modern hardware, including FPGAs and “Little Logic” gates and flip-flops, we have devices that can be clocked at 200MHz plus. Even if we have to divide this clock by 64, we could still have a 16-bit cpu that executes instructions at around 3MHz or more, equivalent to or better than the original IBM PC.
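As a quick check of the divide-by-64 arithmetic: a 200MHz clock gives a little over 3 million instructions per second, and reaching 4 million needs a clock of 256MHz, which is still within "200MHz plus" territory. The sketch below assumes a flat 64 clocks per instruction, which is an illustrative figure rather than a measured one.

```python
CLOCKS_PER_INSTRUCTION = 64  # divide-by-64 figure from the text (assumed flat)


def instruction_rate_hz(clock_hz, clocks=CLOCKS_PER_INSTRUCTION):
    """Instructions per second for a given bit clock."""
    return clock_hz / clocks


def clock_needed_hz(target_rate_hz, clocks=CLOCKS_PER_INSTRUCTION):
    """Bit clock needed to reach a target instruction rate."""
    return target_rate_hz * clocks


print(instruction_rate_hz(200e6) / 1e6)  # 3.125 (million instructions/s)
print(clock_needed_hz(4e6) / 1e6)        # 256.0 (MHz)
```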
A bit serial cpu makes perfect sense when so many modern peripheral devices use serial protocols, especially SPI, I2C and UART interfaces. Bit serial is also the principal method for generating monochrome video.
Some years ago, when I first studied the Nand to Tetris (N2T) course, I decided to think about the “Hack” cpu as a bitslice design. Hack is somewhat similar in its accumulator architecture to the PDP-8, albeit 16-bit rather than 12-bit.
The cpu could be implemented as 16 identical pcbs, each constructed from 20 74xx00 quad 2-input NAND gate packages. As 16 of these cards would be needed to create the cpu, we were looking at a total gate count of some 1280 NAND gates (16 × 20 × 4).
However, by using a bit serial approach, initial investigations suggest that the (nand) gate count could be reduced to approximately 300 gates.
Analysing the logic used in my bitslice, it became apparent that the 2 input XOR, the 2 input mux and the D-type flip-flop accounted for some 90% of the logic.
As you need 4 NANDs to make an XOR, 4 NANDs to make a 2-input mux, and as many as 9 NANDs to make a D-type flip-flop, it makes sense to replace these groups of NANDs with higher-level gates. Alternatively, you could use a multi-function or configurable gate.
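For reference, the 4-NAND XOR and 4-NAND mux can be checked with a few lines of Python. These are the standard textbook constructions, not necessarily the exact netlists used in my bitslice; the function names are my own, and the flip-flop is left out because it needs feedback and a clock to simulate.

```python
def nand(a, b):
    """2-input NAND on single bits (0 or 1)."""
    return 1 - (a & b)


def xor_from_nands(a, b):
    """XOR built from exactly 4 NAND gates."""
    n1 = nand(a, b)
    n2 = nand(a, n1)
    n3 = nand(b, n1)
    return nand(n2, n3)


def mux_from_nands(sel, d0, d1):
    """2-input mux built from exactly 4 NAND gates: d0 if sel == 0 else d1."""
    not_sel = nand(sel, sel)     # a NAND wired as an inverter
    n1 = nand(d0, not_sel)
    n2 = nand(d1, sel)
    return nand(n1, n2)


# Exhaustively compare against Python's own operators.
for a in (0, 1):
    for b in (0, 1):
        assert xor_from_nands(a, b) == a ^ b
        for sel in (0, 1):
            assert mux_from_nands(sel, a, b) == (b if sel else a)
print("XOR and mux truth tables match")
```

Each of these uses a whole 74xx00 package, which is why a single XOR, mux or flip-flop part saves so much board space.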
If one were to implement these using the “Little Logic” multi-function, configurable gates such as the 74LVC1G99 (logic configuration in table below), each slice of the bitslice could be implemented in about 12 of these devices and 3 of the 74LVC1G74 D-type flip-flops. The whole 16-bit Hack design is then reduced to 240 “little logic” packages (16 × 15) plus a ROM and RAM.
Taking the bit-serial approach, we still need the 12 off 1G99 configurable gates, and we implement the Accumulator, the Memory Buffer and the Program Counter registers using pairs of 8-bit shift registers, such as the 74HC299, 74HC165 and 74HC595. Add to these about 6 devices for a control unit, ROM and RAM, and we have a complete 16-bit TTL cpu in fewer than 30 packages.
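The package counts for the three approaches can be tallied as below. The per-slice numbers come from the text; treating the "about 6 devices" as covering the control unit, ROM and RAM together is my reading of that sentence and should be taken as an assumption.

```python
SLICES = 16  # one slice per bit of the 16-bit Hack word

# 1. Bitslice in 74xx00 quad NAND packages: 20 packages per slice.
nand_packages = SLICES * 20
nand_gates = nand_packages * 4           # 4 NAND gates per package

# 2. Bitslice in "Little Logic": 12 x 1G99 + 3 x 1G74 per slice.
little_logic_packages = SLICES * (12 + 3)

# 3. Bit serial: one slice of logic plus shift registers.
serial_gates = 12                        # 1G99 configurable gates
shift_registers = 3 * 2                  # AC, MB, PC as pairs of 8-bit parts
control_rom_ram = 6                      # assumed to include ROM and RAM
serial_packages = serial_gates + shift_registers + control_rom_ram

print(nand_packages, nand_gates)         # 320 1280
print(little_logic_packages)             # 240
print(serial_packages)                   # 24
```

Even if the ROM and RAM are counted separately from the 6 control devices, the bit serial total stays under 30 packages.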
Further optimisation might well be possible by reducing the configurable gates into fewer more specialised packages.
The original “Hack” bitslice implemented in 74xx00 NAND gates can be viewed on my GitHub repository here: