A couple of years ago I was inspired by Ed and Hoglet’s One Page Computing (OPC) project, and over the last year I have been slowly putting together my own design, which I hope will meet the OPC criteria.
Suite-16 is an experimental 16-bit CPU, inspired by Steve Wozniak’s Sweet-16 virtual machine, which was written for the 6502 and Apple II.
It uses an accumulator R0 and fifteen general purpose registers R1 to R15. The registers R12 to R15 have secondary functions such as PC, SP, IP and W (for a future Forth or other VM implementation).
This architecture was dictated by the availability of the 74F219 16 x 4-bit TTL RAM, as ultimately I wish to implement the architecture in real TTL devices.
The instruction set is split roughly into two groups. The first group operates on the accumulator and another register, and includes the arithmetic and logical instructions.
The second group contains the branches and calls, which affect the PC, plus some miscellaneous operations that act only on the accumulator.
The instruction is 16 bits wide, with the 4-bit operation and 4-bit register operand contained in the upper byte.
The lower byte, which I call the “payload”, can hold an 8-bit literal or an 8-bit address offset, or be decoded directly to perform operations on the accumulator, such as shifts, complement, clear, set carry, clear carry, and loading frequently used constants.
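The field layout described above can be sketched in C++ as follows. This is a minimal illustration rather than the project's own code: the `Decoded` struct and the `decode` and `encode` functions are names invented here, and the masks simply follow the description of a 4-bit operation, a 4-bit register operand and an 8-bit payload.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: the three fields of one 16-bit instruction word.
struct Decoded {
    uint8_t op;       // IR 15:12, operation
    uint8_t n;        // IR 11:8, register operand (or condition)
    uint8_t payload;  // IR 7:0, literal, offset or further decode
};

// Split an instruction word into its fields.
constexpr Decoded decode(uint16_t ir) {
    return Decoded{ static_cast<uint8_t>(ir >> 12),
                    static_cast<uint8_t>((ir >> 8) & 0x0F),
                    static_cast<uint8_t>(ir & 0xFF) };
}

// Build an instruction word from its fields (the inverse of decode).
inline uint16_t encode(unsigned op, unsigned n, unsigned payload) {
    assert(op < 16 && n < 16 && payload < 256);  // each field must fit its width
    return static_cast<uint16_t>((op << 12) | (n << 8) | payload);
}
```

For example, `decode(0x0A30)` yields op 0, n 0xA and payload 0x30, which the instruction table further down reads as ADI with an immediate of 0x30.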
Currently instruction 0Fxx has been assigned to NOP, but further decoding of xx provides a mechanism for adding additional instructions to this group. This was inspired by the “microcoded” instructions of the PDP-8.
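To make the PDP-8 style idea concrete, here is a rough sketch of how the payload of a 0Fxx instruction might be decoded, with each bit enabling one accumulator operation. The bit assignments, the constant names and the `micro_op` function are all hypothetical and chosen only for illustration; the real assignments have not been decided. Carry handling is left out to keep the sketch short.

```cpp
#include <cstdint>

// HYPOTHETICAL bit assignments for the payload of 0Fxx (not from the ISA).
constexpr uint8_t U_CLEAR       = 0x01;  // clear the accumulator
constexpr uint8_t U_COMPLEMENT  = 0x02;  // invert every bit
constexpr uint8_t U_SHIFT_LEFT  = 0x04;  // logical shift left by one
constexpr uint8_t U_SHIFT_RIGHT = 0x08;  // logical shift right by one

// Apply the enabled operations in a fixed order, one per set bit.
// A payload of 0x00 leaves the accumulator unchanged, so 0F00 stays a NOP.
inline uint16_t micro_op(uint16_t ac, uint8_t payload) {
    if (payload & U_CLEAR)       ac = 0;
    if (payload & U_COMPLEMENT)  ac = static_cast<uint16_t>(~ac);
    if (payload & U_SHIFT_LEFT)  ac = static_cast<uint16_t>(ac << 1);
    if (payload & U_SHIFT_RIGHT) ac = static_cast<uint16_t>(ac >> 1);
    return ac;
}
```

Because the operations are applied in a fixed order, bits can be combined: under these assumed assignments, clear followed by complement loads 0xFFFF in a single instruction, much as CLA CMA does on the PDP-8.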
The instruction set is still being finalised, in particular the instructions that use the lower payload byte. Having written some assembly routines for numerical input and output and for base conversion between decimal and hex, I notice that the lower byte is very underutilised.
This suggests that a better code density or instruction flexibility might be achievable by making more creative uses of the payload byte.
However, I don’t want to make the ISA so complex that I will never succeed in implementing it in TTL.
I have a CPU simulation in 60 lines of C++ that runs on any Arduino compatible board. My current favourite is the 600 MHz Teensy 4.x, which executes my instruction set at about 20 million simulated instructions per second.
This is the sort of performance that could be achieved in a CPU implemented in 74F series TTL.
A further aim of this project is to port the architecture to an FPGA using Verilog, which is again a learning exercise for me.
Fortunately there is a close correlation between a CPU simulator written in C as a large switch-case statement and the description of the CPU in Verilog.
For the moment I am happy with the simulation running on the Teensy 4.x, and the next plan is to use James Bowman’s “Dazzler” module to generate a 1280 x 720p 24-bit HDMI video output.
For an outlay of about $75 the combination of the Teensy 4.1 and Dazzler should provide a very capable platform for further retrocomputer explorations.
Below I reproduce the instruction set as it stands at the moment.
/* Suite-16 Instructions

Register OPS-
0n   ---         --                  Non-Register Ops
1n   SET  Rn     Constant (Set)      Rn = @(PC+1)
2n   LD   Rn     (Load)              AC = Rn
3n   ST   Rn     (Store)             Rn = AC
4n   LD   @Rn    (Load Indirect)     AC = @Rn
5n   ST   @Rn    (Store Indirect)    @Rn = AC
6n   POP  @Rn    Pop AC              AC = @Rn      Rn = Rn + 1
7n   PUSH @Rn    Push AC             Rn = Rn - 1   @Rn = AC
8n   AND  Rn     (AND)               AC = AC & Rn
9n   OR   Rn     (OR)                AC = AC | Rn
An   ADD  Rn     (Add)               AC = AC + Rn
Bn   SUB  Rn     (Sub)               AC = AC - Rn
Cn   INV  Rn     (Invert)            Rn = ~Rn
Dn   DCR  Rn     (Decrement)         Rn = Rn - 1
En   INR  Rn     (Increment)         Rn = Rn + 1
Fn   XOR  Rn     (XOR)               AC = AC ^ Rn

Non-register OPS-
00   BRA    Always                         Target = IR7:0
01   BGT    AC>0                           Target = IR7:0
02   BLT    AC<0                           Target = IR7:0
03   BGE    AC>=0                          Target = IR7:0
04   BLE    AC<=0                          Target = IR7:0
05   BNE    AC!=0                          Target = IR7:0
06   BEQ    AC=0                           Target = IR7:0
07   JMP    16-bit                         Target = @(PC+1)
08   CALL   16-bit                         Target = @(PC+1)
09   RET    Return
0A   ADI    Add R0 8-bit Immediate         Immediate = IR7:0
0B   SBI    Subtract R0 8-bit Immediate    Immediate = IR7:0
0C   OUT    putchar(AC), port = IR7:0
0D   IN     AC = getchar(), port = IR7:0
0E   JP@    BRA (R0)
0F   NOP    AC &= AC                       IR7:0 decoded to give "microcoded" instructions
*/
And here is a listing of the simulator code - condensed to about 64 lines of code:
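#define MEMSIZE 1024 // Assumed size, not given in the original listing; must be a power of two for the PC mask in fetch()
unsigned int M[MEMSIZE] ; // Program and data memory (declaration assumed, it is missing from the condensed listing)
unsigned long count = 0 ;
int A = 0 ; int i=0 ;
unsigned int IR = 0 ; unsigned int PC = 0 ;
int R[16]= {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0x0200} ; // Zero the Registers, Set the top of the return stack at 0x0200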
void setup() { Serial.begin(115200); }
void loop() { fetch() ; execute() ; }
void fetch() { IR = M[PC] ; PC ++ ; PC &= (MEMSIZE-1) ; }
void execute() {
/* Instruction Decoder */
unsigned int op = IR >> 12 ; // op = IR 15:12 Opcode
unsigned int n = (IR & 0xf00) >> 8 ; // n = IR 11:8 Register or Condition
int addr = IR & 0x0ff ; // addr = IR 7:0 Address or Displacement
int PCM = PC & 0xff00 ; // Modified version of PC containing page byte only, lower byte cleared
/* Opcode Execute */
switch (op) {
case 0x0: break ;
case 0x1: R[n]= M[PC] ; PC++ ; break ; /* SET */
case 0x2: R[0] = R[n] ; break ; /* LD */
case 0x3: R[n] = R[0] ; break ; /* ST */
case 0x4: R[0] = M[R[n]] ; break ; /* LD@ */
case 0x5: M[R[n]] = R[0] ; break ; /* ST@ */
case 0x6: R[0] = M[R[n]] ; R[n]= R[n]+1 ; break ; /* POP with post-increment of pointer Rn */
case 0x7: R[n]= R[n]-1 ; M[R[n]] = R[0] ; break ; /* PSH with pre-decrement of pointer Rn */
case 0x8: R[0] &= R[n] ; break ; /* AND */
case 0x9: R[0] |= R[n] ; break ; /* OR */
case 0xA: R[0] += R[n] ; break ; /* ADD */
case 0xB: R[0] -= R[n] ; break ; /* SUB */
case 0xC: R[n] = ~R[n] ; break ; /* INV */
case 0xD: R[n] = R[n]-1 ; break ; /* DEC */
case 0xE: R[n] = R[n]+1 ; break ; /* INC */
case 0xF: R[0] ^= R[n] ; break ; /* XOR */
}
/* Conditional Branches and I/O Group */
A = R[0] ;
if (op == 0) { // Non-register group: n selects a branch, jump, call, immediate or I/O operation
switch (n) {
case 0x0: PC = PCM + addr ; break ; // BRA Branch Always
case 0x1: if(A>0) { PC = PCM + addr ; } break ; // BGT Branch if Greater
case 0x2: if(A<0) { PC = PCM + addr ; } break ; // BLT Branch if Less Than
case 0x3: if(A>=0) { PC = PCM + addr ; } break ; // BGE Branch if Greater or Equal
case 0x4: if(A<=0) { PC = PCM + addr ; } break ; // BLE Branch if Less Than or Equal
case 0x5: if(A!=0) { PC = PCM + addr ; } break ; // BNE Branch if Not Equal to zero
case 0x6: if(A==0) { PC = PCM + addr ; } break ; // BEQ Branch if Equal to zero
case 0x7: PC = M[PC] ; break ; // 16-bit JMP - fetch() has already advanced PC to the word holding the target
case 0x8: R[15]= R[15]-1 ; M[R[15]] = PC ; PC = addr ; break ; // CALL (zero page) use R15 as Return Stack Pointer
case 0x9: PC = M[R[15]] ; R[15]= R[15]+1 ; break ; // RET
case 0xA: R[0]= R[0] + addr ; break ; // ADI add the immediate 8-bit contained in the address field
case 0xB: R[0]= R[0] - addr ; break ; // SBI subtract the immediate 8-bit contained in the address field
case 0xC: Serial.write(R[0]) ; break ; // OUT - output a character to the Serial port
case 0xD: i =0 ; while (i < (63)) { // GETCHAR - get '0' terminated character string into buffer
while (!Serial.available());
char ch = Serial.read();
if (ch == '\r' || ch == '\n') break;
if (ch >= ' ' && ch <= '~') { M[512 + i] = ch; i++; }
}
M[512+i] = '\n' ; M[512+1+i] = 0 ; // Terminate the buffer with zero
break ;
case 0xE: PC = R[0] ; R[0]= M[PC] ; break ; // JMP @R0 - useful for indexing and table look-up
// (curious but useful pipeline effect here)
case 0xF: R[0] &= R[0] ; break ; // NOP AND accumulator with itself
}
}
}
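As a small worked example of the encoding and of how the short branches behave in the simulator, here is a hand-assembled program together with a helper that computes a short branch target. It is only a sketch: the `program` array and the `branch_target` function are names made up for this example, unused payload bits are assumed to be zero, and the program is assumed to be loaded at address 0. The helper mirrors the simulator's `PCM + addr` calculation, where the page byte comes from the PC after fetch() has incremented it.

```cpp
#include <cstdint>

// Hand-assembled from the instruction table: print 'A' once, then idle.
constexpr uint16_t program[] = {
    0x1100,  // 0000: SET R1     R1 = following word
    0x0041,  // 0001:   constant 0x0041 ('A')
    0x2100,  // 0002: LD  R1     AC = R1
    0x0C00,  // 0003: OUT        putchar(AC), port 0
    0x0004,  // 0004: BRA 04     branch to itself (idle loop)
};

// Target of a short branch stored at branch_addr: the page byte of the
// already incremented PC, combined with the 8-bit payload.
constexpr uint16_t branch_target(uint16_t branch_addr, uint8_t payload) {
    return static_cast<uint16_t>(((branch_addr + 1) & 0xFF00) | payload);
}
```

One consequence worth noting: a short branch stored in the last word of a page, for example at 0x01FF, lands in the following page, because the PC has already moved on to 0x0200 when the page byte is taken.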