A bit of progress on our Beeb816 project

My experience with the 65816 is all from data sheets; I’ve never spent any real time programming it, so please correct me if I am mistaken. However, isn’t this more a package limitation than an architectural limitation? The '816 instruction set has a wide variety of two-byte memory instructions; given a 16-bit data bus, it could do these in a single fetch for aligned accesses, could it not? That wouldn’t be dissimilar to the 8088 → 80286 upgrade path in the PC world. Even in the 40-pin package it already had multiplexed pins; perhaps it could even have been taken to 16 bits without growing beyond 40 pins. (I’m not intimately familiar with its multiplexing, though.)

I think it would be quite difficult to give the '816 a 16-bit bus; it would be a rather different implementation on the inside. I think the '816 was possible, by WDC’s very limited team, in large part because it works rather like a 6502, but more so.

Every now and again on 6502.org someone wonders about this sort of thing, but no-one has yet done it, even though there are some 20 known reimplementations. It is tempting, even for the 6502 instruction set, but I think it’s intrinsically rather difficult. It needs a decoupled load/store unit, I think, or at least a decoupled memory interface. Those sorts of things are typically seen in later, more complex CPUs.

1 Like

I agree, and I think the typical lumping of “x86” oversimplifies that history; certainly the 80386 as a PC platform got a huge leg up through backward compatibility, but it’s difficult to overstate the sea change in the personal computer market of a true 32-bit architecture with a built-in paging MMU. 386 protected mode isn’t even in the same ballpark as the 8088/86/186/286 that came before. Like AMD64, it’s almost a new architecture with an incidental backward compatibility mode.

This is something that came pretty late to competing architectures; the 68k didn’t have a built-in MMU until the 68030 a couple of years later, and the external MMU units for the 020 and earlier all had significant compromises. I think the relative simplicity and garbage-ness (to use a technical term that @EdS may object to) of MS-DOS also played a role here; it was “easy” to start using that MMU, even for individual DOS applications, on the 386, and to switch in and out of protected-versus-not modes to access BIOS and DOS services. Video games and application software alike did it, and boom, they were freed from overlays and segments and all of the cruddiness of a 16 bit architecture that somehow in the meantime sprouted double-digit megabytes of RAM.

I’ll admit I really don’t know whether interrupt response was significant on SNES or Mega Drive. But generally most software on those consoles relied more upon the graphics chips to pull their weight than the CPU. The CPU would only really get a workout on certain games, like some computer game ports. Another World/Out of this World was a particularly interesting example of CPU heavy software rendering, if you look into the details of how its console ports worked.

1 Like

About MMUs and similar, aren’t we seeing here a sort of recapitulation of earlier CPU developments, as transistor budgets (and budgets) got bigger? The ARM1 lacked a cache because there was no time and no budget: the ARM3 added it (IIRC). The 6800 has no multiplier, but the 6809 adds it. It’s not so much about what can be done, as what’s economic at the time.

1 Like

Yeah, the dragon that I see lurking there is that programmers could feel free to treat it as a “better 6502”, which means rampant unaligned accesses and similar, entailing a memory interface for a 16-bit bus that can handle decomposing and reassembling 16-bit accesses across alignment boundaries, etc. However, several other architectures of the time handled that relatively gracefully, with only the unavoidable attendant performance penalties; granted, they did it with more transistors than the '816!
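
To make that concrete, here’s a minimal sketch (just illustrative C, not anything from WDC) of what such a memory interface would have to do, assuming a 16-bit data bus that can only perform aligned cycles; read_bus16 is a made-up primitive standing in for the bus hardware:

```c
#include <stdint.h>

/* Hypothetical bus primitive (my name, not WDC's): returns the aligned
 * 16-bit word containing the given even address, in one bus cycle. */
extern uint16_t read_bus16(uint32_t aligned_addr);

/* A 16-bit load as the programmer sees it: one operation, any address. */
uint16_t load16(uint32_t addr)
{
    if ((addr & 1) == 0) {
        /* Aligned: both bytes arrive in a single bus cycle. */
        return read_bus16(addr);
    }
    /* Unaligned: two bus cycles, then reassemble the halves (little-endian).
     * The low byte of the result is the odd byte of the first word, the
     * high byte is the even byte of the following word. */
    uint16_t first  = read_bus16(addr & ~1u);        /* holds byte at addr     */
    uint16_t second = read_bus16((addr & ~1u) + 2);  /* holds byte at addr + 1 */
    return (uint16_t)((first >> 8) | ((second & 0xFFu) << 8));
}
```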

I agree that it seems likely that the size of WDC’s '816 team played into this; it is notable that even 40 years later there is essentially only one implementation of the 65816 from WDC, only ever available in two packages. Is that the chicken or the egg, though?

For better or worse, WDC seem fixated on the 8 bit bus, probably because that leads to low system costs. Their 32 bit idea was also to use an 8 bit bus. But as their major market is the toy market, low cost is probably right.

@EdS I guess 68K interrupt response is over my pay grade. :slight_smile: Meaning, I never did much on the 68K platform besides using it, so this was rather new to me. – Thanks for providing yet another opportunity to learn something.

The 80386 was a success because of its backward compatibility, period. Yes, it introduced vastly superior and vastly more elegant capabilities, but it wouldn’t have gotten anywhere without its backwards compatibility. Not just software, but also hardware.

The 68000’s elegant design was already attractive back when IBM was designing the IBM PC. But it lacked compatibility with existing I/O support chips. IBM wasn’t so worried about software backwards compatibility, but the lack of hardware compatibility caused them to pass over the 68000. And as a result, x86 won and 68K would end up an also-ran.

If Intel had tried to introduce a new 32 bit CPU without any backwards compatibility? It would have been a humiliating dud. By the time they released the 80386, 68K was well established and had a suitable hardware ecosystem of support chips. This hypothetical Intel alternative without backwards compatibility would be trying to go up against that from scratch … a very hard sell, considering IBM was happy to continue with the 80286 (which they had a license to produce), and considering workstation makers were doing well with 68K.

No, this hypothetical Intel 32 bit CPU would have been a failure, and Intel (or possibly others) would have to develop a backwards compatible 32 bit CPU to kick start the 32 bit PC clone revolution.

Regarding IBM, the PC, and compatibility concerns, mind that the 8086/88 was little-endian, while IBM systems had been mostly big-endian (S/360, S/370, etc., as was the PDP-10 and “coincidentally” the 68000). In this light, IBM’s decision to go with Intel’s chip may even have been surprising, since IBM’s idea for the PC was that, in business use, it would still be heavily dependent on a mainframe back office (of IBM machines, of course).
As I understand it, Motorola wasn’t able to produce production samples in the required numbers and timeframe, and was out of the race quite early on because of this.

The usual story is that they were specifically not able to produce the 68008, which had an 8-bit external data bus, in time. IBM wanted an 8-bit external interface, for the external support compatibility reasons that @IsaacKuo mentions above.

It’s obviously only one side of the story, but as I recall this oral history of the Motorola 68000 team from the Computer History Museum discusses the flirtation with IBM when the PC was seeking a processor. (I watched that video several years ago, but as I recall, it was quite interesting. It addresses some other frequent points of discussion, such as the relationship of the 68k to the 6800/6809 and MOS 6502, as well, I think.)

3 Likes

Ah, the CHM’s oral histories are an excellent resource - for all that people don’t always answer the questions we’d like them to answer! (And that they tend to tell a story which paints them in a good light.) Thanks for the video link - the transcript is also available as a PDF.

Intel’s story, I think, is a series of lurches forwards, trying (and always failing) to make a clean start, and in parallel adding on this or that cleaner layer on top of the lucrative backward-compatible business they had. (See quote below.) As with all microprocessors, it’s a story of what you can do with the number of transistors you can pack onto a die, and how you can get people to buy into your offering.

Here’s the Intel 386 Microprocessor Design and Development Oral History Panel as a PDF, and here’s John Crawford’s Oral History. He says:

So a polite way to say what the role of the 386 was to be the future of the compatible 8086, 286 processors. Maybe a more-- a jaundiced view is well it was a stop gap in a long line of successful stop gaps at Intel for I think, in a sense, the 8086 was a stop gap and the 286 was something that was there to continue the market while the 432, you know, a very ambitious program was being developed. So the idea with the 386 was to build on the compatible front not to be as ambitious as this new project, which was going to be a 432 follow-on. But to serve, you know-- but to serve some part of the market then that would value the compatibility and be there kind of to hold the fort, if you will, while this other project came on.

(For me, the panels are less successful than the individual oral histories, for the usual reasons: people need space, and people take space.)

3 Likes

But the 8086 was a quick and dirty CPU, meant to cover up the delay with Intel’s iAPX 432 design. There is a lot of press about how great x86 is, but very little on this “white elephant” of Intel’s. I suspect that design is where the idea for the 8088’s segment registers came from.

2 Likes

Ouch:

Using the semiconductor technology of its day, Intel’s engineers weren’t able to translate the design into a very efficient first implementation. Along with the lack of optimization in a premature Ada compiler, this contributed to rather slow but expensive computer systems, performing typical benchmarks at roughly 1/4 the speed of the new 80286 chip at the same clock frequency (in early 1982)

(Wikipedia)

Let’s have a closer look at this: the IBM AT, running an Intel 80286 @ 6 MHz, made 1.3 MIPS. The iAPX 432 was available at 4 MHz, 5 MHz, 7 MHz and 8 MHz. Meaning, any classic home computer (typically at about 0.4–0.45 MIPS) would have been running circles around a 4 MHz or 5 MHz base model.
However, the same page also states,

“with a peak performance of 2 million instructions per second at 8 MHz”,

which wouldn’t be that bad, as 2 MIPS is comparable to an 80386 DX @ 16 MHz (as in the IBM PS/2 70) or just below a Motorola 68020 at about the same clock speed (2.5 MIPS for a 68020 @ 16 MHz, as in the Mac II). Or, in other words, vastly superior to an 80286. – I guess there’s some conflicting information in that article.

(MIPS data from https://gamicus.fandom.com/wiki/Instructions_per_second.)
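
Just to make the arithmetic explicit, here’s a quick back-of-envelope check (a sketch only, taking the 1.3 MIPS figure for the 6 MHz 286 and the one-quarter-per-clock claim from the quotes above at face value):

```c
#include <stdio.h>

int main(void)
{
    /* Figures quoted above. */
    const double mips_286_at_6mhz = 1.3;   /* IBM AT: 80286 @ 6 MHz             */
    const double relative_432     = 0.25;  /* iAPX 432 ~ 1/4 of a 286 per clock */
    const double clocks[] = { 4.0, 5.0, 7.0, 8.0 };  /* available speed grades  */

    for (int i = 0; i < 4; i++) {
        double est = mips_286_at_6mhz * relative_432 * (clocks[i] / 6.0);
        printf("iAPX 432 @ %.0f MHz: roughly %.2f MIPS\n", clocks[i], est);
    }
    /* Prints roughly 0.22, 0.27, 0.38 and 0.43 MIPS: at or below a typical
     * 0.4-0.45 MIPS home computer, and nowhere near the quoted 2 MIPS peak. */
    return 0;
}
```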

1 Like

Interesting. I was completely unaware of the existence of this “semItanic”.

Me too. Sounds like an intriguing concept that may be worth reevaluating with today’s technology. (However, with optimizing compiler backends like clang, direct support for high-level languages is probably a thing of the past.)

Early benchmarks made the iAPX432 seem even worse than it eventually was. The microcode in the first versions of the chip was not very optimized, and the Ada compiler initially used only the most expensive of the various call instructions available.

One thing that significantly limited the performance of the iAPX432 was its packet bus, though that was also what allowed very powerful systems (multiprocessors with two levels of redundancy) to be built. There were only 16 lines, so in the first clock cycle you sent the command (read, write, interrupt, etc.), the size of the block to be transferred, and part of the address. In the second clock cycle you sent the rest of the address. The next cycles transferred data 16 bits at a time. I imagine that if you built a 386 with such a bus it would not be very fast either.
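
As a rough illustration of why that hurts, here’s a toy cycle-count model based purely on the description above (two header cycles, then 16 bits of data per cycle); the function and the comparison are my own sketch, not figures from a 432 datasheet:

```c
#include <stdio.h>

/* Rough cost of one packet-bus transaction, per the description above:
 * a two-cycle header (command, block size, and the address split across
 * two clocks), then data moved over the 16 lines, two bytes per cycle.
 * Arbitration, wait states, ECC and so on are ignored. */
static unsigned packet_bus_cycles(unsigned bytes)
{
    const unsigned header = 2;
    const unsigned data   = (bytes + 1) / 2;   /* 16 data lines = 2 bytes/cycle */
    return header + data;
}

int main(void)
{
    /* A 4-byte operand costs 4 cycles here, versus a single cycle on a
     * conventional demultiplexed 32-bit bus; larger blocks amortise the
     * header somewhat better. */
    printf("4-byte transfer:  %u cycles\n", packet_bus_cycles(4));
    printf("16-byte transfer: %u cycles\n", packet_bus_cycles(16));
    return 0;
}
```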

Though the iAPX432 was quickly removed from the market and eventually erased from Intel history, that was not the end of the project. Instead they decided to use the best features of this architecture in a RISC processor for Siemens.

This was eventually sold as the Intel i960, but when the company decided to bet its future on the x86, the 960 was stripped down and repositioned as a “smart DMA” for high-end I/O cards, just as the Intel i860 was demoted to being a graphics coprocessor.

3 Likes

You never fail to impress me with your knowledge of these things!

MIPS figures make for good marketing, since the instructions counted often make only a single memory access, the instruction fetch itself; the operations themselves (move, increment, shift, etc.) are internal to the CPU. Real operations need to refer to memory, and are thus slower; how about comparing those instead? I have not used Ada, but I expect that, as in ALGOL, memory operations can get very complex, and all of this needs to be bounds-checked as well. Software needs to be written well for good speed, so don’t blame the compiler or hardware in every case. And so what if calls are slow? Calling procedures is slow on any machine.

1 Like

A company I worked for made supercomputer boards based on the Intel i860.

It was a really nice RISC CPU, but somewhat challenging when you wanted to do floating point. In particular, taking an interrupt while you were hand-cranking the floating point pipeline, and then re-starting it, was somewhat tricky and consumed many cycles; however, with care you could leverage its claimed 3 operations per cycle - mostly. (A 64-bit data bus into a 32-bit CPU, so one integer instruction and one FP instruction running in parallel, and if that FP instruction was a multiply-and-add, then 3 operations per cycle…)
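
For anyone wondering what kind of code hit that best case: something like an AXPY loop is the classic example, since the pipelined multiply-add supplies two FP operations per cycle while the integer side handles the addressing and loop control. This is just generic C to show the shape of the work, not actual i860 code:

```c
/* The favourable case for the i860's dual-operation mode: per element
 * there is one FP multiply and one FP add (the pipelined multiply-add),
 * while the integer unit does the pointer and loop bookkeeping in
 * parallel; that is the "3 operations per cycle" best case described
 * above. Plain C for illustration, not i860 assembly. */
void axpy(int n, double a, const double *x, double *y)
{
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];   /* one multiply + one add per iteration */
}
```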

Intel made a supercomputer based on them too, but after that a lot of the graphics-type instructions were merged into the MMX etc. instructions for the next-generation x86 processors.

-Gordon

1 Like