Stack machines vs. RISC, a modern comparison?

oh… sorry… some kind of memory failure there on my part.

This posting is somewhat outside the realm of a group dedicated to retrocomputing, but it is part of the answer to the question the OP originally posed.

I think Linus is right, and I think there’s a general lack of appreciation (and I don’t mean that as a comment about this group in particular!) for how much compiler technology has changed in the last 30 years. If you learned your compiler tech from the “Dragon Book” (Principles of Compiler Design) back in the 80s or 90s, as I did, you’ll likely be surprised if you try engaging with a modern compiler.

The changes are all in the “middle end” and the “back end” tech. Modern compilers reduce the source program to a completely abstract, register-oriented dataflow format, usually SSA (static single assignment). They then do between 60 and 120 separate optimization passes over the SSA and then pass this completely abstracted, optimized representation to a sophisticated register allocator.
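To make "SSA" concrete, here's a toy illustration of my own (not taken from any particular compiler): straight-line assignments are renamed so each variable is assigned exactly once, with uses rewritten to refer to the latest version. Real compilers also insert phi-nodes at control-flow joins, which this sketch ignores.

```python
# Toy SSA renaming for straight-line code (no control flow, no phi-nodes).
# Each instruction is (dest, op, args); inputs and constants are left unrenamed.

def to_ssa(code):
    version = {}          # variable -> current version number
    out = []
    for dest, op, args in code:
        # Rewrite uses with the current version of each variable.
        new_args = [f"{a}{version[a]}" if a in version else a for a in args]
        version[dest] = version.get(dest, 0) + 1
        out.append((f"{dest}{version[dest]}", op, new_args))
    return out

# x = a + b; x = x * 2; y = x + 1
prog = [("x", "+", ["a", "b"]),
        ("x", "*", ["x", "2"]),
        ("y", "+", ["x", "1"])]

for dest, op, args in to_ssa(prog):
    print(dest, "=", f" {op} ".join(args))
# → x1 = a + b
#   x2 = x1 * 2
#   y1 = x2 + 1
```

After renaming, every value has exactly one definition, which is what lets all those optimization passes treat the program as a pure dataflow graph.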

The allocator really can use the many registers provided by modern CPUs (but not by stack machines). It’s all in pursuit of working like a dataflow engine: work in registers, issuing instructions until either all the functional units are in use or no more instructions can be found whose operands have already been computed.
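As a rough sketch of what such an allocator does, here is a toy linear-scan allocation over live intervals (linear scan is one common algorithm; graph coloring is the other classic — the interval data and two-register budget here are invented for illustration):

```python
# Toy linear-scan register allocation.
# Intervals are (start, end, name); we hand out registers and, when none
# are free, spill the interval whose live range ends furthest away.

def linear_scan(intervals, num_regs=2):
    intervals = sorted(intervals)                 # by start point
    active = []                                   # (end, name, reg)
    free = [f"r{i}" for i in range(num_regs)]
    assignment = {}
    for start, end, name in intervals:
        # Expire intervals that have ended before this one starts.
        for item in list(active):
            if item[0] <= start:
                active.remove(item)
                free.append(item[2])
        if free:
            reg = free.pop()
            assignment[name] = reg
            active.append((end, name, reg))
        else:
            # Spill whichever live interval ends last.
            active.sort()
            sp_end, sp_name, sp_reg = active[-1]
            if sp_end > end:
                active.pop()
                assignment[sp_name] = "spill"
                assignment[name] = sp_reg
                active.append((end, name, sp_reg))
            else:
                assignment[name] = "spill"
    return assignment

print(linear_scan([(0, 4, "a"), (1, 3, "b"), (2, 6, "c"), (5, 7, "d")]))
# → {'a': 'r1', 'b': 'r0', 'c': 'spill', 'd': 'r1'}
```

Production allocators are far more sophisticated (splitting, coalescing, rematerialization), but the core job is the same: map an unbounded set of SSA values onto a finite register file.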

Not all the back ends are as monstrous as LLVM, but the leading ones all use these techniques. There’s one called Cranelift, and the standard Golang compiler is also SSA-based and of only modest complexity by modern standards (it makes about 60 distinct passes over the SSA before committing to code generation).

We still build stack machines (mostly in software), do simple bytecode-style stack-oriented code generation, template-based code generation, etc. But when the performance of generated code matters, it’s impossible to ignore the gains that come from register machines with modern SSA-based code generators to drive them.
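The simple stack-oriented code generation mentioned above looks roughly like this toy interpreter (entirely illustrative; bytecode names are made up): postfix code for `(a + b) * c`. A register machine would instead keep a, b, and c in registers and skip the pushes and pops entirely.

```python
# Toy stack-machine bytecode interpreter of the kind simple code
# generators target: every operand goes through the top of stack.

def run(bytecode, env):
    stack = []
    for op, *arg in bytecode:
        if op == "push":
            stack.append(env[arg[0]])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack.pop()

# Postfix code for (a + b) * c
code = [("push", "a"), ("push", "b"), ("add",), ("push", "c"), ("mul",)]
print(run(code, {"a": 2, "b": 3, "c": 4}))   # → 20
```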

4 Likes

Nice insights! I can’t help referring us to IBM’s 63-pass Fortran compiler, although that was many-pass for different reasons.

A post was merged into an existing topic: IBM’s 63-pass Fortran compiler

There were some really successful stack processors, e.g. the Burroughs 5000. But pure stack machines can only access the top of stack. That’s horrible for performance.
Others had multiple stacks (data, return addresses)
Others are a bit smarter and let you directly access further down the stack.
Yet others treated the top-of-stack as a register file (SPARC register windows, or the CRISP processor, which basically had memory-to-memory ops that cached the TOS in its registers).
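A toy sketch (my own illustration, in Python) of the difference between the first and third flavors above: a pure stack machine sees only the top of stack, while a "smarter" one exposes something like Forth’s PICK to read further down.

```python
# Pure stack access vs. Forth-style PICK (reaching below the top).

stack = [10, 20, 30]        # 30 is the top of stack

def pure_top(stack):
    return stack[-1]        # a pure stack machine sees only this

def pick(stack, n):
    # Forth-style PICK: copy the n-th element below the top onto the top.
    stack.append(stack[-1 - n])
    return stack[-1]

assert pure_top(stack) == 30
assert pick(stack, 2) == 10   # reaches past the top without popping
```

On a pure stack machine, getting at that buried 10 would instead take a sequence of pops and pushes (or memory traffic), which is exactly the performance problem described above.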

Probably the most unusual is the Mill, which has the belt (as in “conveyor”): something like a TOS, but fixed length, and unused data can fall off the end. Or the entire belt can be permuted in a single instruction. It can be a very wide VLIW-like machine as well, and has security baked in. Currently in RTL, but an FPGA version is planned.
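A very rough toy model of the belt idea as described above (this is my sketch, not the Mill’s actual semantics; the belt length and ops are invented): a fixed-length queue of recent results, where new results push in at the front, old values fall off the end, and operands are addressed by belt position rather than register name.

```python
from collections import deque

# Toy "belt": a fixed-length conveyor of recent results.
class Belt:
    def __init__(self, length=8):
        self.slots = deque(maxlen=length)   # oldest value falls off automatically
    def drop(self, value):                  # a new result lands on the belt
        self.slots.appendleft(value)
    def __getitem__(self, pos):             # position 0 is the newest value
        return self.slots[pos]

b = Belt(length=4)
for v in (1, 2, 3, 4, 5):
    b.drop(v)
print(list(b.slots))    # → [5, 4, 3, 2] (the 1 fell off the end)
```

Because values are named by how recently they were produced, there is no register renaming table to maintain, which is part of the Mill’s power-efficiency pitch.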

And if you want to get involved, check out millcomputing.com.

2 Likes

It is rather hard to compare stack machines and RISC, since they seem to solve different problems
in different fields and eras. Stack machines are best with Algol-style program structure, which requires modest amounts of memory but more computing power. RISC and C have a different mindset: being fast at the cost of memory. Like the saying goes: quality/cost, speed, size; choose any two of the three.

The Mill is interesting, but appears to be moribund.

One thing caught my eye though - they claim that half the energy used in a modern CPU is related to register mapping and use. That seems… high?

Anyone have any good metrics on that?

You could easily write a modern Algol compiler that is optimized for register use, not stacks. Algol compilers back in the day were stack-oriented, as were all/most high-level languages, due to computer architecture and memory constraints at that time. You were lucky to have 16k of RAM to compile with back then.

I WAS LUCKY TO HAVE 16KB ON AN S-100 COMPUTER!
I am downloading several older computer science books from the 60s just to see what was done. https://annas-archive.org/ I want to use a high-level language to bootstrap my system, but very few languages are designed that way. Even BCPL is too complex, simply because I have a 16- vs. 18-bit word size issue.
Ben. PS: BCPL can also be considered a stack machine.

1 Like

The Mill has just a handful of people actively working on it - but they are actively working - you just don’t see it unless you’re in the meetings or monitoring the website. They have simulations that run benchmarks, and they’re claiming pretty good performance numbers. I’m not sure they could have run the benchmarks a year ago.
I think the register reference is to superscalar out-of-order design, which needs a pretty massive multiported register array plus extensive decoding and selection logic. Instruction decode isn’t trivial either. I have no clue if half the energy is going there (or whether it is static or dynamic activity related).

1 Like

The Mill argument is about how much power DSP chips (VLIW for high-end ones) use compared to fancy out-of-order processors for the same performance. But DSPs only handle some applications, so if the Mill could do the same for all applications it would be an important breakthrough.

In the 1990s I had some interesting email discussions with my friend Kyle Hayes about high performance stack machines. He pointed out to me FIFO machines as the dual of LIFO (stack) machines, so I looked into it but didn’t get very far. The big improvement the Mill adds to the FIFO (what they call the “belt”) is that they can retrieve any number of elements from the front, not just one. That makes it feel more like a dataflow architecture.

A cute Forth processor I once saw was a modified MIPS (though I think it had 8 instead of 32 registers) where two of its registers were actually stacks. For example: R1 would really be the return stack and R7 the data stack. This allows you to mix the two programming styles.
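A toy model of that idea (my own sketch; the register numbers R1 and R7 follow the post above, everything else is invented): a register file where reading or writing certain "registers" actually pops or pushes a hardware stack, so ordinary register instructions double as stack instructions.

```python
# Toy hybrid register file: most registers are plain, but R1 (return
# stack) and R7 (data stack) behave as stacks when read or written.

class HybridRegs:
    STACK_REGS = {1, 7}                 # R1: return stack, R7: data stack
    def __init__(self):
        self.regs = [0] * 8
        self.stacks = {n: [] for n in self.STACK_REGS}
    def read(self, n):
        if n in self.STACK_REGS:
            return self.stacks[n].pop()   # reading pops the stack
        return self.regs[n]
    def write(self, n, v):
        if n in self.STACK_REGS:
            self.stacks[n].append(v)      # writing pushes
        else:
            self.regs[n] = v

r = HybridRegs()
r.write(7, 10)                        # push 10 onto the data stack
r.write(7, 20)                        # push 20
r.write(2, r.read(7) + r.read(7))     # an ordinary ADD pops both operands
print(r.read(2))                      # → 30
```

The appeal is exactly what the post says: the same instruction set serves both register-style and Forth-style code, so the two programming models can be mixed freely.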

2 Likes

The belt can do far more than that. You can arbitrarily permute belt elements in a single cycle; that’s equivalent to a conventional out-of-order GPR machine reloading all the renaming entries at once (GPR machines can do this in some special cases, I think, but only one at a time, and not explicitly). This has big implications for subroutine entry/exit code, register allocation, and a host of other things. In theory, a GPR machine could implement an instruction like that (though you would need a 160-bit immediate), but no one has ever done it as far as I know.

RISC started life mostly from how SLOW the VAX was for some instructions.
The VAX was mostly created to run C code better. All this is about optimizing C.
What about Algol-type languages getting the same treatment as C gets, for
improvement in native code generation? What about threaded code, LISP style?
Bounds checking and dynamic arrays? (RISC improvements, not stack machine improvements.)

There have been a lot of machines specialized for some language or another.
The Burroughs B5500/6500 etc. were Algol machines.
The Symbolics and LispMachines were LISP machines,
as was the SPUR processor from Berkeley.
Symbol was VERY specialized for its own SPL language.
ARM had an extension for Java (Jazelle).
There have been a pile of Forth-based microprocessors.
IBM had an APL desktop (and HP had an APL microcoded + optimizing JIT compiler product).

Oddly, wide commercial success seems to have eluded all of them (though Burroughs certainly had its day). All were great at …some particular thing, but when measured across all apps, general-purpose architectures, better compiler technology, and using transistors for superscalar out-of-order (and better marketing) won out.

1 Like

I can’t say RISC has won; Intel still sells the x86 of the month.
For most uses, a 15¢ CPU still runs your toaster or microwave, regardless of how
they talk about the Internet connecting to everything.
The marketplace has a wide range of products, enough for everybody.

I don’t think this was quite true, at the time. When the VAX was being developed (1976-1977), the Unix available to the public was Sixth Edition, and it was installed mostly at academic institutions. DEC was selling a whole variety of operating systems on PDP-11s (RT-11, RSTS, RSX-11, and several others), most of which were written in assembly and which provided a large variety of user-space languages (assembly, Fortran, Focal, BLISS, BASIC, …). It was selling a smaller set of operating systems (TOPS-10, TOPS-20) on the PDP-10 with an even larger variety of languages in use (add LISP, for example), although I’m not sure how many were actually provided by DEC. Unix and C weren’t even portable yet, as the development of the VAX would have been in parallel with both the Bell Labs and Wollongong ports of Unix to 32-bit Interdata machines.

I think it could be argued that the VAX was the last major, industry-dominant, processor that was not designed for C! It just so happens that it was originally envisioned as an extension of the PDP-11 into 32-bits with virtual memory … and C was designed for the PDP-11.

RISC, on the other hand, was designed for C, or at least for “modern” (of the time) optimizing compilers, rather than handwritten assembly. The VAX was a platform that still expected programmers, including people who were not professional programmers, but scientists or business analysts or what-have-you, to write assembly to solve their problems.

2 Likes

Could a RISC design have been created instead of the VAX, or was hardware still too limited
with the ICs of the time?

Another example being Inmos’ Transputer, designed with and for Occam, a language which embodies Tony Hoare’s CSP paradigm. It met just a little more success when it acquired a C compiler. (Its peak commercial success, I believe, was as ST20 in system-on-chip form in set-top boxes.)

Indeed - and this is the realisation, I think: that computers are useful in as much as they run software, and being able to create and debug the software - having the labour pool to draw on, and the tooling - is crucial. Software needs to run adequately well, and be adequately cheap to produce.

One of the beliefs of the more forward-looking managers within Inmos was that we were too technology-led, and to survive we needed to be market-led. (I heard this, but was and have always been rather a fan of technology. I was not in a decision-making position.)