Acorn, 1995, product plans and customer conversations

There’s a long Usenet thread which has a few posts of note. It starts off with Chris Cox, an Acorn marketing person asking, in April 1995:

To prepare for the release of the 700 upgrade cards during the second quarter (ish) I want to make sure that we get the right number and mixture. So I am looking for some indication of how many existing users feel that they will upgrade to a new card, and whether it will be to the 710 integer only card, or the higher cost 700 card with the FPA (Floating Point Accelerator)

Here’s flibble’s photo of the eventual product, CC-BY-SA

It seems to me like a poor idea to ask a question like this on Usenet - you won’t get representative results and you’ll probably get some unhelpful commentary and awkward questions. But that’s just my opinion. Oh, and the Osborne Effect dates from 1983.

The idea here is that the integer only offering can be clocked faster than the with-floating-point-capability offering.

One would ideally be acquainted with Acorn’s previous and current 32-bit offerings, of course. There’s a nascent wiki for that:
ARM700 - Arc Wiki

And perhaps see also these two threads on Stardot for an overview:
Benchmarks
32-bit benchmarks
(Note that ARM is and was a 32-bit CPU, but early ARM had a 26-bit program counter, so that first thread is for the Acorn machines of that kind, sometimes called 26-bit machines, and mostly called Archimedes.)

The later machines were mostly called RISC PC, and in 1996 DEC’s StrongARM implementation was used by Acorn to bring a big step up in performance (202MHz). But this is 1995 and that’s the future - albeit the foreseeable future. We’re still in the 30 to 40MHz era for Acorn, the P75 era for Intel.

Chris later says

There has been a lot of discussion about the clock speeds and performance of the 700 cards. I hope that the following will help clarify some of the issues involved.

The ARM700 and ARM710 processors represent a significant improvement over the current ARM610 processors. They have a higher maximum clock speed and a number of architectural improvements such as double the size of internal cache, this means that more of any process can be executed internally without accessing the (relatively) slow external memory. Other improvements are an improved write buffer and an enlarged Translation Lookaside Buffer in the MMU. All of these improvements increase the performance of the system and deliver more real performance than a simple comparison of clock speeds would indicate.

For floating point work the FPA11 processor has a peak throughput of up to 5 MFLOPS (Millions of FLoating point Operations Per Second) and achieves an average throughput in excess of 3 MFLOPS for a range of calculations. Applications which make use of the RISC OS Floating Point Emulator will automatically use the FPA chip, increasing floating point performance by as much as a factor of 50. This will make a huge difference in the performance of floating point intensive tasks such as ray tracing.

The FPA11 is interfaced to the ARM700 over a simple, high-performance co-processor bus. The ARM instruction pipeline is mirrored in FPA11 so that floating point instructions can be executed directly with minimum communication overhead. Pipelining, concurrent execution units and speculative execution are all used to improve performance without significantly impacting power consumption. The ARM700 processor, which has the external coprocessor bus interface, is currently fabricated on a process that limits the maximum clock speed to 33MHz.

The ARM710 has been optimised for integer performance, and of the samples that we have evaluated the fastest ran at 55MHz. However there is a big difference between a prototype based on a sample and a production board. When we specify the production boards we have to make sure that they will work under all conditions, and for all devices. This means that we have to derate the clock speed to allow for a number of different factors:

Fastest ever ARM710 sample seen running in ideal conditions   55MHz
Allow for power drop to 4.75V, eg when HDD spins up           0.97
Allow for a max chip temp of 70 deg C                         0.9
  (30deg ambient + 15deg inside case + 25deg inside chip)
Process spread from the fastest part                          0.825

Multiplying these together gives the specification of         40MHz

These are the figures that we use, based on our experience of a number of different fabricators. As you can see from the above figures it is perfectly possible that with the right crystal, a particular board in a particular system might run at 50MHz to 55MHz, but this cannot be guaranteed.
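As a quick check of the derating arithmetic quoted above, here is a minimal Python sketch. The 55MHz starting point and the three factors are Chris's figures; the variable and function names are my own, purely for illustration.

```python
# Figures quoted in Chris Cox's post; the names here are illustrative only.
FASTEST_SAMPLE_MHZ = 55.0

DERATING_FACTORS = {
    "supply drop to 4.75V (e.g. HDD spin-up)": 0.97,
    "max chip temperature of 70 deg C": 0.9,
    "process spread from the fastest part": 0.825,
}

# The 70 deg C figure is itself a sum of three allowances from the post.
MAX_CHIP_TEMP_C = 30 + 15 + 25  # ambient + inside case + inside chip


def derated_clock(fastest_mhz, factors):
    """Multiply the best-case clock by every derating factor in turn."""
    clock = fastest_mhz
    for factor in factors.values():
        clock *= factor
    return clock


spec_mhz = derated_clock(FASTEST_SAMPLE_MHZ, DERATING_FACTORS)
print(f"Derated clock: {spec_mhz:.2f} MHz (rounds to {round(spec_mhz)} MHz)")
```

The product comes out a little under 40MHz, which matches the quoted specification once rounded.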

Here are some picks from downthread - I’m looking for informative rather than controversial…

Mark Smith says

Actually the 33MHz ARM 700 based RPC will perform significantly better than
the 33MHz A5000 for the following reasons:

  • The ARM 700 cache is twice the size of the ARM 3 cache (8K vs 4K)
  • The ARM 700 has an “enhanced” write buffer (enhanced compared to the
    600/610 - I don’t know the difference off-hand, but I imagine that it is
    larger), whereas the ARM 3 has none.
  • Most Risc PCs have VRAM (presumably anyone considering an ARM700 upgrade
    will already have at least 1MByte of VRAM). This factor has more effect when
    using higher resolution displays, but it will always have some effect.
  • The Risc PC has a faster bus.

Vincent Lefevre says

Quite amazing! Tests in Basic have been done by a French magazine
(SVM, October 94, pp 186-187):

FP test:
RiscPC 600: 17s.
PC 486DX2/66: 9s.

For information, I give you the results of the other two tests:

Integer test:
RPC: 9.1s
PC: 12s

Circle, ellipse display:
RPC: 6.1s
PC: 17.1s

Hans Marks replies

Some results from my own FP-test:

System       CPU                OS             Language         Result
RiscPC 600   ARM610 (30)        Risc-OS 3.50   ARM C 4.00       241s
IBM          80486DX2 (66)      MSWINDOWS 3.1  Borland C++ 3.1  18s
Acorn A540   ARM3, FPA10 (25)   Risc-OS 3.10   ARM C 4.00       8s

Hans later adds

comparing an ARM610 with an ARM3 + FPA10 is not quite fair. How about this:

System           CPU           OS            Language    Result
Acorn A540       ARM3 (26)     Risc-OS 3.10  ARM C 4.00  264
RiscPC 600       ARM610 (30)   Risc-OS 3.50  ARM C 4.00  241
Archimedes 340   ARM3 (36)     Risc-OS 3.10  ARM C 4.00  229

Anyway, let’s hope that an ARM700 + FPA11 will perform a bit better
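For a rough sense of what the FPA buys, the ratios from Hans's two tables can be worked out as below. This is only a sketch: it assumes the unlabelled results in the second table are seconds like those in the first, and that the two A540 rows (listed at 25 and 26MHz) are comparable. The dictionary keys and function name are mine.

```python
# Timings copied from Hans Marks's two tables (lower is better).
# Assumption: the second table's unlabelled results are in seconds too.
results_s = {
    "A540 ARM3, no FPA": 264,
    "RiscPC 600 ARM610, no FPA": 241,
    "Archimedes 340 ARM3, no FPA": 229,
    "486DX2/66": 18,
    "A540 ARM3 + FPA10": 8,
}


def speedup(slow_s, fast_s):
    """How many times faster the second timing is than the first."""
    return slow_s / fast_s


fpa = results_s["A540 ARM3 + FPA10"]
for name, seconds in results_s.items():
    if name != "A540 ARM3 + FPA10":
        print(f"A540 + FPA10 vs {name}: {speedup(seconds, fpa):.2f}x faster")
```

On these numbers the FPA10 gives roughly a 30x improvement over the emulated cases, and a bit over 2x against the 486DX2/66, which is at least in the same territory as the "as much as a factor of 50" claim quoted earlier.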

Simon Burrows notes

Peter Bondar made a theatre presentation at the Harrogate Show and claimed that both ARM810 and StrongARM upgrades will be available in 1996.
He seemed very confident about this!

(thread noted in the stardot discord)

As someone points out in the Usenet thread, the ARM7 cards were a stopgap measure, with StrongARM on the horizon and ARM8 still believed to be relevant. Acorn had to provide an upgrade to keep customers interested and, by this time, maybe existing customers were going to be more important than potential customers.

I don’t think users knew at the time quite how much trouble Acorn was in. These two variants of the same talk by Chris Cox cover the same period:
https://rougol.jellybaby.net/meetings/2019/feb.html

I was still an undergraduate at the time, so all these upgrades were out of my league. I felt that the RiscPC itself was a stopgap, but was interested in buying one with floating point capabilities if a laptop wasn’t on the cards.
By the time I had the resources to buy a machine of my own the A7000+ had been released, so I got one of those instead, though I did briefly consider getting a Mac clone.

I happened to see a blog post about the comparison chart on Wikipedia’s Archimedes page - it’s an interesting chart comparing Archimedes, Amiga, Mac, and PC, in terms of VAX MIPS:

Why is the 586 so far ahead in terms of speed? 2x faster.
Are we still comparing apples to apples, or apples to oranges here?

Money? R&D expenditure? Beliefs? Market pressures?

By 1990 the world had gone Intel. Intel were pouring millions into their CPUs and surrounding technology. The 68K had more or less peaked then too and 1993 gave us Doom… One other company isn’t mentioned there - Sun - they went from 68K to their own CPU - SPARC - another RISC system, but Intel had already won the desktop at that point - mostly in terms of cost…

One thing I remember was when Linux came out and shortly after I bought a PC - 486DX2 with 32MB of RAM - it ran Linux and X-Windows and it ran it faster (much much faster) than the Sparcstation I had on my desk at the time - so that was the end of SPARC for us …

In 1989 Intel gave us the i860 - a fantastic RISC CPU with mind-bogglingly hard FPU coding, but when coded (more often by hand) it could fly - that technology influenced the MMX instructions in the Pentium and it just took off from there - I do not think ARM had the manpower or money to keep up then, so was always going to be the poor man for absolute computer performance.

What’s not represented there is power consumption. ARM would barely be a pixel on that graph if we looked at the energy usage per MIP …

-Gordon

It is important to note that the Archimedes and Amiga were in one price range while the Compaqs and Macs were in a different market.

The 1987 numbers are what I would have liked to have shown to all those people that until recently liked to claim the x86 was a high performance architecture and the ARM wasn’t. In fact, extending this graph to the StrongARM RiscPC years we would see the ARM once again jumping ahead of the x86 in terms of absolute performance before being once again left behind (due to Intel buying the StrongARM from DEC and freezing its development before selling it to Marvell).

I’m quite skeptical about this (or, let’s say, surprised) as well. Could it be that this is more like raw instruction execution, leaving memory access and any pipeline issues out of the picture? (These were hot topics between the Intel and the Motorola camps, back then.)

I would say that one factor working in favour of the x86 machines was the clock speed. As an example, the article mentions that the Compaq Deskpro 386 started at 16MHz, then went to 20MHz and 33MHz. The Archimedes A310 during that era was running an ARM2 at 8MHz.

The Pentium had two pipelines (one which could only execute simple instructions and one that could execute any instruction) so could do a peak of 2 instructions per clock. It had larger internal caches than its competitors and a 64 bit data bus to external memory.

About clocks, the ARM2 at 8MHz could do one memory access per clock cycle in the case of sequential bursts while its competitors used several clock cycles. Only at around 33MHz did the 386 or 68030 outperform the 8MHz ARM2. The ARM3 was just an ARM2 with an internal 4KB cache (shared between data and instructions) which allowed it to jump to 25MHz.

The Pentium era was also when instruction caches and branch prediction started to actually matter.