Retro Benchmarking - Basic vs. BCPL (sort of)

drogon · March 24, 2023, 6:48pm

For no reason other than personal gain (pain?), I decided to run some benchmarks on my somewhat retro Ruby816 SBC. It’s a 65C816 CPU (the so-called 16-bit 6502) running in 8-bit mode with various BASICs, and in 16-bit mode running a 32-bit VM to support BCPL. It runs at 16 MHz and has 512KB of RAM (though only 64KB is accessible in 8-bit mode).

I was mostly interested in how much faster BCPL was. The result was that it is faster, but not always (bother).

I put together a blog post on my own site about it all, but I’ll put the results table here:

It wasn’t always easy to do some of them in floating point in BCPL, and I gave up on the Byte Sieve one, but do note that two were slower in BCPL than in BBC Basic 4…

There is a link to the source code on my blog posting for anyone who cares to look.

Which is here:

https://projects.drogon.net/retro-basic-and-bcpl-benchmarks/

Cheers,

-Gordon

mmiler · March 27, 2023, 12:52am

Wow, running BCPL. I only heard about this language about 10 years ago. It seems to have fallen out of use after the 1970s. It was the primary inspiration for the C language (by way of B, which started out as a stripped-down BCPL).

I don’t know why, but I am taken with benchmarks on old microcomputers. Every time I see them, I want to try them out on the Atari 8-bit in an emulator.

I primarily use a third-party version of Basic for these, called Turbo Basic. It’s an extension of Atari Basic and runs two to three times faster.

I was a bit puzzled by your results, because I saw you were running BBC Basic. I thought that meant you were running the test on a real BBC Micro, but then I read more carefully.

A college friend used to use an Apple IIGS (a 65816 machine). What I recall is it was able to run 6502 code quite a bit faster than on an older Apple II (though, I think it was also possible to run it at “normal” speed). That would explain your results.

I tried looking up the benchmarks. The Rugg/Feldman tests on an older Apple II fit within the parameters I expected.

Rugg/Feldman - Wikipedia

Also, here’s the Interface Age link to the Prime Cruncher (I couldn’t get yours to work).

Just for the fun of it, here are the results I got in Turbo Basic (labeled “Turbo”), compared to some computers of Atari’s era. Turbo Basic also has a compiler, so I ran the compiled versions too (labeled “Comp”) to compare against the interpreted results.

All times in seconds ("-" = no result):

      Turbo   Comp  Apple II  IBM 5100  BBC Micro  Atari Basic  C-64  TI-99/4A
R/F1:  0.8    0.5    1.3       4.5       0.8        2.2          1.2    2.9
R/F2:  2.9    0.6    8.5      21.1       3.1        7.3          9.3    8.8
R/F3:  8.0    3.0   16.0      57.4       8.1       19.7         17.6   22.8
R/F4:  8.8    2.7   17.8      54.5       8.7       24.1         19.5   24.5
R/F5: 10.1    2.9   19.1      59.0       9.0       26.3         21.0   26.1
R/F6: 15.8    6.1   28.6      88.2      13.9       40.3         29.5   61.6
R/F7: 26.4   11.2   44.8     174.9      21.1       60.1         47.5   84.4
R/F8:  6.0    5.2   55.5     201.6      49.9          -        119.3  382.0

                    Turbo    Comp
Byte Sieve:          74.1    25.9
Noel's Retro Lab:    23.8     8.6
Prime Cruncher:     580.5   244.7
Mandelbrot (64x48): 595.0   247.8
Mandelbrot (36x22): 155.8    64.8

Apple II result for Prime Cruncher: 960.0

The IBM 5100 results (a computer from 1975) were kinda interesting.

With Mandelbrot, I ran it with your parameters (64x48), just to get the time. For the heck of it, I tried it using 36x22, since that would fit on a 40-column screen.

I found the results I was getting with Turbo Basic were roughly comparable to BBC Basic’s.

I don’t know if this was common to microcomputer Basics, but I recently heard that one thing that made Atari Basic so slow was that every time your program branched to a location, whether via GOTO, GOSUB, or a FOR loop, the interpreter had to start from the beginning of your program and search sequentially for the target line. So loops you wanted to execute fast needed to be at the top. This is one of the things Turbo Basic optimized (I’m not sure how).
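A minimal Python sketch of the lookup being described, assuming a program held as a list of (line number, text) pairs in ascending order with no index. The layout and `find_line` are hypothetical, not Atari Basic’s actual data structures; the point is only that the cost of a jump grows with how far down the program the target sits.

```python
# Hypothetical model of a line-numbered BASIC program: lines are kept in
# ascending line-number order with no index, so the only way to find a
# GOTO/GOSUB target is to walk the list from the first line.

def find_line(program, target):
    """Linear search from the top of the program.

    Returns (index of the target line, number of lines examined)."""
    examined = 0
    for index, (number, _text) in enumerate(program):
        examined += 1
        if number == target:
            return index, examined
        if number > target:  # lines are sorted, so the target doesn't exist
            break
    raise LookupError(f"UNDEFINED LINE {target}")

# A 1,000-line program numbered 10, 20, ..., 10000.
program = [(n, "REM") for n in range(10, 10001, 10)]

_, near_top = find_line(program, 20)
_, near_end = find_line(program, 9990)
print(near_top, near_end)  # 2 999
```

A jump to line 20 examines two lines, while a jump to line 9990 examines 999, which is why time-critical loops were moved to the top.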

What I’ve found interesting is that these benchmarks are much more a test of the Basic implementations than of the hardware, because much of the hardware is comparable. A more efficient Basic implementation dramatically improves the results.

NoLand · March 27, 2023, 2:38am

Generally, this is true (at least for all the BASICs I know): jump targets are searched top down and, e.g., putting often-used subroutines at the front speeds things up. (Some BASICs have optimizations, such as comparing the jump target to the current line number and, if the target is greater, searching forward from the current line.)
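The forward-search optimization can be sketched the same way. This is a hypothetical illustration (`find_line_from` and the list-of-pairs program are assumptions, not any particular BASIC’s code): when the target line number is greater than the current one, the scan starts at the current line instead of the top.

```python
def find_line_from(program, current_index, target):
    """Search for `target`, starting at the current line when the target
    lies ahead of it, otherwise from the top. Returns (index, examined)."""
    start = current_index if program[current_index][0] <= target else 0
    examined = 0
    for index in range(start, len(program)):
        number = program[index][0]
        examined += 1
        if number == target:
            return index, examined
        if number > target:  # sorted lines: target doesn't exist
            break
    raise LookupError(f"UNDEFINED LINE {target}")

# Hypothetical 1,000-line program numbered 10..10000; we are at line 9000.
program = [(n, "REM") for n in range(10, 10001, 10)]
current = 899  # index of line 9000

print(find_line_from(program, current, 9990))  # (998, 100): forward jump
print(find_line_from(program, current, 20))    # (1, 2): backward jump rescans from the top
```

A forward jump from 9000 to 9990 now costs 100 examinations instead of 999; backward jumps still pay the full top-down search.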
However, this shouldn’t affect loops, where key properties like the loop’s starting position in memory, the loop variable, the step, and the limit are normally put on the stack. So unless you have jumps inside the loop, the line number at which a loop is located shouldn’t affect performance.
I don’t know the internals of 8-bit Atari BASIC, which may be different, but this seems to be the general pattern.

Edit: Yes, Atari’s BASIC is different, as it apparently has to search for the FOR on every iteration.
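The difference between the two loop strategies can be sketched as follows, assuming (from this thread, not checked against Atari BASIC’s source) that a typical BASIC keeps a pointer back to the loop body, while Atari BASIC keeps only the FOR’s line number and re-finds it on each NEXT. `ForFrame` and `lines_scanned_for_next` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class ForFrame:
    """What a FOR pushes on the stack (hypothetical layout)."""
    var: str
    limit: float
    step: float
    resume_index: int  # pointer-style: where the loop body starts
    for_line: int      # Atari-style: only the FOR's line number is kept

def lines_scanned_for_next(program, frame, keep_pointer):
    """Lines the interpreter must examine to jump back to the loop top."""
    if keep_pointer:
        return 0  # jump straight to frame.resume_index, no search
    scanned = 0
    for number, _text in program:  # re-find the FOR line from the top
        scanned += 1
        if number == frame.for_line:
            return scanned
    raise LookupError("NEXT WITHOUT FOR")

# Hypothetical 1,000-line program with a loop at line 5000 (index 499).
program = [(n, "REM") for n in range(10, 10001, 10)]
frame = ForFrame(var="I", limit=100, step=1, resume_index=500, for_line=5000)
iterations = 100

print(iterations * lines_scanned_for_next(program, frame, keep_pointer=True))   # 0
print(iterations * lines_scanned_for_next(program, frame, keep_pointer=False))  # 50000
```

With a stored pointer the loop’s position in the program is irrelevant; with a line-number re-search, 100 iterations of a loop at line 5000 cost 50,000 line examinations, which matches the advice to keep loops near the top.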

drogon · March 27, 2023, 7:20am

Thanks for looking.

I’ll check my links, etc.

The 65816 has a 6502 emulation mode in which it’s essentially a 65C02, so it will run all 6502 code that doesn’t use illegal opcodes; from power-on, the Apple //gs could run Applesoft, etc. The //gs was not very fast, though; search for the history and conspiracy theories behind that…

And yes, BBC Basic: I wrote my own mostly Acorn MOS-compatible OS for my little system, just so I could run BBC Basic (and initially the 16-bit BCPL in the 65C02 version).

And BCPL? Many reasons, but my aim was a standalone retro system capable of compiling a high-level language, and as I’m not a fan of Froth there was little other choice: no C compiler I could port (same for Pascal), but BCPL was relatively easy and I’d used it a lot in the early/mid 80s.

As for the benchmarking, what’s clear (to me, anyway) is that the crux of them all is floating-point performance. If someone had realised that in the mid-to-late 70s and written some really high-speed FP routines for their BASIC, or some really fast integer ones and persuaded people to use them for e.g. loops, then they may well have taken the market…

Cheers,

-Gordon

HankG · March 27, 2023, 4:59pm

Back when all the AGC stuff was happening, I wanted to do a set of benchmarks of Applesoft BASIC versus cc65-compiled code versus 6502 assembly code. The end goal was ultimately to recode it in AGC assembly to get a performance comparison. I didn’t get very far with that. This is making me want to try the concept again, but less ambitiously: just compare Applesoft BASIC, cc65, and 6502 assembly.

drogon · March 27, 2023, 7:28pm

Apollo Guidance Computer?

You’ll need to know exactly what you want to achieve and from what I’ve done in the past, it’s never easy.

So…

Applesoft - 5-byte floats, easy to use but slow.
cc65 - 2-byte ints, no floats, cross compiling, but … “C” …
ASM - whatever you want, but be prepared to write shed-loads of it, expect more bugs and for the task to take longer, but when it runs it’ll be the fastest…

My BCPL implementation has 32-bit ints and 32-bit IEEE754 floats. Generally faster than BBC Basic (which is almost 2x faster than Applesoft).
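To make the float formats concrete, here is a Python sketch of the 5-byte layout Applesoft uses (one exponent byte biased by 128, then a 32-bit mantissa whose always-set leading bit is replaced by the sign) next to a 32-bit IEEE 754 single from `struct`. `to_applesoft` and `from_applesoft` are hypothetical helpers that model the layout with round-to-nearest; they are not meant to reproduce Applesoft’s own conversion routines bit for bit.

```python
import math
import struct

def to_applesoft(x):
    """Encode x as 5 bytes: exponent E (value = 0.1mmm...b * 2**(E-128)),
    then 32 mantissa bits with the hidden leading 1 replaced by the sign."""
    if x == 0:
        return bytes(5)                   # exponent byte 0 means zero
    m, e = math.frexp(abs(x))             # abs(x) = m * 2**e, 0.5 <= m < 1
    mant = round(math.ldexp(m, 32))       # keep 32 significant bits
    if mant == 1 << 32:                   # rounding carried out: renormalize
        mant, e = 1 << 31, e + 1
    exp = e + 128
    if exp > 255:
        raise OverflowError("?OVERFLOW ERROR")
    if exp < 1:
        return bytes(5)                   # underflow flushes to zero
    sign = 0x80000000 if x < 0 else 0
    return bytes([exp]) + ((mant & 0x7FFFFFFF) | sign).to_bytes(4, "big")

def from_applesoft(b):
    """Decode the 5-byte layout produced by to_applesoft."""
    if b[0] == 0:
        return 0.0
    bits = int.from_bytes(b[1:], "big")
    mant = bits | 0x80000000              # restore the hidden leading 1
    value = math.ldexp(mant, b[0] - 128 - 32)
    return -value if bits & 0x80000000 else value

# 10.0 in the 5-byte format vs. IEEE 754 single precision (big-endian).
print(to_applesoft(10.0).hex(), struct.pack(">f", 10.0).hex())  # 8420000000 41200000
```

The 5-byte format keeps 32 significant bits against IEEE single precision’s 24, so it carries roughly two more decimal digits (about 9 versus 7), at the cost of an extra byte to move and process on every operation.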

Maybe I ought to do some tests against cc65. I have written several programs in cc65 for my Ruby boards and have a dedicated target for it, so why not… It just needs a bit more time… So I’m not quite going to say “hold my beer” but I’ll get round to it soon…

-Gordon

oldben · March 27, 2023, 10:34pm

With a good 16-bit CPU, you have the choice of using 32-bit floats or 48-bit floats (31-bit mantissa, 1-bit sign, 8-bit exponent). The IBM 1130 had a similar 48-bit format. The IBM 1130 was 16 bits, but you could use the AC & MQ as a 32-bit accumulator. It really boils down to how well your hardware handles 32 bits or more.

scruss · March 27, 2023, 11:25pm

The Atari and the BBC Micro both have 6502s at around 2 MHz (just under, on the Atari), but the Atari has a few more interrupts going on from its custom chips than the BBC Micro does. The BBC Micro also has faster RAM. Turbo BASIC is rather neat. Frank Ostrowski went on to write the monstrously fast GFA BASIC for the Atari ST.

On making floating point fast: some people did, but not every computer got to every market in the world. The 8080-based DAI supported the AMD/Intel floating-point coprocessor for 8-bit systems, which made for remarkably good performance. The only thing is, the DAI didn’t sell in large numbers.