I am reading the 1986 Adamson book on Sinclair, and they talk about the development of Sinclair BASIC. In it they mention that the version for the Spectrum was “depressingly slow” and “snail like”.
I’m aware of the issues on the ZX81 related to the CPU driving the display, and the use of the FAST command that turned off the display. However, I was under the impression (not having used one) that the Spectrum avoided the worst of this and would thus not suffer (as much?) from slow display handling.
So is the performance of the underlying interpreter (runtime) slow, or just the machine itself? I ask because when I explored the BBC interpreter, which is by far the fastest in relative terms at least, I found the code was far less advanced than I suspected, and in fact most of its performance came from the faster RAM. Moreover, one of the other famously slow BASICs was Atari’s, which, like Sinclair’s, was fully tokenized. These should, in theory at least, run faster than the MS-alikes, yet both run much slower. This is also quite interesting.
For those more familiar with the Spectrum, does anyone know of an article, or more than one, that discusses BASIC’s performance? It was not a major system on this side of the pond and I really don’t have a good place to start.
Seconded; this may be interesting.
(Sinclairs weren’t much of a thing where I live either, as the market was mostly dominated by Commodore. But I don’t think the slowness should be blamed on the hardware.)
Regarding tokenizing, as far as I know (see above), Sinclair BASIC already tokenized in the line editor (which is why there is the modal cursor, and why the keyword shortcuts can’t be replaced by typed-out text).
One thing I think I know: the square root routine is a marvel of size optimisation. This comes from a previous thread, which leads to a listing, and from a post which mentions amusing bugs:
From the PDF:
Overall the 16K monitor program offers an extremely wide range of different BASIC commands and functions. The programmers have always however been short of ‘room’ and hence the program is written for ‘compactness’ rather than ‘speed’.
Edit: I note that literal numbers in the Basic source are invisibly annotated with their binary numeric representation, so there should be a performance gain from that.
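That invisible annotation can be sketched roughly like this. This is an illustration in Python, not the actual ROM layout or code: the 0x0E marker followed by a five-byte value mirrors what Sinclair BASIC embeds after a literal’s ASCII digits, but the helper names and the exact packing here are made up (the small-integer layout sketched below is one of the ROM’s two value formats; the other is floating point).

```python
NUMBER_MARKER = 0x0E  # byte Sinclair BASIC uses to flag an embedded value

def tokenize_number(text: str) -> bytes:
    """Line editor side: store the typed ASCII digits (kept for LIST),
    then the marker plus a pre-parsed 5-byte binary form (used by RUN).
    Small-integer layout assumed: exponent 0, sign byte, 16-bit LE, 0."""
    n = int(text)
    binary = bytes([0, 0 if n >= 0 else 0xFF]) + abs(n).to_bytes(2, "little") + b"\x00"
    return text.encode("ascii") + bytes([NUMBER_MARKER]) + binary

def runtime_read(line: bytes, pos: int) -> tuple[int, int]:
    """Executor side: skip past the ASCII, read the binary directly --
    no re-parsing of the digits at run time."""
    while line[pos] != NUMBER_MARKER:
        pos += 1                       # LIST uses these chars; RUN skips them
    b = line[pos + 1 : pos + 6]
    value = int.from_bytes(b[2:4], "little")
    if b[1] == 0xFF:
        value = -value
    return value, pos + 6

stored = tokenize_number("1000")
value, _ = runtime_read(stored, 0)
print(value)  # 1000
```

So the gain is real, but only for literals; everything else on the line still gets re-scanned.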
I haven’t dug one up so far. Sinclair BASIC was so slow that most programmers made the switch to assembly language. Kevin Toms’ Football Manager was one of the few successful games written in Sinclair BASIC.
The two most given reasons for its slowness are:
The original ZX80 Sinclair BASIC was brutally optimized for size in order to fit in that tiny computer’s 4K ROM. Lack of speed was not a design concern. Later Sinclair computers had much more ROM, but the core of Sinclair BASIC was still the tiny, slow interpreter for the ZX80. Clive Sinclair could have paid the BASIC interpreter developers more money to produce revised, faster code, but (like Jack Tramiel) paying more for something that already kinda worked wasn’t Sir Clive’s style.
The tokenized source on the ZX Spectrum runs from what’s known as “contended” RAM. While the ZX Spectrum had a ULA that managed screen refresh, the CPU’s memory reads from the contended RAM area were paused while the ULA did its screen refresh. This wasn’t the main source of slowness, but it didn’t help.
(The Sinclair ZX81 was much simpler with the CPU being dedicated to screen refresh. In Sinclair BASIC for that machine, you could invoke SLOW to get screen updates at the expense of interpretation speed, or run in FAST mode that wouldn’t update the screen as BASIC interpreted the program.)
Just for a bit of context, and not to detract from the question - why was the performance unimpressive - some notes:
The Adamson book is Sinclair and the ‘Sunrise’ Technology and you can read a bit here, including the two comments about performance.
Various benchmarks do indeed seem to show that the Spectrum lags the C64 (same memory bandwidth, different CPU and Basic) and also the Amstrad CPC (same CPU, similar speed, different Basic) and MSX (ditto)
Atari BASIC and Integer BASIC are two additional examples. One was slow, the other fast. Another, but I have never seen it in the wild, was BASIC/09, which was apparently quite quick indeed.
If you had asked me if pre-tokenization of numbers and variable pointers and such was likely to lead to a faster version of BASIC, I’d have answered “yes”. However, of the four versions that I know worked this way, two were very slow and two were very fast. So it seems it is less important than, say, the math library.
Sure, but I can’t imagine something like looking up a line number is going to be much slower if it’s unoptimized. Sure, maybe you skip the MS forward/back optimization, but that doesn’t account for the really big difference these reviews seem to suggest.
For the most part, once tokenized the runtime should be faster. At a minimum, you’re not re-parsing all the line numbers and constants. I don’t know how Sinclair did it, but in Atari you’re also hard-coding in the variable addresses, so no searching for those either.
So I suspect the performance drop is somewhere else, like the math or such, as it was on the Atari.
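The two lookup strategies under discussion can be sketched like this (hypothetical helper names; the program is modelled as a list of (line number, statement) pairs sorted by line number):

```python
def find_from_top(program, target):
    """Minimal 4K-ROM style: linear scan from the first line, always."""
    for i, (num, _) in enumerate(program):
        if num >= target:
            return i
    raise IndexError("line not found")

def find_ms_style(program, target, current):
    """MS-style tweak: a forward GOTO scans on from the current line;
    a backward GOTO falls back to scanning from the top."""
    start = current + 1 if program[current][0] < target else 0
    for i in range(start, len(program)):
        if program[i][0] >= target:
            return i
    raise IndexError("line not found")

prog = [(10, "..."), (20, "..."), (30, "..."), (40, "...")]
print(find_from_top(prog, 30))     # 2 -- scanned three entries
print(find_ms_style(prog, 40, 2))  # 3 -- scanned only one entry
```

For a short benchmark program the scan is over a handful of lines either way, which is why this alone can’t explain a several-times slowdown.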
The contended RAM issue was also an issue for most designs, including the Atari, and to a lesser degree, C64. Or is this something different?
I found the Computing Today article on archive.org… wow, they’re not kidding about slow! They used the rugg/feldman series - test 1 got 4.88 seconds, whereas a C64 is 1.2 and an Apple II with AppleSoft is 1.3. Even the generally lamentable Atari is 2.2!
On BM7, the one with gosubs and “manual” loops, the Speccy is 89.96, while the C64 is 47.5 and Atari is 60.1.
On the other hand, BM8, which adds trig, gets 25.07… which was a mystery until I read the text beside it and learned they reduced the number of loops to 100 from the normal 1000, meaning it would actually take 250 seconds compared to the C64 at 119.3.
These numbers are very similar to the ZX81, which is not surprising given how few differences there would be in the core code.
I’m going to keep digging, but my suspicion is currently on the math. The easy way to test that would be to compare the ZX80 to the 81, as the latter added floating point.
Certainly could be some blame attached to the maths routines. However, I’d also look at the cost of line-searching (used for GOSUB, GOTO, possibly more) - see LINE-ADDR, CP-LINES, NEXT-ONE routines from page 94 (84) of the Complete Spectrum ROM PDF linked upthread.
I think Sinclair BASIC does exactly that: re-parsing all the non-keyword content. Every time. I also suspect it starts GO TO/GO SUB processing from the first line, too.
To check where the slow points are, I ran my bench64 benchmark on emulated ZX Spectrum 48K computers:
one running Sinclair BASIC, which managed an overall index of 29;
one running Beta Basic, a compatible but faster 3rd-party implementation, which managed an overall index of 58.
These overall numbers are less interesting than the individual results:
Each index can be thought of as roughly the percentage of the speed of BASIC on a C64. Sinclair BASIC’s GO TO and GO SUB are particularly dismal, running at less than ⅕ the speed of C64 BASIC. Sinclair’s FOR is also only ⅓ the speed of the C64’s, and these are all keywords that Beta BASIC optimized.
Since Sinclair BASIC was designed for tiny programs that could fit in 1K of RAM, the slow looping, navigation and branching weren’t a big deal. For most users, a Sinclair computer was the first one they ever used, so they had no previous experience to compare it against.
I wonder how TI-99/4A BASIC compares. That was shockingly slow. Even just printing a string was insanely slow.
With this benchmark, the TI-99/4A is almost as slow as the ZX Spectrum, but this benchmark doesn’t involve the insanely slow printing of the TI.
Pixel Pedant implemented a dungeon crawling game in TI BASIC. It involved a lot of very counterintuitive but necessary optimization, to deal with the many things the TI does stupidly slowly and to take advantage of the very few things it does more quickly.
Hmmm, so maybe I have this confused. Is it not Sinclair BASIC that converts the number into the internal binary form and then saves the original ASCII as well for LIST? Or am I confusing that with another dialect?
I guess I have to just dig into some code - but I’m even less familiar with Z80 assembler than 6502…
That would not affect the Rugg/Feldman tests though, as the branches are backward. Yet the FOR-NEXT test is 5x slower on the Speccy than on the ZX80. That does seem to point to the math.
MS BASIC does this as well, re-parsing all non-keyword content. It has an optimization regarding line targets, but this shouldn’t make much of a difference, especially with shorter programs.
GOTO vs stack-based FOR-NEXT also depends on where this is in the program: at the very beginning of the program GOTO may be actually faster. (It seems to be so in “10 PRINT” one-liners.)
Meaning, so far, for a short benchmark program, nothing of this should matter much.
Is Sinclair BASIC storing line numbers as ASCII strings? (This could explain some of the speed difference to the C64 with regard to GOTO/GOSUB, and even FOR-NEXT.)
Having written my own BASIC in recent years and currently investigating some other BASICS, here are some personal thoughts…
In a small BASIC - say 4KB - you re-use every bit of code as often as you can. So if you want GOTO, you write one bit of code to search for the line number starting at the top. You then use the same code for GOSUB, FOR, LIST n, RUN n and anything else that you possibly can. It’s space efficient if nothing else. Search backwards? Sure, but that’s more code, so let’s just re-use the search-from-the-top code…
You do as much of that as you can for a 4K BASIC. It’s not pretty but you get there in the end. In the late 70s and still in the early 80s, RAM was expensive. Even the BBC Micro in 81/82 only had 32KB of RAM - although it did come with nearly 32K of ROM…
Storing numbers as binary is hard. Doubly so when they’re floating point - it really complicates LIST if nothing else (you typed in 1000, but LIST gives 1E3, for example). Re-parsing (text) code is the done thing in a 4K BASIC. Even if you can tokenise in a 4K BASIC, that’s great: you just match a one-byte token rather than a number of bytes of keyword - that will give you a small gain, but it may not be as significant as you might think.
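The LIST complication can be illustrated with a hypothetical tiny formatter (invented for this sketch, not any real ROM’s routine): once only the binary value is kept, the lister has to regenerate a textual spelling, and a size-optimised formatter won’t necessarily reproduce what was typed.

```python
def list_number(x: float) -> str:
    """Hypothetical size-optimised lister: integers up to three digits
    are printed plainly; everything else comes out in exponent form."""
    if x == int(x) and abs(x) < 1000:
        return str(int(x))
    mantissa, exp = x, 0
    while abs(mantissa) >= 10:
        mantissa /= 10
        exp += 1
    return f"{mantissa:G}E{exp}"

# The user typed "1000", but LIST shows "1E3":
print(list_number(1000.0))  # 1E3
print(list_number(999.0))   # 999
```

Keeping the typed ASCII alongside the binary (as the Spectrum does) sidesteps this at the cost of storing the number twice.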
Some of the old 4K BASICs actually implemented a pseudocode - so part of the interpreter was an interpreter for that pseudocode (or VM if you like). There is a trade-off between the complexity of the pseudocode and the expected outcome. Of course, interpreting this pseudocode also slows things down a little. More trade-offs.
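The pseudocode idea can be sketched as a tiny stack VM (the opcodes and encoding here are invented for illustration, not taken from any particular 4K BASIC): the front end compiles an expression once, and a compact dispatch loop does the work at run time.

```python
# Invented opcode numbering for this sketch.
PUSH, ADD, MUL, PRINT, HALT = range(5)

def run(code):
    """Minimal dispatch loop: fetch an opcode, act on a value stack."""
    stack, pc = [], 0
    while True:
        op = code[pc]; pc += 1
        if op == PUSH:
            stack.append(code[pc]); pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop(); stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop(); stack.append(a * b)
        elif op == PRINT:
            print(stack.pop())
        elif op == HALT:
            return

# "PRINT 2+3*4" compiled to pseudocode (precedence resolved at compile time):
run([PUSH, 3, PUSH, 4, MUL, PUSH, 2, ADD, PRINT, HALT])  # prints 14
```

The win is that precedence and parsing are paid for once; the cost is the extra fetch/decode on every pseudocode instruction.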
I do think it’s pretty amazing some of the things that were achieved with those 4K BASICs though - the ZX80 beam-raced the video output too…
Apple Integer BASIC was 6KB but included low-res graphics codes for the Apple II GR mode…
As for Sinclair BASIC - I don’t know, I’ve not really looked at it in much detail. I think the people who bought one for under £50 were just really happy to have something that cheap that ran BASIC at the time…
Regarding optimization: the original Commodore BASIC implementation (“Old ROM” on the PET 2001) had a few shortcomings, like a not-really-working IEEE bus implementation, which were mostly related to the limited ROM space. The next iteration not only found room for a fully working IEEE implementation, but also some extra space for an ML monitor. (By dropping the monitor, the later VIC and C64 implementations found room to fit the additional color-related code.)
Illustrating that there is usually some room for improvement left in your first iteration.
I still haven’t seen anything that would explain the remarkable difference in performance.
E.g., the 6502 BASICs have to engage in extensive stack activity, as all the registers are used for passing state to subroutines, and, where not, these subroutines will use those registers anyway. So pushing and pulling from the stack is essential. A Z80 implementation, with its greater choice of registers, should have a slight advantage. This may be compensated by the speedy zero-page instructions of the 6502, which effectively make those addresses additional slow registers, but there’s still more need to shuffle things around than on a Z80.
Even things like the specific implementation of jumps in FOR-NEXT or GOTO shouldn’t matter much for short programs, which benchmarks usually are. (Since the search space is rather limited.) There must be some crucial bottleneck we haven’t seen yet…