Prime Cruncher - a Basic benchmark from 1980

I’ve another data point: Commodore PET 2001 (ROM 2.0, emulation) at a modest 1132 seconds (18 m 52 s).
While this may be rather depressing, it’s just about 120% of the Beeb’s time, taking the 1 MHz clock speed of the PET’s 6502 into account (versus 2 MHz on the Beeb).

Have a nice cup of tea (or two) and try it here :wink:


In the same vein:


About the TRS-80 being the most produced computer by 1980, that would be the Model I. I don’t think we can lump the Model II that was used in that benchmark with the I, though the Model III can certainly be considered a version of the I.


(That nice graphic, and many more, can be seen in the 10 part article Total share: 30 years of personal computer market share figures)

My Cambridge Z88 is under 11 minutes.

I did have a thought: those early microcomputer Basics were written for small ROMs and interpreted programs in small RAMs. (ZX80 had 1k RAM and 4k ROM as an extreme example.) So, when machines with 32k, 48k, nearly 64k of RAM came along, with ROMs of 16k or more, one wonders why a little RAM wasn’t used in some tradeoffs for performance: in particular, for this case, speeding up GOTO references. (Other tradeoffs would include precalculating the binary values of constants, even perhaps pre-digesting expressions for evaluation, perhaps speeding up variable lookup.)

Edit: see this comment elsewhere and the thread below it…

On most machines, I suspect the video buffer was placed just below the ROMs,
and you can’t move that to make room for more ROM. As for slower BASIC runtimes,
are we comparing the same floating-point formats? A spreadsheet would work
better with BCD floating point than a faster binary one.
Ben.

Binary constants versus literals would be huge. Having to parse a number each time it’s accessed is very expensive. However, there’s what may be called “literal value fetishism” going on in BASIC. Users expect that when they enter A + 1 it will also list back as exactly that, and not as A + 0.1E1 (or whatever). You could probably insert a few extra bytes before the constant in the BASIC text (e.g., a binary constant marker including a format mark, 2 or 4 bytes for the binary value, and another byte for the skip offset, so you wouldn’t have to parse the constant again to find its end). Optimizations for GOTO may also be doable, but would require about the same effort as renumbering the entire program on each edit (which may be quite unbearable with longer programs).
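As a toy sketch of that in-text scheme, in Python for brevity: a marker byte, a format mark, the binary value, and a skip offset so the interpreter can hop over the original digits, which are kept for LIST. All token values here are made up, not taken from any real BASIC.

```python
# Hypothetical scheme: keep the typed digits for LIST, but prefix them
# with a pre-parsed binary value the interpreter can use directly.
import struct

CONST_MARKER = 0xFE   # assumed marker token (not from any real BASIC)
FMT_INT16 = 0x01      # assumed format mark: 16-bit integer constant

def embed_constant(digits: bytes) -> bytes:
    """Prefix an ASCII numeric literal with marker, format mark, binary
    value, and a skip offset pointing past the literal text."""
    value = int(digits)
    skip = len(digits)  # bytes of literal text to skip at run time
    return (bytes([CONST_MARKER, FMT_INT16])
            + struct.pack("<h", value)
            + bytes([skip]) + digits)

def read_constant(code: bytes, pc: int):
    """Return (value, new_pc): decode the binary value, hop past the text."""
    assert code[pc] == CONST_MARKER and code[pc + 1] == FMT_INT16
    value = struct.unpack_from("<h", code, pc + 2)[0]
    skip = code[pc + 4]
    return value, pc + 5 + skip

chunk = embed_constant(b"100")
print(read_constant(chunk, 0))   # (100, 8): value without re-parsing
```

LIST would simply print the trailing digits and skip the five-byte prefix, so the program still lists exactly as typed.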

Another significant factor is whitespace parsing, esp. in MS BASIC. (The performance difference with and without whitespace is quite remarkable.)
I guess, for a high-performance but still convenient version of BASIC, there’s nothing short of retokenizing the entire program into a close-to-binary shadow version at program start (RUN); in other terms, precompiling, like Perl does. But this would be at odds with the dialog-like philosophy, which is crucial to BASIC. In other words, BASIC is slow because it allows its users to add or re-edit program text at any time without re-evaluating the entire program. (Hence, there is no consistent structure in memory, lines may be linked lists without consistent sequence, there may even be dead code, etc.)

(P.S.: I really don’t know the internals of Atari BASIC. The description reads interestingly, but, I guess, there must be some trade-offs, as well.)


Yes, for me it would be important for LIST always to return the exact typed text. Perhaps I’m thinking along the lines of storing a parsed version of each line - perhaps on the heap - the first time it’s encountered. There’d be room in that parsed version for storing pointers to lines which are mentioned as line numbers - but perhaps those pointers need only be set when that line number is first used as a target. That would avoid searching the program text, I think, other than the first time it needs to be searched for execution.
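A minimal sketch of that lazy scheme: parse a line the first time it is executed, and resolve a jump target to a direct reference the first time that jump is taken, so the program text is never searched twice. All names here are illustrative, not from any real interpreter.

```python
# Sketch of lazy per-line caching: the typed source stays untouched
# (so LIST is always faithful), while parsed forms accumulate on a
# "heap" as execution first reaches each line.
class Program:
    def __init__(self, source):
        self.source = dict(source)   # line number -> text, as typed
        self.parsed = {}             # line number -> parsed form (the "heap")
        self.targets = {}            # line number -> resolved jump pointer

    def get_parsed(self, lineno):
        if lineno not in self.parsed:          # parse on first encounter only
            self.parsed[lineno] = self.source[lineno].split()
        return self.parsed[lineno]

    def resolve_goto(self, target):
        if target not in self.targets:         # search once, then cache
            self.targets[target] = self.get_parsed(target)
        return self.targets[target]

prog = Program({10: "PRINT A", 20: "GOTO 10"})
prog.get_parsed(20)
print(prog.resolve_goto(10))   # ['PRINT', 'A'] - no further search needed
```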

BASIC 09 for the 6809 under O/S9 was a nice BASIC. You could compile your program
for speed or just run the tokenized version.

This.

When I wrote my own BASIC interpreter I tokenised everything. Numeric constants were the biggest hassle, so they were converted into binary and stored as a token in the code: a token value that says “constant” plus an index into the symbol table. The symbol-table entry contains the string representation of the number as well as the binary value.
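A minimal sketch of that constant-as-symbol-table-index idea (the layout here is assumed, not Gordon’s actual code):

```python
# Sketch: a constant token is ("CONST", index); the symbol-table entry
# keeps both the string as typed (for LIST) and the binary value (for
# execution), so the digits are parsed exactly once.
symtab = []   # list of (source_text, binary_value) pairs

def tokenize_constant(text: str):
    symtab.append((text, float(text)))
    return ("CONST", len(symtab) - 1)

tok = tokenize_constant("3.14")
kind, idx = tok
print(symtab[idx][1])   # 3.14 - binary value, no re-parsing
print(symtab[idx][0])   # 3.14 - exact text, so LIST is faithful
```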

It was cumbersome, but it made LIST reproduce the source precisely.

Also only relevant for people using the traditional line-number entry system - move over to the editor and it’s plain text all the way (but the conversion still applies as the system reads your edited text file to convert to internal format - it’s effectively a sort of one-pass compiler with a de-compiler).

But it was aimed at a modern Linux PC, not a memory constrained 8-bit micro, so I got away with it.

-Gordon


Storing a fully tokenized/binary version of a line just-in-time on the heap as we go along has its charms. But we’d need some modern variable management with hashes – and garbage collection may be an issue.
But GOTO/GOSUB is an absolute show-stopper. Consider the following lines:

80 H = H/C : GOSUB 2500 : A=A+1 : B=B*2 : C=(A+B)/2
90 IF C =< 100 THEN GOTO 120 ELSE PRINT A
100 ON X GOTO 200, 220, 250

In line 80, we’d never come to store the entire tokenized line, because we exit on the GOSUB, leaving the rest untokenized, process the subroutine, and by the time we return, we’ve probably exhausted our buffers. Quite the same goes for any IF-ELSE clauses (if our dialect supports them) as in line 90. We may introduce a restriction that any jump must be the very last statement in a line (similar to strict tail recursion), but this would put any kind of ON… statements (as in line 100) out of the question.

A double-headed BASIC, which supports both an interpreted and a compiled mode, as suggested by @oldben, may be the best way to go.

Edit: Maybe the GOTO problem could be overcome with linked lists, as in text editors, so that any jump would mark the end of an internal “paragraph” and the code must link to another list entry after this. But, if we use this in a dialog-like fashion, our heap will become a mess pretty soon. (Meaning, there would have to be a hash list linking to line entries, each of which is a linked list. I’m not sure how feasible this would be with a 16-bit address space, with ROM, hardware interfaces, etc., included. That leaves about 32-36 K for BASIC text, heap, and any variable stores, and we’d run out of resources with anything larger than 8K of program text pretty soon.)

Been there, not in actual code, but in drafting mode. I couldn’t see how this would be manageable.
(That is, you could store the literal constant along with the binary representation, but then you would still have whitespace and style-related issues. It just won’t be BASIC.)

Now, here’s my own take on this: Have a normal, tokenized BASIC, but with jump vectors for the constant-parsing and variable-lookup routines and a mode marker at the very beginning of the program. Now we may optionally save a program in a compressed (whitespace excluded) and fully tokenized format. Based on the mode mark, we’d then use binary constants and indices into a variable store instead of names. We could even list the binary format (with some caveats). And this should be a rather simple modification, too. That leaves GOTO/GOSUB, which may be sped up by using doubly-linked lists for the lines. (So we can make an educated guess about which end to tackle the search from.)
Considering all, anything more fancy is probably more costly than it may be worth.
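A toy sketch of that doubly-linked line list and the educated-guess search direction; the structure is illustrative only, not any real interpreter’s layout:

```python
# Each line knows its neighbours in both directions, so a jump target
# can be searched from the current line, walking whichever way the
# line-number comparison suggests.
class Line:
    def __init__(self, number):
        self.number = number
        self.prev = None
        self.next = None

def link(numbers):
    """Build a doubly-linked list of Line nodes; return the head."""
    lines = [Line(n) for n in numbers]
    for a, b in zip(lines, lines[1:]):
        a.next, b.prev = b, a
    return lines[0]

def find_target(current, target):
    # Educated guess: walk forward for a jump ahead, backward otherwise.
    step = "next" if target > current.number else "prev"
    node = current
    while node is not None and node.number != target:
        node = getattr(node, step)
    return node

head = link([10, 20, 30, 40])
line30 = head.next.next
print(find_target(line30, 10).number)   # 10, found walking backwards
```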

*) Edit: Rather obviously, the other, default mode is interpreted BASIC as usual.

I suppose one question would be: how large is the text of a Basic program, and how much additional memory is needed by the program as it runs?

Obviously, programs can be as large and as hungry as we like - and we know Acorn pursued, in sequence, HIBASIC for a 64k second processor, BAS128 to use 64k of mapped RAM on the Master, and the in-house 256k Turbo machine (which might have had a Basic angle?)

But any program written for a machine with say 16k RAM would have lots of room for a semi-compiled data structure on a machine with say 32k or more RAM - wouldn’t it?

That’s a nice idea - it also allows for a program so large that it can’t be compiled but can still be interpreted.

The whitespace issue is easy: you remove it all once you’ve tokenised the line. Applesoft (and I suspect other MS-derived BASICs) does this. (Obviously you preserve spaces in strings and comments.) This makes for some odd listings in Applesoft: e.g., if you don’t use LET then that line sits one space to the left of the lines above/below that do use LET or start with another keyword. It’s just one of those things we got used to/ignored back in the ’70s…
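A minimal sketch of that space stripping, preserving string contents; a real tokenizer would also leave REM text alone:

```python
# Drop spaces outside string literals, keep those inside quotes.
def strip_spaces(line: str) -> str:
    out, in_string = [], False
    for ch in line:
        if ch == '"':
            in_string = not in_string
        if ch != " " or in_string:
            out.append(ch)
    return "".join(out)

print(strip_spaces('PRINT "HELLO THERE" : A = 1'))
# PRINT"HELLO THERE":A=1
```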

When I started mine, I thought I could simply strip spaces and then, “as if by magic”, have variable names with spaces. Not being an expert in parsing, etc., this was going to take more of my time than I wanted it to, so I quietly dropped that idea. I still strip spaces once the line is tokenised, so the LIST function lists the program as I want it listed (keywords capitalised as I want them, indentation as I want it, and so on). If others don’t like this, they can use the screen editor, which is just a simple nano-like text editor.

-Gordon

True, you should be fine with double the RAM. I’m a bit worried about the performance of line look-up, though. Say we have a hash list with 8-bit hashes and named arrays for lines with hash collisions, where each entry points to a binary line on the heap. We’d have to go through the process of hashing, and probably still compare a few literals, for every single line executed. I’m not sure this wouldn’t perform worse for normal execution, even offsetting any other gains.

(E.g., MS BASIC with line numbers stored as binary 2-byte integers and storing lines as a linked list should be pretty performant even when searching for lines as a jump target, since the comparison is trivial. Much like variable lookup, actually. And advancing to the next line in normal flow is more than trivial, since there is already a binary pointer with the very address in memory.)

I feel sure two-byte pointers would be the way to go, everything deterministic, no hash in sight. No limited cache, no eviction. And if the compilation proceeds as the execution proceeds, no need to worry about the code changing and invalidating anything. (Except of course for self-modifying Basic, which I have seen, in Acorn’s own code…)

As a maybe interesting aside, I once found out that, while assigning to variables is pretty expensive (in MS BASIC that is), variable lookup doesn’t actually matter much. The same should be true for looking up jump targets, since it’s both about comparing 2-byte entries in a list at a variable offset.
(I believe MS BASIC even has an optimization here, searching forward from the current line if the target’s line number is greater than the current one. Otherwise, we have to start at the very beginning, since the list of lines is forward-linked only.)

Edit: Having said that, there are far more BASIC lines to go through than variables in an average program. So line-number lookup should still be a concern.
As a basic MS BASIC performance tip: put your subroutines at the very beginning of the program (rather counterintuitively so) and prefer jumping ahead (into the near vicinity) over jumping backwards. Substitute backward jumps with FOR-NEXT loops where you can.
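A toy model of that forward-linked search; the layout is a simplification for illustration, not actual ROM code:

```python
# Each entry is (line number, index of next line), mimicking a
# forward-linked line list. To find a jump target, start at the
# current line only when jumping ahead; otherwise restart at the head.
lines = [(10, 1), (20, 2), (80, 3), (90, None)]

def find_line(target, current_idx):
    idx = current_idx if target > lines[current_idx][0] else 0
    while idx is not None:
        if lines[idx][0] == target:
            return idx
        idx = lines[idx][1]
    return None   # e.g. UNDEF'D STATEMENT ERROR

print(find_line(90, 2))   # 3: jump ahead, searched from line 80 onward
print(find_line(10, 2))   # 0: jump back forces a search from the head
```

This is why the tip above works: a forward jump into the near vicinity touches only a few links, while any backward jump rescans the program from the top.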


Ah yes, worth mentioning that Atari BASIC was BCD and generally avoided these problems. I assume you also get a little speed boost converting from BCD to ASCII and back compared to binary.
