Prime Cruncher - a Basic benchmark from 1980

I feel sure two-byte pointers would be the way to go, everything deterministic, no hash in sight. No limited cache, no eviction. And if the compilation proceeds as the execution proceeds, no need to worry about the code changing and invalidating anything. (Except of course for self-modifying Basic, which I have seen, in Acorn’s own code…)

As a possibly interesting aside, I once found out that, while assigning to variables is pretty expensive (in MS BASIC, that is), variable lookup doesn't actually matter much. The same should be true for looking up jump targets, since both come down to comparing 2-byte entries in a list at a variable offset.
(I believe MS BASIC even has an optimization: if the line number of the target is greater than that of the current line, the search proceeds forward from the current line. Otherwise, we have to start at the very beginning, since the list of lines is forward-linked only.)

Edit: Having said that, there are far more BASIC lines to go through than variables in an average program, so line number lookup should still be a concern.
As a basic MS BASIC performance tip: put your subroutines at the very beginning of the program (counterintuitive as that is) and prefer short forward jumps (into the near vicinity) over backward ones. Substitute FOR-NEXT loops for backward jumps where you can.
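A minimal sketch of the tip (a hypothetical little program in MS-BASIC/Applesoft-style syntax, not from the benchmark): the GOSUB from inside the loop is a backward jump, so the interpreter searches from the very start of the line list, but because the subroutine is the first thing in the program, that scan ends after a line or two; and the loop itself is a FOR-NEXT rather than a backward GOTO.

10 GOTO 100: REM skip over the subroutine at start-up
20 REM subroutine first: a search from the top of the line list finds it almost immediately
30 S = S + I * I
40 RETURN
100 S = 0
110 FOR I = 1 TO 1000: REM FOR-NEXT loop instead of a backward IF ... GOTO
120 GOSUB 20
130 NEXT I
140 PRINT S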

2 Likes

Ah yes, worth mentioning that Atari BASIC was BCD and generally avoided these problems. I assume you also get a little speed boost converting from BCD to ASCII and back compared to binary.

1 Like

It's always been a little mystery to me why BCD sort of fell by the wayside in programming languages and only integers and floats survived.

It never did really fall by the wayside, as it is now back as densely packed decimal.
It is more the case that modern programming languages like C were developed
on machines that only supported binary floating point in the basic instruction set.
If you wanted to do BCD on the PDP-11, that was a microcode upgrade that only
a few models supported, and even then you needed a COBOL compiler.
Had the PC used an IBM 370 CPU, things might have been different. Ben.
(Just my views; I never used a PDP-11 or an IBM 370, or looked at COBOL compilers.)

1 Like

I suspect the 8087 had something to do with it: binary arithmetic needs less logic to implement than decimal (or, with the same logic, decimal is slower), so economics delivers binary floating point first, and cheaper.

It feels to me that BCD on computers isn’t really any better - it has different inaccuracies and rounding problems than binary, that’s all. (Excepting possibly the world of finance, where it’s important to get the same answer as a financial calculator, which is a BCD machine. Hence IBM’s support for BCD.)

That said, elsewhere I noted

I was slightly surprised to read these words by William Kahan (via this discussion):

Quote:

“Binary floating-point is noticeably better than decimal for error-analysis so, as an error-analyst concerned with the reliability of scientific, engineering, statistical, financial, economic and medical computing, I should advocate binary for everyone. But decimal floating-point is more humane than binary because everybody learns about decimal in grade school, and that will never change. Attempts to foist binary upon everybody almost always generate Errors Designed Not To Be Found”

BCD is really integer arithmetic when you squint at it right; only the encoding is different, of course.

I wish I had saved a link to an article I once found which discussed that, given BCD's application area (finance, accounting), the division operation isn't really that useful, because of non-terminating approximations. What they had instead was an operation (I cannot remember the name, sorry) where, e.g., doing this "not-division" of 10 by 3 will give you a list of {4, 3, 3} (or a map of {4:1, 3:2}, if you want to be space-efficient). Or, e.g., 10 by 7 would be 2:3 1:4.

The idea being that summing the parts back was guaranteed to give you the original, because you NEVER want to lose a single penny. (Who gets the bigger pieces? Depends on the application.)
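Whatever it was called, the idea is easy to sketch in plain integer BASIC (a hypothetical example, not the actual decimal instruction): take the truncated quotient, then hand the leftover units out one per part from the front, so the parts are guaranteed to sum back to the original.

100 T = 10: N = 3: REM split T units into N parts, as in the 10-by-3 example
110 Q = INT (T / N): R = T - Q * N: REM truncated quotient and leftover
120 FOR I = 1 TO N
130 P = Q: IF I <= R THEN P = Q + 1
140 PRINT P;" ";
150 NEXT I
160 PRINT

This prints 4 3 3; with T = 10 and N = 7 it prints 2 2 2 1 1 1 1, matching the 2:3 1:4 split above, and no unit ever goes missing.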

Well, it effectively fell by the wayside outside of IBM customers (mainframes and/or COBOL); no "modern" languages cared.

But there is even recent IEEE 754 support for it: Decimal floating point - Wikipedia

The Soviet Elektronika calculators come to mind, which are said to have been slow because of their use of BCD arithmetic. (Apparently, while these models were built around a PDP-11-on-a-chip and a slightly modified DEC-BASIC, no microcode option for BCD was included.)

I guess, if you want to be fast and avoid (the worst) rounding errors, balanced ternary would still be the way to go.

1 Like

I would do base 27. That fits 3 ternary digits into 5 binary bits. The only thing keeping me
from building a ternary computer is deciding what logic operations to use: Boolean logic with don't-cares, a majority/minority style logic, or a simple brain cell in case I want a real AI.
https://temca2data.org/
Now back to BASIC. I want to divide $1.01 among 3 people. That is 33 cents each and
2 cents left over; 33.66 cents is not the answer, with or without rounding. Ben.

On a higher language level, FORTRAN's arithmetic IF has its charms, also for anything that can make use of a sign function. On a low level (AND, OR, XOR, etc.), I really don't know how this would best be handled. I guess the fallback would be to use single-trit addition, subtraction and multiplication. (So add-without-carry, i.e. modulo a single trit, would be a stand-in for XOR? See the sketch below.)

Compare, https://en.wikipedia.org/wiki/Balanced_ternary#Addition,_subtraction_and_multiplication_and_division
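For what it's worth, here is a toy version of that last idea in plain BASIC (hypothetical code, not taken from any ternary machine): carry-free addition of two balanced trits, i.e. the sum wrapped back into -1/0/+1, which would play the role XOR plays for bits.

10 REM balanced "mod 3": wrap any small integer back into -1, 0, +1
20 DEF FN M(X) = (X + 1) - 3 * INT ((X + 1) / 3) - 1
30 FOR A = -1 TO 1: FOR B = -1 TO 1
40 PRINT A;" (+) ";B;" = "; FN M(A + B)
50 NEXT B: NEXT A

The matching carry trit would be INT ((A + B + 1) / 3), which is -1 only for (-1) + (-1) and +1 only for 1 + 1, much as AND supplies the carry in a binary half-adder.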

BTW, ternary SRAM (T-SRAM) seems to be a thing.

Edit: a few sources on ternary logic functions:

It’s not quite that simple. In Applesoft the target line number’s high byte must be greater than the current line number’s high byte, or the search starts at the beginning. To illustrate, I just slightly renumbered the original source and reduced the elapsed time from about 986 seconds at 1 MHz to about 946 seconds. I don’t know why the original article claims “960 seconds for a 2 MHz Apple ][+” … that seems wrong on more than one level.

NEW
100  REM INTERFACE AGE’s benchmark program to
110  REM ‘discover’ the first 1000 Prime numbers
120  REM
130  PRINT "Starting:"
140  FOR N = 1 TO 1000
150  FOR K = 2 TO 500
160  LET M = N / K
170  LET L = INT (M)
180  IF L = 0 THEN 270
190  IF L = 1 THEN 260
200  IF M > L THEN 260
210  IF M = L THEN 280
260  NEXT K
270  PRINT ;N;" ";
280  NEXT N
290  PRINT  CHR$ (7)
300  PRINT "Finished."
310  END 

If we keep the inefficient algorithm but sacrifice a bit of structure to minimize the number of GOTOs being executed, it’s not as pretty but it is significantly faster, at about 846 seconds:

NEW
100  REM INTERFACE AGE’s benchmark program to
110  REM ‘discover’ the first 1000 Prime numbers
120  REM
130  PRINT "Starting:"
140  FOR N = 1 TO 1000
150  FOR K = 2 TO 500
160 M = N / K
170 L = INT (M)
180  IF L = 1 THEN  NEXT K: GOTO 270
190  IF L = 0 THEN 270
200  IF M > L THEN  NEXT K: GOTO 270
210  IF M = L THEN  NEXT N: GOTO 290
260  NEXT K
270  PRINT N" ";
280  NEXT N
290  PRINT  CHR$ (7)
300  PRINT "Finished."
310  END 
2 Likes

Interesting; I thought the issue with the high byte was limited to the Commodore implementations. I was unaware it was in Applesoft too.

Any ideas why this might exist? Adding the single BEQ does not seem like it would outweigh the advantage, especially in the case of same-line loops.

Something I'm missing here?

But I think it's fair to say those problems are more like what the user might expect. Getting something odd out of 1-0.1+0.1 in binary seems rather unexpected, whereas the places where you see issues in decimal are, to a degree, the same ones you see in "the real world".
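As a concrete sketch (hypothetical code, nothing from the thread): 0.1 has no exact binary representation, so adding it up repeatedly in a binary-floating-point BASIC such as Applesoft will usually drift visibly, whereas a BCD BASIC like Atari's holds 0.1 exactly and should land on the round figure.

10 S = 0
20 FOR I = 1 TO 1000
30 S = S + 0.1
40 NEXT I
50 PRINT S: REM a binary-float BASIC typically prints something near, but not exactly, 100
60 IF S <> 100 THEN PRINT "off by ";S - 100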

1 Like

If the high bytes are the same, then the branch could be forward or backwards, so a simple BEQ won’t help much. I’m easily entertained, so I in-lined all of the branch targets, thus eliminating all GOTOs, and got it down to about 870 seconds … WAIT … why did it get slower ??? Did the false IFs slow down because of the extra junk in their trunks? Applesoft shouldn’t be looking for an ELSE because it wouldn’t know what to do if it found one anyway! I guess branch avoidance in Applesoft doesn’t always offer the same benefit as branch avoidance in assembly …

NEW
100  REM INTERFACE AGE’s benchmark program to
110  REM ‘discover’ the first 1000 Prime numbers
120  REM
130  PRINT "Starting:"
140  FOR N = 1 TO 1000
150  FOR K = 2 TO 500
160 M = N / K
170 L = INT (M)
180  IF L = 1 THEN  NEXT : PRINT N" ";: NEXT : PRINT "Finished." CHR$ (7): END 
190  IF L = 0 THEN  PRINT N" ";: NEXT N: PRINT "Finished." CHR$ (7): END 
200  IF M > L THEN  NEXT : PRINT N" ";: NEXT : PRINT "Finished." CHR$ (7): END 
210  IF M = L THEN  NEXT N: PRINT "Finished." CHR$ (7): END 

Well sure, so…

ORIGINAL CODE:

Test greater than. If greater than, branch to start search at current line.

NEW CODE:

Test equals. If equals, test low byte lower. If low is lower, skip next line.

Branch to start search at current line.

BACK TO ORIGINAL CODE:

Branch to start search at top.

For the cost of one extra branch instruction on every BASIC branch, and a compare-and-branch on about 1 in 25 branches, you optimize short forward jumps and super-optimize same-line jumps.

Doing the compare on the entire line number instead of half of it costs the same as any of the line-number compares the search does anyway, so this is the equivalent of adding one line of code to the top of your program.

basic09, the program that you run to create, run, and debug BASIC09 programs, compiles to the same I-code that runb, the program that runs "packed" BASIC09 programs, executes. basic09 keeps around the information that lets it regenerate listings (symbol tables, comments, the declarations of TYPEs, BASIC09's version of C structs) and lets you list, edit, and debug. The easy way to tell: create a procedure and then list it. basic09's prettyprinter will capitalize keywords, indent, make the case of identifiers consistent (case is insignificant in identifiers, and it will use the case it sees first, which means it need only keep one copy of each identifier name and gives you camelCase without tiring your shift-key finger), and will use minimal parentheses in expressions (I-code stores expressions in RPN, so the parenthesization you entered won't necessarily appear in the listing, though the semantics will be preserved).

The comment doesn’t accurately describe what the program does. If it really finds the first 1000 primes, it must look at numbers bigger than 1000. It may find primes between 1 and 1000, though.

That’s true - a bug in the comment!

1 Like

Reverting to integer maths may be considered no fair by some.

I was pleasantly surprised how fast my emulated PDP-8 ran this. True, the emulated version is many times faster than the real thing, but I haven't powered up my PDP-8-on-a-chip RBC6120 system in a while. The only oddness is that PRINT CHR$(7) merely prints the letter G, since OS-8 wasn't quite down with that newfangled ASCII nonsense.

1 Like