Microcomputers with built-in compilers?

Well… From a “cheapness” point of view, it doesn’t matter as that test would be made at tokenisation time, so who cares how long that takes (within reason).

What I was after was run-time speed, so I was going to convert the number no matter what and replace it with a token that was an index to the number in the symbol table.

I could do something like what you suggest, but that would complicate the tokenisation code (read number, convert to binary, convert back to ASCII, compare strings, then insert a token or leave the number) and complicate the run-time code (use the token to look up the binary value, or do the conversion again - slow), so I just tokenised everything. Literally everything - even e.g. REM statements. A tokenised line of code is simply a stream of tokens (they’re 32-bit, as my target system was 32-bit). It’s sort of like a one-pass compiler, compiling to code for some virtual machine which is then interpreted at run-time (and de-compiled at LIST time).

The line:

100 A = 20

becomes 3 x 32-bit tokens: one for a reference to variable A, one for the = and one for 20. That’s 12 bytes vs. 4 (without spaces), so it’s not always the most efficient. (There is also storage needed for the symbol table, which in this instance would have 2 entries: one for the variable A, which contains a double word for the value as well as the textual name, and one for the constant 20 along with its textual representation.)
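To make that concrete, the layout is roughly like this (a simplified sketch in C with invented names - not the actual interpreter source):

#include <stdint.h>

/* Simplified sketch, not the real code: a tokenised line is a stream of
   32-bit tokens, and operand tokens carry a symbol-table index in the
   low bits. */

enum { TOK_SYMBOL = 0x01000000, TOK_ASSIGN = 0x02000000 };

typedef struct {
    char   name[16];   /* textual form: "A", "20", ... (used by LIST) */
    double value;      /* binary value - a double word                */
} SymbolEntry;

/* 100 A = 20  ->  three 32-bit tokens (line number held separately) */
uint32_t line100[3] = {
    TOK_SYMBOL | 0,    /* symbol-table slot 0: variable A   */
    TOK_ASSIGN,        /* '='                               */
    TOK_SYMBOL | 1     /* symbol-table slot 1: constant 20  */
};

SymbolEntry symtab[2] = {
    { "A",  0.0  },    /* value filled in at run-time       */
    { "20", 20.0 }     /* constant, kept with its text      */
};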

I even went as far as to “JIT” the output of the shunting yard algorithm at run-time: feed the expression through the shunter, cache the resulting RPN stack, and re-write the tokenised line with a new token pointing to the cached RPN. This obviously made big linear programs slower, but anything with a loop/procedure call ran at about double the speed.
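In outline the caching works something like this (sketch only; shunting_yard and eval_rpn stand for routines that exist elsewhere, and the names are made up):

#include <stdint.h>
#include <stddef.h>

#define TOK_RPN_CACHE 0x7F000000u   /* invented marker: "already shunted" */

typedef struct {
    uint32_t rpn[64];               /* cached RPN token stream */
    size_t   len;
} RpnCache;

/* Provided elsewhere: the shunting-yard pass and an RPN evaluator. */
size_t shunting_yard(const uint32_t *infix, size_t n, uint32_t *rpn);
double eval_rpn(const uint32_t *rpn, size_t n);

static RpnCache cache[256];
static size_t   cache_used;

/* First visit: shunt the infix tokens, cache the RPN, and overwrite the
   first token of the expression with a reference to the cache entry.
   Every later visit skips the shunting yard entirely. */
double eval_expression(uint32_t *expr, size_t n)
{
    if ((expr[0] & 0xFF000000u) == TOK_RPN_CACHE) {
        RpnCache *c = &cache[expr[0] & 0x00FFFFFFu];
        return eval_rpn(c->rpn, c->len);
    }
    RpnCache *c = &cache[cache_used];
    c->len  = shunting_yard(expr, n, c->rpn);
    expr[0] = TOK_RPN_CACHE | (uint32_t)cache_used++;
    return eval_rpn(c->rpn, c->len);
}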

-Gordon


Are there any interpreters which don’t strip trailing zeroes from constants? The convention of specifying additional decimal places pretty much went for a Burton where programming languages are concerned.

The various flavors of MS BASIC, at least, store constants as ASCII strings as-is (including any white space characters or trailing zeros), at the cost of transforming them from ASCII to floating point each time they are encountered. The only optimization is the conversion of line numbers to binary and the tokenization of keywords and operators. Everything else is stored as-is in ASCII (again, including white space).
This is mirrored by classic optimizations on the programmer’s side, like omitting any white space at the cost of readability, or using just a decimal point for zero (e.g. FOR I=. TO 1000), as this is traversed faster by the interpreter.
There may be other classic BASICs which are more aggressive regarding tokenization, but I’m not familiar with those.
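To make the cost concrete: every time execution passes over a numeric constant, the interpreter has to redo something like the following (a much simplified sketch; strtod stands in for the interpreter’s ASCII-to-FP routine):

#include <ctype.h>
#include <stdlib.h>

/* Much simplified sketch: the literal's text stays verbatim in program
   memory, so every execution of the line pays for this conversion again. */
double scan_numeric_literal(const char **p)
{
    char buf[32];
    int  i = 0;
    /* the scanner skips white space, which is why "20. 00" in the
       example below still reads as the single constant 20.0 */
    while (**p && (isdigit((unsigned char)**p) || **p == '.' || **p == ' ')) {
        if (**p != ' ' && i < 31)
            buf[i++] = **p;
        (*p)++;
    }
    buf[i] = '\0';
    return strtod(buf, NULL);   /* repeated on every pass over the line */
}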

Compare this short example (on a PET, a variant of MS 9-digit BASIC for the 6502):

READY.
10 PRINT  20. 00 * .25

LIST

 10 PRINT  20. 00 * .25
READY.
RUN
 5

And here’s the program as in memory (tokens have the sign-bit set):

0401  15 04               link: $0415 (memory location of next line)
0403  0A 00               line# 10    (16-bit, little endian)
0405  99                  token PRINT
0406  20 20               ascii «  »
0408  32 30 2E 20 30 30   ascii «20. 00» (numeric literals may contain white space!)
040E  20                  ascii « »
040F  AC                  token *
0410  20                  ascii « »
0411  2E 32 35            ascii «.25»
0414  00                  -EOL-
0415  00 00               -EOP- (link = null)
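(A LIST-style walk over that structure would look roughly like the sketch below - the token table is cut down to the two tokens used here, and mem is assumed to hold the full 64 KB address space.)

#include <stdint.h>
#include <stdio.h>

/* Sketch of a LIST-style decoder for the line format above; only the
   two tokens that occur in the example are handled. */
static const char *detoken(uint8_t t)
{
    switch (t) {
    case 0x99: return "PRINT";
    case 0xAC: return "*";
    default:   return "?";
    }
}

void list_program(const uint8_t *mem, uint16_t addr)
{
    for (;;) {
        uint16_t link = mem[addr] | (mem[addr + 1] << 8);
        if (link == 0)                                  /* -EOP- */
            break;
        uint16_t lineno = mem[addr + 2] | (mem[addr + 3] << 8);
        printf(" %u ", (unsigned)lineno);
        for (uint16_t i = addr + 4; mem[i] != 0; i++)   /* up to -EOL- */
            if (mem[i] & 0x80)
                printf("%s", detoken(mem[i]));          /* sign bit set: token */
            else
                putchar(mem[i]);                        /* plain ASCII, as stored */
        putchar('\n');
        addr = link;                                    /* follow the line link */
    }
}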

Edit:
Just for fun, let’s compare this to what a compiled version may look like:

xx xx            line link
0A 00            line number (10)

80               token PUT_FP (floating point onto stack)
7F 00 00 00 00   FP:   2.5E-1
80               token PUT_FP
85 20 00 00 00   FP:   2.0E+1
AC               token MULT_FP
99               token PRINT_TOS_FP

00               EOL

00 00            EOF

(It’s not more efficient in memory usage, but much faster. And it doesn’t allow for faithful or even meaningful listings.)
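The run-time for that would be little more than a push/pop loop, along these lines (a sketch with C doubles standing in for the 5-byte FP format):

#include <stdio.h>

/* Sketch of the run-time loop for the hypothetical compiled form above:
   a small stack machine, with C doubles in place of the 5-byte FP format. */
enum { EOL = 0x00, PUT_FP = 0x80, PRINT_TOS_FP = 0x99, MULT_FP = 0xAC };

typedef struct { unsigned char op; double arg; } Insn;

void run_line(const Insn *code)
{
    double stack[16];
    int sp = 0;
    for (; code->op != EOL; code++) {
        switch (code->op) {
        case PUT_FP:       stack[sp++] = code->arg;           break;
        case MULT_FP:      sp--; stack[sp - 1] *= stack[sp];   break;
        case PRINT_TOS_FP: printf(" %g\n", stack[--sp]);       break;
        }
    }
}

int main(void)
{
    /* 10 PRINT 20.00 * .25  ->  push .25, push 20, multiply, print */
    Insn line10[] = {
        { PUT_FP, 0.25 }, { PUT_FP, 20.0 },
        { MULT_FP, 0 }, { PRINT_TOS_FP, 0 }, { EOL, 0 }
    };
    run_line(line10);   /* prints " 5" */
    return 0;
}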

While it’s not exactly BASIC, FoxPro and other xBase DBMSs store numbers as ASCII the whole time. Obviously this consumes twice as much RAM and disc storage as BCD, but it does ensure typical financial calculations involving decimal numbers are accurate.


The Psion Organiser II uses OPL, and that is compiled. It looks a bit like BASIC but isn’t. It compiles to QCode, which is executed on a stack. This has advantages: the QCode can be distributed as a commercial product, and it also takes less memory. OPL is based on procedures; a procedure is loaded into an execution stack when it is called and unloaded when it returns, so programs larger than available memory can be executed. There are global variables that exist across nested procedure calls, and local variables that don’t.


I didn’t use the right words for what I meant, as I don’t know what they are. Maybe retained precision?

A half-remembered convention from school was something like: if you wrote a number as 2.50 you were expressing it with two decimal places of precision. If you added it to another number with the same number of decimal places, the result also had to have the same number of decimal places. So 2.50 + 1.50 = 4.00, but 2.50 + 1.5 = 4.0, since one term only has 1 d.p. of precision. There may have been another rule for multiplication involving significant figures, but I don’t recall it.

So, as far as I know, there aren’t any interpreters that keep track of precision. It would be really hard to do with standard floating point.
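If an interpreter did want to, the obvious bookkeeping would be to carry the number of decimal places alongside each value - a toy sketch (in C, and not something any actual BASIC does as far as I know):

#include <stdio.h>

/* Toy sketch of "retained precision": carry the number of decimal
   places with the value; a sum keeps the smaller of the two counts. */
typedef struct { double value; int places; } Fixed;

Fixed fx_add(Fixed a, Fixed b)
{
    Fixed r;
    r.value  = a.value + b.value;
    r.places = a.places < b.places ? a.places : b.places;
    return r;
}

int main(void)
{
    Fixed a  = { 2.50, 2 }, b1 = { 1.50, 2 }, b2 = { 1.5, 1 };
    Fixed r1 = fx_add(a, b1), r2 = fx_add(a, b2);
    printf("%.*f\n", r1.places, r1.value);   /* 4.00 */
    printf("%.*f\n", r2.places, r2.value);   /* 4.0  */
    return 0;
}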

(I saw an irate poster to a bulletin board a couple of years ago who claimed that Python was wrong for returning 4.0 for 2.50 + 1.50 when it should have returned 4.00. I’m not sure anyone knew what to say.)

Significant figures/digits?

-Gordon

Well, the general answer is, “it depends”.
Take for example MS BASIC 1 / BASIC 80 for the 8080: there are three numeric formats:

  • single precision floating point (±1.0e-64 … ±1.0e62 with 7 significant digits, up to 6 can be displayed with rounding),
  • double precision (±1.0e-64 … ±1.0e62, as well, but with 16 significant digits, up to 15 can be displayed with rounding), and
  • integer (-32768 … 32767)

Constants are classified according to the following heuristics (according to the manual):

  • if it has more than 7 digits, it’s double precision
  • if it is an integer in the range of -32768 … 32767, it’s an integer
  • if it is neither, then it’s single precision
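
As a sketch, the classification amounts to something like this (ignoring D/E exponent suffixes, %/!/# type markers and the exceptions below):

enum NumType { NUM_INT, NUM_SINGLE, NUM_DOUBLE };

/* Sketch of the BASIC-80 constant classification heuristic listed above. */
enum NumType classify(const char *text)
{
    int  digits = 0, is_integer = 1;
    long v = 0, sign = 1;

    if (*text == '-') { sign = -1; text++; }
    for (; *text; text++) {
        if (*text >= '0' && *text <= '9') {
            digits++;
            if (digits <= 6)            /* enough to test the integer range */
                v = v * 10 + (*text - '0');
        } else if (*text == '.') {
            is_integer = 0;
        }
    }
    if (digits > 7)                                             return NUM_DOUBLE;
    if (is_integer && sign * v >= -32768 && sign * v <= 32767)  return NUM_INT;
    return NUM_SINGLE;
}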

(There are of course various exceptions to this rule. E.g., the flavor of BASIC 1 used in the NEC PC-8201, “N82-BASIC”, which is otherwise closely related to the ROM of the TRS-80 Model 100, uses a single precision representation of -1.70141E+38 … +1.70141E+38 with valid exponents in the range of 01…38. In N82-BASIC, double precision uses a ‘D’ to mark the exponent, ‘E’ being reserved for single precision.)

The 6502 flavor originally used 6 significant digits (I think); the well-known Commodore flavor uses 9 digits (hence “9-digit BASIC”) and converts everything to a single floating-point format. (Integer exists only as a storage format and there is no single/double precision classification for reals.)

As for printing, Commodore BASIC has the following rules:

  • if 0.01 <= |n| <= 999999999, convert to fixed (non-exponent) notation (omitting any leading zero for reals),
  • else convert to scientific format with up to 9 significant digits.

E.g.,

### COMMODORE BASIC ###

 15359 BYTES FREE

READY.
? 0.01
 .01

READY.
? 0.001
 1E-03

READY.
? 999999999
 999999999

READY.
? 1000000000
 1E+09

READY.
? -0.1
-.1

READY.
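
The decision itself is simple enough to sketch (C’s formatting as a rough stand-in for the ROM’s FP-to-text routine, which differs in detail):

#include <stdio.h>
#include <math.h>
#include <string.h>

/* Sketch of the print rule above: fixed notation for 0.01 <= |n| <= 999999999,
   scientific otherwise, with the leading zero of a fraction dropped and a
   blank in place of the sign for positive numbers. */
void print_number(double n)
{
    char buf[32];
    double a = fabs(n);

    if (n == 0.0 || (a >= 0.01 && a <= 999999999.0)) {
        snprintf(buf, sizeof buf, "%.9g", n);
        if (!strncmp(buf, "0.", 2))
            memmove(buf, buf + 1, strlen(buf));          /* "0.01" -> ".01" */
        else if (!strncmp(buf, "-0.", 3))
            memmove(buf + 1, buf + 2, strlen(buf + 1));  /* "-0.1" -> "-.1" */
    } else {
        snprintf(buf, sizeof buf, "%.9E", n);   /* the ROM also strips trailing
                                                   mantissa zeros, e.g. 1E+09 */
    }
    printf("%s%s\n", n < 0 ? "" : " ", buf);
}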