Following this thread with interest - mostly because I wrote my own BASIC about 10 years back, tweaked it a little (a lot!) when the Raspberry Pi came out and it’s still being used by a few people today (there was even a commercial spin-off called FUZE Basic too - am I the only person to have sold a new BASIC in the 00’s ???)
I decided to make a few changes (from a “classic” BASIC) when I was implementing mine - mostly for efficiency and also to make my life easier… It was also (originally) “My” perfect BASIC - a vanity project if you like. I also had the luxury of writing it in C rather than some assembler, and because I designed it to run under a modern OS I was able to use the operating system’s dynamic memory routines rather than try to maintain my own and “garbage collect”. (i.e. I use malloc/free)
One change was that I’d not allow multiple statements per line… In the old days we’d cram as many statements onto a line as we could to make it run faster and to take up less RAM. I didn’t need either of those constraints so I dropped that idea. And on the tokenisation front: Everything got tokenised. And I mean everything. Even comments. Numbers were stored in their native binary format. Line numbers are optional in the source, but internally they are always there, so when you load a line-number-less program it’s loaded starting at line number 1, incrementing by 1. (There is a comprehensive renumber command).
The LIST command (for those using the traditional interactive line-number interface) then becomes a de-compiler. And here is a problem with tokenising and evaluating everything. Consider
10 A = 1e6
and you type LIST and you get
10 A = 1000000.00004
which isn’t what you typed. (I’ve made up the rounding here - you might actually get 1000000 but the important bit is that it’s not 1e6.)
So what do you do? Well, RAM is cheap, so I store the textual part of what was typed along with the binary form of the number… Similarly for strings and comments:
20 REM This is a comment
30 A$ = "Fred"
Each of those is replaced with a single token, which is an index into the symbol table entry that contains the textual value of the comment (or string).
I removed any limitation on variable name length. After all, it gets replaced by a single token, no matter how long, so:
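A minimal sketch in C of keeping the typed text next to the binary value, so that LIST can print back what was entered. This is not the interpreter's actual code: the `NumConst` type and the `numconst_*` function names are made up for illustration, and a real symbol table entry would hold more than this.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical symbol table entry for a numeric constant. */
typedef struct {
    double value;   /* native binary form, used when the program runs */
    char  *text;    /* exactly what was typed, e.g. "1e6", used by LIST */
} NumConst;

/* Build an entry from source text. Returns 0 on success, -1 on failure. */
int numconst_init(NumConst *c, const char *src)
{
    char *end;
    double v = strtod(src, &end);
    if (end == src || *end != '\0')
        return -1;                  /* not a complete number */

    size_t n = strlen(src) + 1;
    c->text = malloc(n);
    if (c->text == NULL)
        return -1;                  /* out of memory */
    memcpy(c->text, src, n);
    c->value = v;
    return 0;
}

/* What LIST would print: the stored text, not a re-formatted double. */
const char *numconst_list(const NumConst *c)
{
    assert(c != NULL && c->text != NULL);
    return c->text;
}

/* Release the stored text. */
void numconst_free(NumConst *c)
{
    free(c->text);
    c->text = NULL;
}
```

With this layout, `10 A = 1e6` lists back as `1e6` while the run-time side only ever touches `value`. The same shape would work for strings and comments, with the text being the only payload.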
40 theLoopCounter = 42
is stored as 3 tokens. The first is an index into the symbol table for the variable - that entry contains its name, type and value (and a flag to indicate used or not used). The 2nd is the = symbol, which is replaced by a token that indicates a function, and the last one is a token that represents a constant.
So what I effectively did was to write a one-pass compiler and a virtual machine to execute the resulting stream of fixed-length (32-bit) tokens. There is a run-time fixup which scans the tokenised code for GOTO, GOSUB and calls to functions and procedures, and inserts pointers into their symbol table entries that point to the line of their target. (Remember putting common functions/subroutines at the start to make them run faster due to a linear search? I didn’t want any of that old nonsense!)
Unlike the traditional BASICs, I never stored the binary form to disk - mostly because as I was developing I was changing the token values on a daily basis, and also because the program became 2 parts - the tokenised code and the symbol table - and being a lazy programmer, it was just easier to store the textual part. Loading a test 10,000 line program on a 900MHz Raspberry Pi 1 showed no noticeable delay from tokenising it as it loaded.
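To make the 3-token idea concrete, here is a rough sketch in C. The bit layout (8 bits of token kind, 24 bits of index), the `TK_*` and `OP_ASSIGN` values and the toy name-only symbol table are all assumptions for illustration; the post only says that tokens are fixed-length and that variables and constants become indexes into the symbol table.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t Token;

enum { TK_VAR = 1, TK_OP = 2, TK_CONST = 3 };  /* assumed token kinds */
enum { OP_ASSIGN = 1 };                        /* assumed code for '=' */

/* Pack a kind (top 8 bits) and an index (low 24 bits) into one token. */
Token tok_make(unsigned kind, uint32_t index)
{
    assert(kind <= 0xFFu && index <= 0xFFFFFFu);
    return ((Token)kind << 24) | index;
}
unsigned tok_kind(Token t)  { return (unsigned)(t >> 24); }
uint32_t tok_index(Token t) { return t & 0xFFFFFFu; }

/* Toy symbol table: names only. Pointers are stored as given, so the
   caller must keep the strings alive (string literals are fine). */
#define MAX_SYMS 64
const char *sym_names[MAX_SYMS];
uint32_t sym_count = 0;

/* Return the index of name, adding it if it has not been seen before. */
uint32_t sym_intern(const char *name)
{
    for (uint32_t i = 0; i < sym_count; i++)
        if (strcmp(sym_names[i], name) == 0)
            return i;
    assert(sym_count < MAX_SYMS);
    sym_names[sym_count] = name;
    return sym_count++;
}

/* Encode "var = constant" as exactly three tokens. */
void encode_assign(const char *var, const char *constant, Token out[3])
{
    out[0] = tok_make(TK_VAR, sym_intern(var));
    out[1] = tok_make(TK_OP, OP_ASSIGN);
    out[2] = tok_make(TK_CONST, sym_intern(constant));
}
```

Calling `encode_assign("theLoopCounter", "42", line)` fills three 32-bit slots regardless of how long the variable name is, which is the point being made: the name lives once in the symbol table and the token stream only carries its index.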
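The fixup pass could look something like the following C sketch. It is a simplification under stated assumptions: the `Line` struct, `find_line` and `fixup` are hypothetical names, only GOTO is handled, and the resolved target is stored in the line itself rather than in a symbol table entry as the post describes. The idea it illustrates is the same, though: resolve every jump target once before running, so that executing a GOTO never has to search the program.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tokenised line. */
typedef struct {
    int line_no;     /* BASIC line number */
    int goto_line;   /* line number named by a GOTO, or -1 if none */
    int goto_index;  /* set by fixup(): index into the program, -1 if unresolved */
} Line;

/* Binary search for a line number; prog must be sorted by line_no.
   Returns the index of the line, or -1 if it does not exist. */
int find_line(const Line *prog, size_t n, int line_no)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (prog[mid].line_no < line_no)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo < n && prog[lo].line_no == line_no) ? (int)lo : -1;
}

/* Resolve every GOTO target once, before the program runs.
   Returns the number of targets that could not be resolved. */
int fixup(Line *prog, size_t n)
{
    assert(prog != NULL || n == 0);
    int unresolved = 0;
    for (size_t i = 0; i < n; i++) {
        prog[i].goto_index = -1;
        if (prog[i].goto_line < 0)
            continue;                       /* not a GOTO */
        prog[i].goto_index = find_line(prog, n, prog[i].goto_line);
        if (prog[i].goto_index < 0)
            unresolved++;                   /* e.g. GOTO a missing line */
    }
    return unresolved;
}
```

After `fixup()` the interpreter loop can jump straight to `prog[goto_index]`, so a subroutine at the end of a long program costs the same to reach as one at the start.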
Anyway, I love old BASICs, but a BASIC of the 80’s … That’s (to me) BBC Basic and not MS Basic which is a product of the 70’s. BBC Basic has long variable names (with all characters significant) and a proper integer data type. It was also faster than all other interpreted BASICs of the time (for the same CPU configuration), but it was also a few years after the MS Basics, so they had a lot to learn from.
So, keep going - always nice to see someone else’s interpretation and what they do. I think there is still a place for BASIC - especially in the interactive versions, but there is an avalanche of dislike for it these days…
Cheers,
-Gordon