Ah, BASIC
For my sins, a few years back I wrote what I considered my “ideal” BASIC - it’s in C and right now intended for a 32 (or 64) bit Unix system with the SDL 1.3 libraries…
However, I applied what I felt were some modern techniques to it: at line entry time it builds up a symbol table of variables, constants, and - well, actually everything - leaving behind a purely tokenised “program”. Nothing in the program data section is what you typed in. In this respect it’s effectively a one-pass compiler. I have considered a further step of translating it into native assembler, but the two platforms it runs on, ARM and x86, are the two modern platforms I’ve never written any assembler for… (Not strictly true, as I dabbled in ARM when it first came out, but I’ve really no intention of going back there.)
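To make the idea concrete, here is a minimal sketch in C of tokenising a line into nothing but symbol table indexes. It is not the code from my BASIC; the names (symtab, intern, tokenise) and the fixed table sizes are made up for illustration, and keywords are lumped in with variables to keep it short.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_SYMS 64   /* hypothetical limits, chosen only for this sketch */
#define MAX_TEXT 16

enum { SYM_VAR, SYM_NUM, SYM_OP };

struct symbol { int kind; char text[MAX_TEXT]; };

struct symbol symtab[MAX_SYMS];
int nsyms = 0;

/* Return the index of an existing symbol, or add a new one. */
int intern(int kind, const char *text, size_t len)
{
    assert(len < MAX_TEXT);
    for (int i = 0; i < nsyms; i++)
        if (symtab[i].kind == kind && strlen(symtab[i].text) == len &&
            memcmp(symtab[i].text, text, len) == 0)
            return i;
    assert(nsyms < MAX_SYMS);
    symtab[nsyms].kind = kind;
    memcpy(symtab[nsyms].text, text, len);
    symtab[nsyms].text[len] = '\0';
    return nsyms++;
}

/* Tokenise one line: every token is just an index into symtab.
   Returns the number of tokens written to out. */
int tokenise(const char *src, int *out, int max)
{
    int n = 0;
    while (*src) {
        if (isspace((unsigned char)*src)) { src++; continue; }
        const char *start = src;
        int kind;
        if (isalpha((unsigned char)*src)) {
            while (isalnum((unsigned char)*src)) src++;
            kind = SYM_VAR;
        } else if (isdigit((unsigned char)*src)) {
            while (isdigit((unsigned char)*src) || *src == '.') src++;
            kind = SYM_NUM;
        } else {
            src++;              /* single-character operator */
            kind = SYM_OP;
        }
        assert(n < max);
        out[n++] = intern(kind, start, (size_t)(src - start));
    }
    return n;
}
```

Tokenising A = 3.1415 and then A = A + 1 reuses the same index for A both times, so the stored program never contains the name itself. Note that nothing here ever removes a symbol.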
However the principle is sound, I think - right now, the last few benchmarks put it “only” 25-30 times slower than C.
Things like GOTO are fixed-up at run-time, loops too, as well as calls to procedures and functions.
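A rough sketch of what a run-time fix-up can look like, again in C and again not lifted from my interpreter: the struct layout, the names and the linear search are all assumptions for illustration. The GOTO remembers where its target is after the first time it runs.

```c
#include <assert.h>

struct line { int number; };      /* hypothetical: one entry per program line */

struct goto_stmt {
    int target_number;   /* the line number as typed, e.g. GOTO 100 */
    int target_index;    /* cached index into the program, -1 = not resolved */
    int lookups;         /* how many searches were needed (illustration only) */
};

/* Linear search for a line number; returns its index or -1. */
int find_line(const struct line *prog, int nlines, int number)
{
    for (int i = 0; i < nlines; i++)
        if (prog[i].number == number)
            return i;
    return -1;
}

/* Execute a GOTO: search on first use, reuse the cached index afterwards. */
int do_goto(struct goto_stmt *g, const struct line *prog, int nlines)
{
    if (g->target_index < 0) {
        g->target_index = find_line(prog, nlines, g->target_number);
        g->lookups++;
        assert(g->target_index >= 0);  /* a real interpreter would report an error */
    }
    return g->target_index;
}
```

In a scheme like this, anything cached would need resetting to -1 whenever the program is edited, since inserting or deleting a line moves the indexes.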
There is also a run-time “just in time” compiler for expressions: internally it does everything in RPN and uses the shunting-yard algorithm to form the RPN stack, which I then store in the symbol table, replacing the expression in the program text with a symbol. The next time that line is executed, it saves the shunting-yard time. If I were clever I’d do this at program entry time, but I never thought of it at the time. Optimising expressions in a manner like this is possible, but again I don’t do it.
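Here is a cut-down sketch of that compile-once idea in C. It is an illustration under assumptions rather than my actual code: it only handles numbers, + - * / and brackets, and it caches the RPN in a little struct of its own (struct expr, a made-up name) instead of in the symbol table.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_RPN 64   /* hypothetical limit for this sketch */

enum { T_NUM, T_OP };
struct rpn_item { int type; double num; char op; };

/* An expression: its source text plus, once compiled, its RPN form. */
struct expr {
    const char *src;
    struct rpn_item rpn[MAX_RPN];
    int len;        /* 0 until compiled */
    int compiles;   /* times shunting yard ran (illustration only) */
};

int prec(char op) { return (op == '+' || op == '-') ? 1 : 2; }

void emit(struct expr *e, int type, double num, char op)
{
    assert(e->len < MAX_RPN);
    e->rpn[e->len++] = (struct rpn_item){ type, num, op };
}

/* Shunting yard: convert the infix text to RPN. */
void compile(struct expr *e)
{
    char ops[MAX_RPN];
    int nops = 0;
    const char *p = e->src;
    e->len = 0;
    while (*p) {
        if (isspace((unsigned char)*p)) { p++; continue; }
        if (isdigit((unsigned char)*p)) {
            char *end;
            double v = strtod(p, &end);
            emit(e, T_NUM, v, 0);
            p = end;
        } else if (*p == '(') {
            assert(nops < MAX_RPN);
            ops[nops++] = *p++;
        } else if (*p == ')') {
            while (nops > 0 && ops[nops - 1] != '(')
                emit(e, T_OP, 0, ops[--nops]);
            assert(nops > 0);   /* otherwise the brackets don't match */
            nops--;             /* discard the '(' */
            p++;
        } else {
            assert(strchr("+-*/", *p) != NULL);
            /* left-associative: pop operators of equal or higher precedence */
            while (nops > 0 && ops[nops - 1] != '(' &&
                   prec(ops[nops - 1]) >= prec(*p))
                emit(e, T_OP, 0, ops[--nops]);
            assert(nops < MAX_RPN);
            ops[nops++] = *p++;
        }
    }
    while (nops > 0)
        emit(e, T_OP, 0, ops[--nops]);
    e->compiles++;
}

/* Evaluate: compile on first use only, then just walk the stored RPN. */
double run(struct expr *e)
{
    double st[MAX_RPN];
    int sp = 0;
    if (e->len == 0)
        compile(e);
    for (int i = 0; i < e->len; i++) {
        if (e->rpn[i].type == T_NUM) { st[sp++] = e->rpn[i].num; continue; }
        assert(sp >= 2);
        double b = st[--sp], a = st[--sp];
        switch (e->rpn[i].op) {
        case '+': st[sp++] = a + b; break;
        case '-': st[sp++] = a - b; break;
        case '*': st[sp++] = a * b; break;
        default:  st[sp++] = a / b; break;
        }
    }
    assert(sp == 1);
    return st[0];
}
```

Calling run() twice on the same struct expr only goes through compile() once; the second call walks the stored RPN directly, which is the saving described above.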
There is a lot of scope for improvement, but I felt I was up against the law of diminishing returns - effort required vs. the end result. I think in this case it would actually be better to start again (which I have plans for, but targeting an 8/16-bit platform).
A downside is that in theory you can run out of RAM: type in 100 lines or so, then delete those lines, and in my system the already-defined symbols don’t get deleted too - the symbol table just keeps getting bigger and bigger. However, save/new/load the program and the table is recreated at load time. (I’ve never had this issue though - possibly because I’m not writing huge programs, and because I now use a text editor built into the system that effectively does a LOAD every time you exit the editor back to interactive mode.)
I had to jump through a few hoops too. LIST - well, that’s a decompiler. SAVE too - it saves programs as text, not tokens, so LOAD effectively re-tokenises it all line by line. One side-effect: constants are tokenised, so there is the possibility of entering A = 3.1415 … This gets tokenised into Variable A, Operator =, Number 3.1415 - so 3 tokens. However, LIST that and, maybe due to floating point rounding, 3.1415 comes out as 3.141499999, which might not be what you expected. So I stored the textual part in the ever-growing symbol table too. I’m sure there is a lesson to learn there, but …
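The constant trick, sketched in C with made-up names (struct constant, make_constant and so on are for illustration, not what my code calls them): keep the typed text next to the binary value, so LIST can print back what was entered rather than what the floating point format happens to produce.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct constant {
    double value;     /* used when the program runs */
    char text[32];    /* exactly what was typed, used by LIST */
};

/* Build a constant from the typed text, keeping both forms. */
void make_constant(struct constant *c, const char *typed)
{
    assert(strlen(typed) < sizeof c->text);
    c->value = strtod(typed, NULL);
    strcpy(c->text, typed);
}

/* LIST from the binary value alone: the digits you get back depend on
   the precision asked for and may not match what was typed. */
void list_from_value(const struct constant *c, char *out, size_t n)
{
    snprintf(out, n, "%.17g", c->value);
}

/* LIST from the stored text: always reproduces the original entry. */
void list_from_text(const struct constant *c, char *out, size_t n)
{
    snprintf(out, n, "%s", c->text);
}
```

The cost is the extra bytes of text per constant, which is part of why the symbol table only ever grows.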
Anyway, for a retro/historical context… Here I was thinking what I’d done was clever - turns out it was the same as was done for 4K FORTRAN on the PDP-8 in the late 1960s. The compiler tokenised the source and produced a symbol table, then the run-time interpreted the tokenised source with the symbol table to run your program… Ah well, what goeth round cometh round.
Cheers,
-Gordon