Modern retro: thoughts on Basic interpreters/compilers

One interesting modern retro project would not be hardware (though it could be part of a project with new hardware) but an incremental BASIC compiler. The idea is that as soon as you typed in a BASIC line it would be translated to machine code and patched into the existing program. Two helper tables would be needed to translate variable names to/from memory addresses and line numbers to/from memory addresses. When you LIST the program or a line the machine code would get decompiled on the fly.
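In rough C terms, the two helper tables might look something like this. This is purely a hypothetical sketch - every name, size, and layout here is invented for illustration, not taken from any real implementation:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of the two helper tables: one mapping variable
 * names to storage addresses, the other mapping line numbers to the
 * address of their compiled code. All names and sizes are invented. */

#define MAX_VARS  64
#define MAX_LINES 256

struct var_entry  { char name[8];     uint16_t addr;      };
struct line_entry { uint16_t line_no; uint16_t code_addr; };

static struct var_entry  vars[MAX_VARS];
static struct line_entry lines[MAX_LINES];
static int n_vars, n_lines;

/* Return the storage address for a variable, allocating on first use. */
uint16_t var_addr(const char *name, uint16_t *next_free)
{
    for (int i = 0; i < n_vars; i++)
        if (strcmp(vars[i].name, name) == 0)
            return vars[i].addr;
    strncpy(vars[n_vars].name, name, sizeof vars[n_vars].name - 1);
    vars[n_vars].addr = *next_free;
    *next_free += 2;                  /* 16-bit variables, say */
    return vars[n_vars++].addr;
}

/* Record where line N's code starts, so GOTO N can be resolved. */
void note_line(uint16_t line_no, uint16_t code_addr)
{
    lines[n_lines].line_no   = line_no;
    lines[n_lines].code_addr = code_addr;
    n_lines++;
}

int find_line(uint16_t line_no)       /* -1 if not entered yet */
{
    for (int i = 0; i < n_lines; i++)
        if (lines[i].line_no == line_no)
            return lines[i].code_addr;
    return -1;
}
```

The same two tables serve both directions: compiling a typed line consults them by name/number, and LISTing a line walks them the other way to recover names.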

Such a system would easily allow embedded assembly language code. A line would just have a single assembly instruction instead of a BASIC statement. Note that this is different from BASIC/assembly schemes like BBC Basic which are just a clean way of doing the old “poke bytes from DATA statements into a string and call it with USR”.

The main complication of this idea is dealing with incomplete or buggy programs. A GOTO to a line that doesn’t exist (yet) or a FOR with no NEXT. One way to do this would be to keep a list of things to fix and it might even be interesting to allow the programmer to see this list.
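The "list of things to fix" could be as simple as the following - again a hypothetical C sketch, assuming invented names, with `patch` standing in for whatever actually writes the resolved address into the compiled code:

```c
#include <stdint.h>

/* Hypothetical fixup list: each entry records a spot in the compiled
 * code that refers to a line number not yet entered. When that line
 * appears, every pending reference to it gets patched; whatever is
 * still pending at RUN time can be shown to the programmer. */

#define MAX_FIXUPS 64

struct fixup { uint16_t target_line; uint16_t patch_addr; };

static struct fixup fixups[MAX_FIXUPS];
static int n_fixups;

void add_fixup(uint16_t target_line, uint16_t patch_addr)
{
    fixups[n_fixups].target_line = target_line;
    fixups[n_fixups].patch_addr  = patch_addr;
    n_fixups++;
}

/* Called when a line is (re)entered: patch every pending reference.
 * 'patch' stands in for writing the real address into the code. */
int resolve_line(uint16_t line_no, uint16_t code_addr,
                 void (*patch)(uint16_t addr, uint16_t value))
{
    int resolved = 0;
    for (int i = 0; i < n_fixups; ) {
        if (fixups[i].target_line == line_no) {
            patch(fixups[i].patch_addr, code_addr);
            fixups[i] = fixups[--n_fixups];   /* swap-remove entry */
            resolved++;
        } else {
            i++;
        }
    }
    return resolved;
}

int pending_fixups(void) { return n_fixups; }  /* the "to fix" list */
```

Anything still in the list when the user types RUN is exactly the "incomplete program" report suggested above.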


Ah, BASIC :slight_smile:

For my sins, a few years back I wrote what I considered my “ideal” BASIC - it’s in C and right now intended for a 32 (or 64) bit Unix system with the SDL 1.3 libraries…

However I applied what I felt were some modern techniques to it, so at line entry time it builds up a symbol table of variables, constants and - well, actually everything - leaving behind a pure tokenised “program”. Nothing in the program data section is what you typed in. In this case, it’s effectively a one-pass compiler. I have considered a further step of translating it into native assembler, but the two platforms it runs on - ARM and x86 - are the two modern platforms I’ve never written any assembler for… (Not strictly true, as I dabbled in ARM when it first came out, but I’ve really no intention of going back there)

However the principle is sound, I think - right now, the last few benchmarks put it “only” 25-30 times slower than C.

Things like GOTO are fixed-up at run-time, loops too, as well as calls to procedures and functions.

There is also a run-time “just in time” compiler for expressions - so internally it does everything in RPN and uses shunting yard to form the RPN stack - which I then store in the symbol table and replace in the program text with a symbol, so next time that line is executed, it saves the shunting yard time. If I was clever I’d do this at program entry time but I never thought of it at the time. Optimising expressions in a manner like this is possible, but again I don’t do it.
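For anyone who hasn't met it, the shunting yard step itself is tiny. A rough C sketch, under deliberately simplified assumptions (single-digit operands, just the four arithmetic operators and parentheses - a real expression compiler would handle variables, functions and precedence tables):

```c
#include <ctype.h>

/* Minimal shunting-yard sketch: convert an infix expression of
 * single digits and + - * / ( ) into an RPN string, which could
 * then be cached against the line as described above. */

static int prec(char op) { return (op == '+' || op == '-') ? 1 : 2; }

void to_rpn(const char *infix, char *out)
{
    char stack[32];
    int sp = 0, o = 0;

    for (const char *p = infix; *p; p++) {
        if (isdigit((unsigned char)*p)) {
            out[o++] = *p;                     /* operands pass straight through */
        } else if (*p == '(') {
            stack[sp++] = *p;
        } else if (*p == ')') {
            while (sp && stack[sp - 1] != '(')
                out[o++] = stack[--sp];
            if (sp) sp--;                      /* discard the '(' */
        } else {                               /* an operator */
            while (sp && stack[sp - 1] != '(' &&
                   prec(stack[sp - 1]) >= prec(*p))
                out[o++] = stack[--sp];
            stack[sp++] = *p;
        }
    }
    while (sp)                                 /* flush remaining operators */
        out[o++] = stack[--sp];
    out[o] = '\0';
}
```

So `1+2*3` becomes `123*+` and `(1+2)*3` becomes `12+3*` - the RPN form that an interpreter can then execute with a plain value stack, no re-parsing.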

There is a lot of scope for improvement, but I felt I was up against the laws of diminishing returns - effort required vs. the end-result. I think in this case it would actually be better to start again (which I have plans for, but targeting an 8/16 bit platform)

A down-side is that in theory you can run out of RAM - type in 100 lines or so, then delete those lines: in my system the already-defined symbols don’t get deleted too, so the symbol table just keeps on getting bigger and bigger. However, save/new/load the program and it’s recreated at load time. (I’ve never had this issue though - possibly because I’m not writing huge programs - and now I use a text editor that’s built into the system that effectively does a LOAD every time you exit the editor back to interactive mode)

I had to jump through a few hoops too - LIST - well, that’s a decompiler. SAVE too - it saves programs as text, not tokens, so LOAD effectively re-tokenises it all line by line. One side-effect: constants are tokenised, so there is the possibility of entering A = 3.1415 … This gets tokenised into Variable A, Operator =, Number 3.1415 - so 3 tokens. However LIST that and, maybe due to floating point rounding, 3.1415 comes out as 3.141499999, which might not be what you expected. So I stored the textual part in the ever-growing symbol table too. I’m sure there is a lesson to learn there, but …
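That rounding effect is easy to reproduce in C. A quick illustration (the function name is invented): print the nearest single-precision value to 3.1415 with enough significant digits and it no longer matches the text you typed, which is exactly why keeping the original text around for LIST helps:

```c
#include <stdio.h>

/* Format a single-precision constant the way a LIST decompiler might.
 * Ten significant digits is enough to expose the nearest-float error:
 * the closest float to 3.1415 is roughly 3.141499996. */
void list_constant(float value, char *out, int outlen)
{
    snprintf(out, (size_t)outlen, "%.10g", value);
}
```

With six digits the value rounds back to "3.1415" and the problem hides; show a few more digits (or do arithmetic with the constant) and the discrepancy surfaces.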

Anyway for a retro/historical context… Here I was thinking what I’d done was clever - turns out it was the same as was done for 4K FORTRAN on the PDP-8 in the late 1960’s. The compiler tokenised the source and produced a symbol table, then the run-time interpreted the tokenised source with the symbol table to run your program… Ah well, what goeth round cometh round.

Cheers,

-Gordon

That sounds like a great project! I was thinking that since you have to have a garbage collector for strings anyway you might be able to reuse it to clean up the symbol table.

My goal in a project like I described would be to see how little RAM and ROM would be needed and how that would compare with a traditional tokenized interpreter. To me, the interesting part of modern retro is to see if we learned something - if, with the same resources as we had then, we could do better with what we know now. In that sense, Babbage’s Difference Engine 2 was modern retro :wink:

This is an interesting idea. Nowadays the hot tip would be to JIT compile it while it’s running, but that’s a different problem, and I think that while you may lose some global optimizations, compile on the fly has some merits, especially on mid range hardware (i.e. 10+MHz machines).

I would not decompile the machine code to generate the listing. I would just keep the listing separately, as part of memory. The incentive to “decompile” is much less today, as memory is much more abundant.

Early BASIC-PLUS on the PDP was exactly this, a compiled-on-the-fly BASIC; it just didn’t compile into machine code, it compiled into byte code for a threaded interpreter. But it still kept the source code in RAM rather than decompiling it for LIST. They may well have gone to the threaded interpreter rather than machine language simply for code density.

Also, once you get the listing in memory, then your UI now has trivial access to the entirety of the source code which can be nice for editing purposes (though the PDP did not leverage this – full screen editors aren’t nice things at 300 baud).

The classic tokenized BASIC is pretty RAM efficient, and I don’t think you can get much better. You have the source code in a compact format suitable for both execution and editing. A machine language compiler would take more runtime memory and need a more complex runtime, since you also need ROM space for the actual compiler.

In the end, you always need the parser and runtime foundation. The compiler would be ROM space on top of that to make a less RAM efficient memory version of the program.

With all of the history, if a compile on the fly BASIC would have been “better” in the constrained memory and CPU environments, we’d have seen one back in the day.

Er, well, I don’t have a garbage collector - mostly because I wrote this under Linux and didn’t treat it like an old school micro where you have to manage RAM - I was lazy and let malloc() & free() manage it for me… I quietly forgot about it when I moved to using a screen editor for it - switching between edit mode (which reads in a text file from disk) and run mode (which saves the text file to disk, then does a NEW+LOAD which reads it in from disk and tokenises it all from scratch) happens in under a second on a Pi for a few-hundred-line program, so it didn’t seem to matter at the time…

Tokenising BASICs essentially de-compile for LIST - the ultimate in that department (I think) was not BASIC, but Forth in the Jupiter Ace - the down-side is that you then get the code layout, indentation, etc. that the implementer forces on you…

-Gordon

Just continuing this with your 2nd half… What did I learn in the “what we know now” department…

  • Don’t do it again! The world doesn’t really need another BASIC…

  • Be lazy - let the OS do as much as it can for you (e.g. malloc/free)

  • Keep adding in keywords until someone mentions the phrase “namespace clash”, then tell them in no uncertain terms you do not want colons, dots, square brackets or anything else weird in function names… (This is BASIC, dammit! :slight_smile: )

  • I learned that the old guys had it hard. Cross compiling/assembling, limited tools, version control? Pressure to market and so on.

That last one, more so on even older kit - I’ve done some PDP-8 tinkering in the past couple of years and wondered if I could come up with an OS better than OS/8 … Nope, I gave up - not in a sensible time-scale, anyway. And even now, I’m working on my 8/16 bit retro project - and I’m finding it harder to come up with something more usable than the Acorn MOS of the early 80’s. I know there are other projects that have produced some very modern looking systems, but I feel there’s a balance - do I dedicate 64K of my… 60K of RAM to the OS? Where do you draw the line…

As for how little RAM, ROM… Look at the humble ZX80. 4K ROM, 1K RAM which included the screen memory AND code to do the video generation too… Really not sure how to beat that…

-Gordon

Today, BASIC is a four letter word in computing. But what you mention above is why it was a “right place, right time” fit for computing back in the day.

The dark side of retro computing is when you actually get your hands on a piece of vintage hardware and discover just how dreadfully slow everything is. Looking back, I boggle that I was able to accomplish anything. It’s amazing anyone accomplished anything.

Which is why BASIC was amazing. By crushing the edit/compile/run cycle using an interpreted language, people could actually get something accomplished. Sure, the code was slower than it could be. But when you’re doing data processing, and the code is waiting on glacial hardware, the speed of a FOR loop is really secondary. And, besides, we’re used to just firing off a report and letting it show up on the printer minutes or hours later. Once you’ve consciously cast off the process to the equivalent of modern mail order, who cares how long it takes.

After that, it was just momentum that kept it going forward, even as technology made decisively better languages and environments usable.

BASIC empowered a lot of people to get stuff done.

That was indeed very impressive, specially when you consider that it only used technology that had been available for years (Don Lancaster’s “The Cheap Video Cookbook” had come out in 1978) but nobody else had done it.

VTL-2 (Very Tiny Language - see its manual) only took up three PROM sockets on the Altair 680, which is a total of 768 bytes. But it used a serial terminal instead of implementing video.

About learning, it is interesting to compare Forth as implemented on the Jupiter Ace, for example, with what Chuck Moore was later able to achieve in Machine Forth and Color Forth.

A very small language! There’s a port to the 6502 here:

with a long thread here
http://forum.6502.org/viewtopic.php?f=2&t=2612

Yes, BASIC has a bad reputation. Or should I rather say it had - its heyday was so long ago that the bad reputation might have somewhat dissipated: “BASI-what?”. But if you look into it, where did the bad reputation come from? In large part it came from the extremely cramped implementations, which dropped many of the features present in the “big computer” implementations - for obvious reasons, but with horrible results. The profligate use of line numbers and GOTO was not pretty; want some bolognese sauce with that spaghetti? But the Dartmouth implementations continued developing, and companies like DEC and HP made pretty advanced implementations available. (Google up DEC BASIC and HP BASIC, and BASIC-PLUS.) Microsoft of course took up Basic and kept it going until C#, but they … remade it in their own image, so that those advancements made less sense on other platforms, much like what happened with Pascal morphing into Delphi.

As an example the HP (really, Digital, ahem, really, DEC) Basic manual from 2005: https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c04619693

(zipping back to the original idea: an interactive Basic compiler…)
Some Basics will treat constants and line numbers in a particular way: both in a digested format for easy use by the interpreter, and also in string format for use by LIST or RENUMBER. It feels to me like one could do the same: a Basic line becomes a data structure which includes

  • a cheap way to get to the next line
  • machine code, perhaps mostly subroutine calls, to run the line
  • a string which is the line as entered and as it should be listed

If the line isn’t yet semantically complete, for example referring to a line number which doesn’t yet exist, then the machine code might be incomplete, tagged in some way, or not present. It feels likely, to me, that many lines will need to be completed before they can be executed: allocation of variable space and arrays is usually done at RUN time, and doing it earlier runs into difficulty with the Basic program itself growing and shrinking as it is worked on. Possibly a more modern approach to memory allocation, compared to the high-water mark approach of most Basics, would help. At the cost of garbage collection!
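A rough C sketch of that per-line structure, with invented field names and the usual BASIC replace-on-same-line-number editing behaviour for insertion; a NULL code pointer here stands for a line that isn't semantically complete yet:

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch of the data structure suggested above. "code" would hold
 * machine code (perhaps mostly subroutine calls); NULL marks a line
 * still awaiting fixup. "text" is the line exactly as entered. */
struct basic_line {
    struct basic_line *next;   /* cheap way to get to the next line   */
    uint16_t line_no;
    uint8_t *code;             /* compiled form; NULL => incomplete   */
    char    *text;             /* the source string, for LIST         */
};

/* Insert a line into the (sorted) program, replacing any existing
 * line with the same number -- the usual BASIC editing behaviour. */
struct basic_line *insert_line(struct basic_line *head,
                               struct basic_line *ln)
{
    struct basic_line **pp = &head;
    while (*pp && (*pp)->line_no < ln->line_no)
        pp = &(*pp)->next;
    if (*pp && (*pp)->line_no == ln->line_no) {
        ln->next = (*pp)->next;        /* replace existing line */
        *pp = ln;
    } else {
        ln->next = *pp;                /* splice in before *pp  */
        *pp = ln;
    }
    return head;
}
```

The singly linked, sorted list gives the "cheap way to get to the next line" directly, and RENUMBER or GOTO resolution only ever needs a walk of the same list.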

At minimum this approach will use more RAM, and early machines were often very short of usable RAM. But a modern machine would have at least 64k.

Alan Kay says that on a small scale material dominates while on a large scale architecture dominates. The example he uses is a dog house. You are likely to get it to work no matter what path you select, but the result will probably be rather different depending on whether you use wood, aluminium sheets, plastic, cardboard, bricks or something else.

If you let your success with the dog house go to your head and try to build a three story building by just scaling it up then you will almost certainly fail. Making things 100 times larger means a million times more weight and it will fall down.

Basic is the cardboard and staples of programming languages and a really great way to get things done on machines with 4 to 16KB of memory. The only real alternatives are assembly and Forth which aren’t for everyone. With 32 to 64KB we get more options like C, Pascal, Lisp and others. Beyond that we can have Python or Lua and by the time we move above 1MB we can have any language we want.

It would be possible to use Basic on large machines, but what would be the point? You will be tempted to expand it into something more like Pascal and add object oriented stuff like Visual Basic where there is nothing left of the language but the name.

That said, people still remember the extensions in BBC Basic very fondly. And I really enjoyed what the Extended Basic cartridge for the TI99/4A added over the pathetic built-in Basic. Proper subroutines with arguments really helped.

I used PowerBasic to create DOS executables and VisualBasic for Windows. When .NET came out I jumped ship to C#

BASIC was used on large machines. BASIC was used on large projects. Data processing, is data processing. There are bazillions of lines of BASIC that have been written historically, and are being used to this very day.

I’ll never forget talking to this lady at a clothing company extolling the wonders of her entire custom distribution system written in BASIC, running on their monster 64-CPU Sequent. (At this time, yes, this was a Big Machine.) And, yes, we’re talking basic BASIC. She was quite proud of her convention that Z9 was the error status variable everywhere in her code.

Very sophisticated systems have been written in BASIC. I’ve written some of them.

Interactive resource scheduling, planning, and allocation, in a networked environment, driving a slew of status displays? (Queues, remote processing, best fit heuristics, custom protocols, data driven, on the fly display formatting) Yea, I did that. Seamless integration with the work order system helped a lot (150+ users on a 33MHz 68030 for that system). Pre-TCP/IP, mind. This system drove an entire war room that was the scheduling hub for this company. The President of the company liked to come in and look at the displays and say with a smile “My, aren’t we busy!”. Airport displays had nothing on this thing.

Street level truck routing, scheduling, and pathing? Geocoding, address validation, routing a dozen trucks across 1000 pickups in 20m of runtime? In the early 90’s, pre-GPS? Yea, I did that too. The map viewer was in C; the rest of the backend was all BASIC. Yea, Google can do this in a nano-second today. Not so much back then.

And make no mistake, outside of longer variable names, this was your typical BASIC, not some fancy BASIC. It had ISAM integration.

BASIC without line numbers is still BASIC. Visual Basic is BASIC. All of that stuff you learned typing games into your C64 is much more applicable to a VB program than it is to Pascal, C, or anything else. Once you get past the event dispatching and GUI interface it’s still A$ = "Hello " + A$ in the end. DIM B(10). Alan Kay could only dream that Smalltalk were as successful as VB3 or VB6 was.

Would I choose BASIC for a greenfield project today? No, I’m a Java guy. But that doesn’t mean BASIC was not, or IS not, a totally successful language that’s let a lot of people empower their businesses and other activities. Which is what this is all about in the end anyway: empowering people to use their computers to improve their operations. Whether it’s a custom PICK distribution system, or a slew of awful VBA macros in a nightmare Excel spreadsheet.

And speaking of successful, dare we explore how many billions of lines of RPG there are floating around out there? And how successful those information systems based upon that relic are?

The extremely cut down versions of Basic crammed into the few kB of microcomputers were not the same language as was available on minicomputers, or in more advanced micros like the Sinclair QL. The more extended versions were well on par with Pascal and other contenders of the time.

Speaking of Z9 as a variable name, that brings up a memory that I’d use variable names like N5 because it was easier to mark on a punched card: the mark for 5 lined up with one of the marks for N.

BBC Basic is still going, as a free product and for multiple platforms. It has an IDE or two and has lost the line numbers, IIRC. Personally I get lost in large programs but I’m happy to write short ones in Basic, or awk, or maybe even python.

I like Kay’s analogy though: some tasks are easier with the appropriate materials, and some choices of materials don’t scale well. Which isn’t to say something can’t be done.

Slightly drifting, but on the subject of variable names… I did COBOL at uni and I just couldn’t get my head round it - weird, as I was already using BASIC, Pascal, FORTRAN and C… It turned out it was the coding style and choice of variable names used by the lecturer who was teaching us. When I made the effort to read the book it became a lot clearer - of course, I was told I’d never get a job at a certain institution because my style was too radical, but frankly at that point I didn’t care, and I have never written a COBOL program since…

-Gordon

Thanks for the link. I think I might be having a massive nostalgia trip later…

MS-BASIC for disk based systems is very close to BASIC PLUS on the PDP-11. The biggest difference is BP on the PDP had a nice facility for Virtual Arrays. But outside of that, the Disk I/O routines were very similar. RSET, LSET, mapping binaries in to strings to store in to records, etc. PRINT USING, etc. BP had statement modifiers like “A = A + 1 IF B(I) > 10 FOR I = 1 TO 10”, but that was it. Later editions of BP were better, but a lot of work was done on PDPs using BASIC.

Outside of a MAP statement to map memory to disk, an ISAM sub system (leveraging the maps), and optional line numbers, the BASIC I used was not dramatically different. No dynamic memory, nothing but scalar arrays, simple IF THEN ELSE, FOR-NEXT loops, GOTO and GOSUB.

Want to sort data quickly? You write an ITERATIVE QSort (you can actually do limited recursion in BASIC). For small sets, bubble sort to the rescue.

Want multiple fields for a “record”? and a “list”? DIM REC_NO(100):DIM REC_NAME$(100): DIM REC_ADDRESS$(100).

Want to sort by NO, NAME, or ADDRESS? You put selectors in the core comparator of that hand crafted QSort that you wrote (vs writing it 3 times).
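In C terms the same trick looks like this: one comparator, a field selector switched before each sort, and an index sort so the parallel arrays stay lined up. A hedged sketch only - the record fields mirror the BASIC example above and the data is made up:

```c
#include <stdlib.h>
#include <string.h>

/* Parallel-array "records" with one comparator whose field selector
 * is set before sorting -- the C analogue of putting selectors in
 * the core of a single hand-written QSort. */

#define NRECS 3

static int         rec_no[NRECS]   = { 3, 1, 2 };
static const char *rec_name[NRECS] = { "Carol", "Alice", "Bob" };

static int sort_field;               /* 0 = NO, 1 = NAME */

/* Compare two record indices by the currently selected field.
 * Sorting indices (not the arrays) keeps every parallel array
 * lined up with its siblings. */
static int cmp(const void *pa, const void *pb)
{
    int a = *(const int *)pa, b = *(const int *)pb;
    if (sort_field == 0)
        return rec_no[a] - rec_no[b];
    return strcmp(rec_name[a], rec_name[b]);
}

/* Fill idx with 0..NRECS-1 and sort it by the requested field. */
void sort_records(int *idx, int field)
{
    for (int i = 0; i < NRECS; i++)
        idx[i] = i;
    sort_field = field;
    qsort(idx, NRECS, sizeof idx[0], cmp);
}
```

One comparator, any number of sort orders - the same economy as writing the QSort once in BASIC instead of three times.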

Linked lists, binary trees, yada yada yada. You can do all of that in “extremely cut down” BASIC. Later versions of MS-BASIC let you REDIM arrays, so now you have dynamic memory heaps.

Writing a performant (for assorted values of “performant”) MERGE SORT for disk based rows is straightforward.

And…that’s data processing. Storing records, filtering records, sorting records, printing them out.

Once you do it a couple times (entry screens, a few reports, sort some files), you build up momentum and things get done more quickly. Write one report and you’ve written them all. Abstraction is not BASIC’s strong point, but cut and paste and conventions go a long way.

No, it’s not elegant. It’s capable. And when it’s the hammer you have, you solve problems with it. There’s a difference between “can’t do that in BASIC” and “don’t want to”.