Building a BASIC Interpreter, '80s style

BASIC is not expandable and is small enough that you can assign a different key to each token. This not only had the advantage of helping get the syntax right (like today’s Scratch and other block languages) but also reduced the number of key presses required on the very bad keyboards these machines had (not only the ZX80 and similar but also the calculator-style BASIC computers from Sharp and Casio).
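To make that concrete, here is a minimal sketch (not any real machine's tokenizer; the token byte values are made up for illustration) of how a keyword-per-byte BASIC might store a line of source, collapsing each keyword to a single token byte on entry:

```python
# Hypothetical keyword-token table; real BASICs used their own values
# (typically in the 0x80+ range so tokens can't collide with ASCII text).
TOKENS = {"PRINT": 0x80, "FOR": 0x81, "NEXT": 0x82, "GOTO": 0x83}

def tokenize(line):
    """Collapse known keywords to single token bytes; pass other text through."""
    out = bytearray()
    i = 0
    while i < len(line):
        for word, code in TOKENS.items():
            if line.startswith(word, i):
                out.append(code)        # one byte instead of 4-5 characters
                i += len(word)
                break
        else:
            out.append(ord(line[i]))    # ordinary character, stored as-is
            i += 1
    return bytes(out)

print(tokenize('PRINT "HI"').hex())     # 10 characters stored in 6 bytes
```

On a one-key-per-keyword machine like the ZX80, the keyboard effectively performed this mapping at key-press time, so the stored program was already in this form.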

Forth, on the other hand, is extremely expandable, so the source must be typed out. There are several implementation techniques: token threaded (used in OpenBoot), direct threaded, indirect threaded, call threaded. Most of these can be considered a tokenization of the source.
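A rough sketch of the token-threaded idea (in the spirit of OpenBoot's FCode, but not any real Forth): words compile to small indices into a dispatch table, and the inner interpreter simply steps through the token list. The word names and table layout here are made up for illustration:

```python
stack = []

def add():
    stack.append(stack.pop() + stack.pop())

def dup():
    stack.append(stack[-1])

# The "dictionary": each token number threads to a primitive.
primitives = [add, dup]
names = {"+": 0, "dup": 1}

def compile_word(source):
    """Compile Forth-like source into a threaded list of tokens/literals."""
    code = []
    for word in source.split():
        if word in names:
            code.append(names[word])        # token: index into the table
        else:
            code.append(("lit", int(word))) # inline literal
    return code

def execute(code):
    for token in code:                      # the inner interpreter
        if isinstance(token, tuple):
            stack.append(token[1])
        else:
            primitives[token]()

execute(compile_word("2 3 + dup +"))
print(stack[-1])   # → 10
```

Direct and indirect threading replace the table indices with machine addresses (or addresses of addresses), which is faster but less compact; the compile-time transformation of source words into a dense executable form is the same idea.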

Well, it will depend of course on how you move to “more structured-like commands.” But let’s look at what happens when you change MS-BASIC to have a block structure, such as done with Visual Basic.

The usual process of transforming BASIC into a block-structured language makes three massive changes to the language. It does whole-program analysis (i.e., no longer is any arbitrary combination of lines a valid program, and your whole editing system needs to be replaced with a completely different one), it adds general user-defined functions, and it adds scoped variables.

These produce utterly breaking changes to the language semantics; despite the surface syntax similarities, translating programs between BASIC and this new language can require essentially rewriting parts of the program from scratch (depending of course on what features of each language are being used). Consider, for example:

10 GOTO 60
40 NEXT I
50 END
60 FOR I=0 TO 3
70 PRINT I
80 GOTO 40

For a block-structured language where a for/next block is a node in the AST, this is not a valid program, and an AST cannot be generated for it. Sure, this small example can be easily rewritten in the obvious block format, but you do have to rewrite it, not just make minor syntactic tweaks, and of course larger programs are not so trivial.
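To show why a line-oriented runtime happily accepts this, here is a minimal sketch (not any real M-BASIC; statement encoding invented for illustration) of an interpreter where each line is an independent unit and FOR/NEXT communicate only through a run-time stack, running the exact program above:

```python
# The program from the example, one independent statement per line number.
program = {
    10: ("GOTO", 60),
    40: ("NEXT",),
    50: ("END",),
    60: ("FOR", "I", 0, 3),
    70: ("PRINT", "I"),
    80: ("GOTO", 40),
}

def run(program):
    lines = sorted(program)
    variables = {}
    for_stack = []   # entries: (var, limit, index of statement after the FOR)
    output = []
    pc = 0           # index into lines
    while pc < len(lines):
        stmt = program[lines[pc]]
        op = stmt[0]
        if op == "GOTO":
            pc = lines.index(stmt[1])       # any line is a valid target
        elif op == "FOR":
            _, var, start, limit = stmt
            variables[var] = start
            for_stack.append((var, limit, pc + 1))
            pc += 1
        elif op == "NEXT":
            var, limit, body = for_stack[-1]  # innermost executing FOR
            variables[var] += 1
            if variables[var] <= limit:
                pc = body                     # loop again
            else:
                for_stack.pop()
                pc += 1
        elif op == "PRINT":
            output.append(variables[stmt[1]])
            pc += 1
        elif op == "END":
            break
    return output

print(run(program))   # → [0, 1, 2, 3]
```

Nothing here ever asks whether the FOR and the NEXT nest properly in the program text; the GOTO into the loop body works because the FOR has already pushed its stack entry by the time the NEXT executes.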

From the opposite direction, consider a program that makes extensive use of functions and local variables (as almost any program written in an Algol-style language does). You’ll find it’s much easier to translate such a large program into another Algol-style language than to translate it to MS-BASIC, which again will involve nearly a complete rewrite for a program of any complexity.

When your program in “BASIC” is trivially translated to Python and requires a substantial rewrite to translate it to MS-BASIC, I think it’s fair to say you essentially have a different language.

Yeah, those standards are a nice fantasy, but I don’t know of any ROM microcomputer BASICs that ever even thought about rejecting programs where a FOR does not have a corresponding NEXT (or vice versa) in the block structure described in the Minimal BASIC standard. Instead they all (as far as I’m aware) accept and run quite happily the above example of a completely invalid program by those standards.

You could use this as an argument that MS-BASIC and similar interpreters are a different language from BASIC, if you want, and I won’t disagree with that. But the point still stands; such major semantic differences essentially give you different languages.

It’s true that the standard came into being long after ROM BASIC was “a thing”, but there are some BASICs out there (now) that do adhere to the ANSI standard (both the full and minimal versions). I also think it’s difficult for an interpreted BASIC to fully match up FOR/NEXT pairs. Constructs like NEXT : GOTO are valid (and no different from BREAK in e.g. C), and virtually impossible to check for pre-RUN time. Using GOTO can thwart any sort of counting, but that’s true for the run-time of a C program too. Dare I mention longjmp() in C?

The early ROM BASICs of the mid-1970s were already 10 years after Dartmouth BASIC, but were constrained by cost and time to market (read Woz’s tale of getting Apple Game (Integer) BASIC going, with the aim of having the first 6502 BASIC).

I accept that all BASICs are different but similar enough that you can get to grips with the local dialect relatively quickly. The ‘explosion’ of the micros in the late 70’s/early 80’s really fragmented (or diversified?) BASIC. You either paid Bill Gates or wrote your own - and writing your own was really the only way to make your particular little “home micro” with its quirks and fancy graphics and sound work economically and in the time scales expected back then.

And is my RTB Basic (RTB is Return To Basics) a “real” BASIC? No - it’s my ideal BASIC. Loops for example are all unified with the same construct which is

cycle
   ... do something
repeat

So:

for i = 1 to 10 cycle
  print i
repeat

and

while i < 10 cycle
  print i
repeat

or

until i >= 10 cycle
  print i
repeat

and so on.

And at one point, this was actually allowed:

for i = 1 to 10 cycle
  print i
repeat until i > 5

and it would do what you expected; however, I decided to add a flag to deliberately stop that working, as it wasn’t “basic” enough for me…

I did get some whinges and bleats about this though (and probably will do again), so at one point I did put in NEXT but when I mentioned this to the people interested at the time, no-one even bothered to download it and test it for me, so I took it out again.

“You can please some of the people all of the time, you can please all of the people some of the time, but you can’t please all of the people all of the time”.

― John Lydgate

(I stole cycle…repeat from Imp77 which was the first block structured language I learned - up to then, it had been BASIC (Apple, PET, TRS80, dial-up tty33 “Mainframe”) → 6502 asm → FORTRAN → Imp77)

Cheers,

-Gordon

(I suspect one thing going on here is that we have two perspectives: the language user sees one set of facilities, and the language implementer knows about the machinery.)


That does not appear to be the case. Standard ECMA-55: Minimal BASIC, which says it is “fully compatible” with ANSI X3J2, is dated January 1978, only six months after the Apple II appeared, and work on this was started much earlier, before Gates/Allen or Woz started work on their very first tape versions of BASIC or even the appearance of the Altair 8800:

…in September 1974 the first meeting of the ECMA Committee TC 21, BASIC, took place. In January 1974, a corresponding committee, X3J2, had been founded in the USA… The ANSI one was distributed for public comments in January 1976…

That ANSI distribution for public comments was less than a year after Gates and Allen had started on the first MS-BASIC, and around the time that Woz was making available preliminary versions of his Integer BASIC.

So the idea of NEXT etc. being a syntactic and semantic part of a lexical block, rather than a stand-alone statement, was at least contemporary with the early BASICs that implemented it as the latter. It could well be much earlier; I’ve not gone back to look at the early '70s minicomputer versions of BASIC, much less the original Dartmouth BASIC.

For convenience, let’s refer to the “block-structured” BASICs such as the ANSI/ECMA ones as “A-BASIC” (mnemonic: “Algol-like”) and the Gates/Allen/Woz/etc. non-block structured BASICs as “M-BASIC” (mnemonic: “Microcomputer” or “Microsoft”).

Not at all, since in a block-structured language the parser does this by nature. There’s no problem at all with an interpreter running against an AST generated by such a parser and, in fact, pretty much every contemporary and modern interpreter of an Algol-ish language (Bourne shell, TCL, Perl, etc.) does this.

The issue here is that the M-BASICs are line-structured, with every individual line being its own unit, bearing no relation to any other line. NEXT, as I alluded to above, is a stand-alone keyword with specific run-time behaviour that is very different from what I am calling, for lack of a better term, “block-structured” languages. Consider the following program:

10 FOR I=1 TO 3
20 PRINT "I";I
50 IF I=3 THEN FOR J=1 TO 3 : PRINT "J";J
70 NEXT

This is perfectly valid and does what you would expect in M-BASIC, but should produce a syntax error in A-BASIC, because in the latter, NEXT is not a stand-alone statement that says “look up the current stack for any executing (stand-alone) FOR,” but instead must be associated with a specific FOR in the program text (ECMA-55 §13, p.17, PDF p.22).
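The run-time rule described here can be boiled down to a few lines (illustrative only; the variable names mirror the program above): a bare NEXT pops whatever FOR happens to be innermost on the stack at that moment, regardless of where either keyword sits in the program text.

```python
for_stack = []

def do_for(var):
    """FOR pushes an entry for its loop variable at run time."""
    for_stack.append(var)

def do_next():
    """A bare NEXT binds to the innermost *executing* FOR, not to any
    textual block: it just pops the top of the stack."""
    return for_stack.pop()

do_for("I")          # 10 FOR I=1 TO 3
do_for("J")          # 50 ... FOR J=1 TO 3  (only entered when I=3)
print(do_next())     # 70 NEXT → closes J, not I; prints J
```

An A-BASIC parser never consults such a stack; the FOR/NEXT pairing is fixed in the program text before the program runs, which is exactly why it must reject the program above.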

Yeah, I’m not so sure that’s true. If you have an M-BASIC background, and know of Algol-like languages, sure, it’s not a big deal to understand the difference above. But would someone with only A-BASIC experience really get to grips “quickly” with programs like the one above?

Actually, it appears to me that licensing Microsoft’s BASIC and extending it appropriately was usually considered a better answer to that than writing your own. Looking at the machines I have around, only my Apple II and Panasonic JR series have their own BASIC, and Apple later switched to an MS-BASIC anyway, which provided more capable Apple-specific graphics support. Radio Shack, Commodore, Hitachi, NEC, Sharp and Fujitsu all shipped their late-70s and early-80s microcomputers with an MS-BASIC; the Japanese ones with plenty of their own extensions for everything from hex literals to graphics to sound to kanji support.

If you simply replace cycle with do and repeat with end you have Ruby code. So, is Ruby essentially BASIC with some OO stuff added to it?

Well, some people may look at it that way, but what I am trying to explain (perhaps poorly) is that this is not correct. The implementation may have driven the separation of the A-BASIC and M-BASIC languages, but while they share a similar surface syntax, the semantics of the two are substantially different, and the semantics of A-BASIC are closer to Python and the like than to M-BASIC.

The names are misleading; what’s important is to look at the language itself, rather than what someone chose (whether due to confusion or marketing reasons or whatever) to call it.

Quite right: from your chosen perspective, the answer is clear. And I see that, and I understand it.

Edit: to defuse what might seem like a confrontational comment, perhaps if we take the taxonomist’s perspective we might refer to Fortran-like Basics and to Algol-like Basics.

And let’s not forget BBC Basic, Atari Basic, Locomotive Basic. There’s so much more than MS-Basic. Indeed I think GW-Basic might be a different thing again. There’s a bit of a list here - and that’s only for 6502.

The initial BASIC for the TRS-80 was a port of Tiny BASIC with floating point added to it. It was named “level I” and the planned MS-BASIC would be “level II”. The $120 upgrade from level I to level II was a hardware kit to replace the original 4KB ROM with 12KB.


If you leave the home micros, DEC BASIC and HP BASIC and …

Then there’s the later MS BASIC, QBasic, which indeed does away with line numbers, adds named subroutines, and all kinds of other features, but can still be written in a style very reminiscent of the 8-bit MBASIC (with or without those line numbers!) and is very much recognizable as the same language.

To me, BASIC is a quite diverse and long-lived language with a strong evolution until it fell upon hard times when it did, indeed (I think as @cjs is regretting?) stray far enough from its roots that it wasn’t really … BASIC-y enough. Certainly Visual Basic (and VBA) have hit that point. However, in between, it admitted a lot of styles that still look and feel like BASIC when I see them.


I’m certainly not regretting anything about BASIC except that it was invented in the first place; it’s a terrible, terrible language with no real excuse to exist. (Compared to other languages of the time it’s harder to learn, except on the most trivial level, harder to program in, and more resource-hungry.) But I’m providing that for background; if you think that BASIC is a great language, I’m not going to argue against that. (I’ve just been too beaten down by people with little programming experience who have a seemingly religious conviction that “BASIC” in every form is one of the best languages out there.)

While I still find the syntax of VBA clumsy, it appears to be a massive improvement over the older BASICs, so improved that it’s basically a different language. As I said, if you need to completely rewrite your several-hundred line program to port it from “BASIC x” to “BASIC y,” I don’t think you have the same language in x and y.

Well, I’m the local BASIC apologist.

I don’t know if you actually did any data processing back in the day. BASIC is a data processing language, à la COBOL and RPG, rather than perhaps a “computer science” language.

BASIC in its many forms was particularly productive with its interpreters compared to other languages. While the BASIC runtime may have been resource heavy, where it was resource light was in terms of actual developer time getting something accomplished. The Edit/Run cycle was fast and efficient compared to any of the compiled languages.

I was looking at the COBOL-80 compiler for CP/M. It was a 5 pass compiler. 5 passes! Oh my word. And even then, it was running an internal interpreted code, not machine language. So it had a large runtime footprint out of the box.

IBM mainframes ran COBOL for their systems. Many of the small minicomputers ran BASIC. BASIC powered a large vertical system industry. I’d hate to have to count the number of dental and auto shop systems that were floating around on most any hardware you could name that were written by some hack with an itch in BASIC.

No doubt the code is wide open for criticism, however all of those systems had other options for languages but they weren’t chosen. Instead we had a lot of useful applications written over a short time in the hands of the folks who needed them. Which in the end is what really matters.

Simply, BASIC was specifically empowering for people and processes, more so than anything else on the market at the time. It’s notorious because of its ubiquity and broad application. All of the other “better” systems we don’t hear or talk about, because they didn’t have the success of BASIC.

My Dad was successful with BASIC. Worst. Code. Ever. So, so bad. But it did what he wanted it to do, and in the end that’s all that mattered. His major lament was that the system couldn’t distinguish between the variable AA and AAA (you can see how he chose his variables).

Who knows where the computer, especially the micro-computer, world would be if BASIC didn’t exist. It was a language of its time.


Well, I can’t let this one go by. This general type of argument is one I keep hearing, but I have never seen anybody say this who has either themselves made a serious comparison to or quoted studies comparing it to other major interpreters such as Lisp or Bourne shell. (Even you in this very post here use a straw man comparison against compiled Cobol, rather than other interpreters also claimed to have a fast compile/run cycle.)

As for the “look at all the systems that used it; it must have been better!” argument, I think it should be clear that that’s not valid evidence that the original implementors knew of other systems, knew how to use them well, and chose BASIC over them on that basis.

(Note for the moderators: I think it would be a good idea to move this and the previous post I’m responding to to another thread. Both of these have moved well beyond merely clarifying the biases of participants in this thread.)

A very helpful stance - thanks. Forums such as this work best when each of us brings our findings, our experience and our insights, and let them stand on their merits without trying to show that we hold a better version of the truth than anyone else.

Oh dear.

Everyone: Please feel free to start a new thread when you feel you need one: quote from existing posts, use the referenced thread mechanism if you can. As always, consider what the thread is about and try to give it a useful title.

