Microsoft is 50 - Original Altair Basic source released

To celebrate 50 years of Microsoft, Bill Gates himself has released the original source code for Altair Basic.

It’s a PDF with no text pane, but I’m sure it’ll be translated into machine-readable form soon…

See also here:

https://www.gatesnotes.com/meet-bill/source-code/reader/microsoft-original-source-code?WT.mc_id=20250331100000_Microsoft-50th_MED-MED

-Gordon

Thanks for the news. It would be fun to get this code running again. I see Gates released a scan of a line-printer listing instead of machine-readable source text. Fortunately, OCR stands a chance—at least in theory. I experimented with free trials of several commercial OCR tools, and they all did a terrible job.

I tried OCR and failed. Thought of another plan though.

I have built a quick and dirty website that shows a random page. You can then copy it into a text file and upload it. If enough people do it we should be able to get an accurate copy of each page…

Anybody want to try it?
I’ll post a link if there is any interest.

Crowdsourcing might be a good idea - please be quite sure you have all the things lined up and don’t lose any contributions. Share everything, early and often! (I say this because we tried crowdsourcing polygon capture for retro chips and it was a mess.)

Just to note:

The printout is over 150 pages of Intel 8080 assembly language

This is a fun way to commemorate an anniversary, but it’s important to realise what actually came of his scolding letter to the hobbyists who had received tapes of this from their friends.

The homebrew community was so unimpressed by both his code and his attitude that they went and developed their own standard to help people write their own BASIC interpreters. They started a whole magazine to share the work done with these projects, and started talking about “Copyleft” as a rebellion against the proprietary attitude of this rich kid who threw his toys out of the pram over his rough port of about ⅓ of Dartmouth BASIC.

Don’t ever let Bill Gates try to “own” this portion of history through wealth and PR. Copyright wasn’t clearly applicable to software at the time, though he was likely within his legal rights to defend his claim to the program (although what of Dartmouth?). But he was merely the sand in computing history’s oyster, and the explosion of shared code that followed was the pearl.

Copyright did apply to source code at the time (exactly what has just been released) but not to binaries, like the copied paper tapes he was complaining about in the letter. So his threats to sue were a bluff.

That changed just a few months later (October 19, 1976), however, as the USA aligned its copyright law with most of the world. Copyright protection was extended to chip masks in 1980 (if I remember correctly).

I was quite delighted to see someone at VCF East selling freshly-punched tapes of Altair BASIC for $5. Take that, digital oligarchs!

As for OCRing it, that will be fun. Each character is only 10 or 11 px high. While most of the documents were captured on a 150 Mpx Phase One camera, the page images are under 3 Mpx.

There are somewhat interesting nuggets of metadata in the page image files* embedded in the PDF. F’rinstance:

======= IPTC data: =======
City          : 
Record vers.  : 4
Spec. Instr.  : Guth_GreenCode_0102.jpg
DateCreated   : 20240717
Byline        : This item is subject to copyrigh
OriginalTransmissionReference: Guth_GreenCode_0102
(C)Notice     : This item is subject to copyright. Contact Gates Archive for more information, info@gatesarchive.com

So it was scanned in July last year, and has deeply embedded rights information. Some of the pages have the same metadata formatting errors I’ve only seen written by Raspberry Pi cameras.

Some time has been spent making this pretty, and I feel that some thought has gone into making it pretty useless for OCR.


*: pdfimages -j is your friend for extracting the original embedded files, and I used jhead to list the metadata

Currently it is all very rough.

The pages of the pdf each contained 2 pages of fan fold. So in total there are 314 pages.

I have an HTML page that displays a random image, and a small form to upload a file. The idea is, you create a text file, copy the page displayed in the image to the file, then upload this giving it the name p<pagenumber>.txt, e.g. p204.txt. The site will then take this text document and store it, adding a time stamp so there can be more than one copy of each page.
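
A minimal sketch of that storage step in Python, for anyone who wants to picture it. This is not the site's actual code: the function name `stored_name`, the strict check of the `p<pagenumber>.txt` pattern, the 1..314 range, and the timestamp format are all assumptions made for illustration.

```python
import re
from datetime import datetime, timezone

# Assumption: 314 fan-fold pages, uploaded as p1.txt .. p314.txt.
TOTAL_PAGES = 314
NAME_RE = re.compile(r"^p(\d{1,3})\.txt$")

def stored_name(upload_name, when=None):
    """Return a timestamped name so several copies of one page can coexist."""
    match = NAME_RE.match(upload_name)
    if not match:
        raise ValueError(f"expected p<pagenumber>.txt, got {upload_name!r}")
    page = int(match.group(1))
    if not 1 <= page <= TOTAL_PAGES:
        raise ValueError(f"page {page} is outside 1..{TOTAL_PAGES}")
    # Default to the current time; always express the stamp in UTC.
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = when.strftime("%Y%m%dT%H%M%SZ")
    return f"p{page:03d}_{stamp}.txt"
```

Zero-padding the page number keeps the stored files in page order when sorted by name, and rejecting anything that does not match the pattern avoids storing uploads under a path the uploader chose.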

That’s as far as I have gone. The next bit is more difficult. Turning the pages into an accurate document. Not sure how to do this bit yet. Any ideas appreciated.
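
One possible idea for the merging step, sketched in Python: take every stored copy of a page, vote line by line, and flag the lines where no reading wins a strict majority so a human can check them against the scan. The function name `merge_copies` is made up, and the sketch assumes the typists all kept the same line breaks as the printout; a skipped or extra line would throw the comparison off from that point on, so a real tool would need to align the copies first.

```python
from collections import Counter
from itertools import zip_longest

def merge_copies(copies):
    """Majority-vote each line across several transcriptions of one page.

    Returns (merged_lines, disputed), where disputed lists the 1-based
    line numbers on which no single reading won a strict majority.
    """
    split = [text.splitlines() for text in copies]
    merged, disputed = [], []
    # Shorter copies are padded with empty lines so every copy gets a vote.
    for number, readings in enumerate(zip_longest(*split, fillvalue=""), start=1):
        # Trailing whitespace carries no meaning on a line-printer listing.
        votes = Counter(line.rstrip() for line in readings)
        winner, count = votes.most_common(1)[0]
        merged.append(winner)
        if count * 2 <= len(readings):
            disputed.append(number)
    return merged, disputed
```

With only two copies of a page, any disagreement shows up as disputed, which is the error-catching effect of typing everything twice. A third copy is then only needed for the disputed lines.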

Crowdsourcing “human OCR” might work, but we’d need to partition the work well to avoid duplication of effort. Doing everything twice is not a waste because it catches errors, but more than that is.
An approach that I like even better is to A) find the binaries, B) disassemble them, and C) use the symbol table at the end of the listing to make the disassembly more readable. (okay maybe “OCR” the symbol table first so we can feed it to the disassembler).
It’s the comments that are truly valuable here, and a scan works just fine for reading those.
BTW, not only is there a symbol table at the very end of the listing, but there is a nice cross-reference too.
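
A rough Python sketch of step C, to show how little is needed once the symbol table has been typed in. Everything about the formats here is an assumption: that each table line is a name followed by a value, that the values are octal, and that the disassembler prints operands as hex with a leading digit and an `H` suffix. The function names `parse_symbols` and `label_operands` and the sample address are invented for illustration.

```python
import re

def parse_symbols(table_text, radix=8):
    """Map address -> label from lines like 'FOUT   003452' (assumed layout)."""
    symbols = {}
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blanks and anything that is not 'NAME VALUE'
        name, value = fields
        try:
            symbols[int(value, radix)] = name
        except ValueError:
            continue  # value was not a number in the assumed radix
    return symbols

def label_operands(disassembly, symbols):
    """Replace hex operands such as 072AH with a label when one is known."""
    def swap(match):
        address = int(match.group(1), 16)
        return symbols.get(address, match.group(0))
    # Assumed operand form: a leading digit, 3 or 4 more hex digits, then H.
    return re.sub(r"\b(\d[0-9A-Fa-f]{3,4})H\b", swap, disassembly)
```

Operands with no matching symbol are left untouched, so the output is never worse than the raw disassembly. Note that a plain address match cannot tell a jump target from a constant that happens to have the same value, so the result would still need a read-through against the scan.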

P.S. Is anyone a handshake or two away from Bill Gates? We could volunteer to help him somehow, if he helps us digitize the listing he just published. I bet finding the binaries for Altair BASIC wouldn’t be hard, but it might be hard to find the exact version. I’m guessing that listing is probably one of the versions they wrote while still at Harvard.

P.P.S. Looks like people are already working on a “good” disassembly of Altair BASIC: http://altairbasic.org
and
GitHub - option8/Altair-BASIC: Micro-Soft Altair BASIC 3.2 source

Good luck — but I’m not volunteering to help one of the world’s richest men.

Those disassemblies are old projects: altairbasic is from 2013, the github one in 2019. They’re not based on the scans.

You might be referring to the Semiconductor Chip Protection Act (SCPA) of 1984, sometimes called the “Maskworks law” with its associated circle/M symbol. That is an interesting law crafted by industry players who wanted to make it okay to reverse-engineer each other’s chips. I don’t remember much else about the law. Zilog famously inserted a handful of “traps” (visually indistinguishable deceptions) into the Z80 design that set copy-cats back six months, the story goes.

Fair enough. Wealthy sure, and yes, that infamous open letter did not earn him goodwill (*), but Bill Gates is no Darth Vader. Since his retirement, he has championed a lot of noble philanthropic work and prize-worthy public-health efforts. I was merely suggesting that he might talk to us.

FWIW, the 2019 GitHub project explicitly credits Reuben Harris’ 2019 altairbasic.org project, which I believe was a true reverse-engineering of the binaries, with good explanations and diagrams too. They are clearly related projects. Perhaps a new leader “took the torch” and finished the job.

This effort seems to be based on Version 3.2 of Altair BASIC, whereas the listing Bill Gates published recently is Version 3.0. Still there could be some synergy. (Of course the reverse engineers were guessing at all the label names.)

(*) Microsoft soon changed their business model to one where the computer manufacturers were the software licensees, not the end-users. Sound familiar? And yes, Palo Alto tinyBASIC was born, which was, in a way, a form of early “open source” software.

This is getting a little off-topic, but the Gates Foundation nixed waiving intellectual property rights on COVID-19 vaccines, putting public health under considerable financial strain:

He’s also no saint (NYT).

Agreed, personal virtues and villainy are off-topic, but almost relevant enough to mention for sake of historical context… Microsoft was notorious for strong-arming PC manufacturers into not delivering alternative operating systems on their hardware. (Later, Microsoft did some internal soul searching to polish their reputation.) But love him or hate him, we have to acknowledge that Bill Gates was a seminal figure in the personal computing revolution even before DOS. Me, I was a “Fortran guy” in the late 1970s, a “Pascal guy” in the early 1980s, and a “Lisp guy” thereafter, but I’ve taken a recent interest in the many dialects of BASIC built into early home computers, and I consider Microsoft BASIC in particular worthy of study for its language design (some borrowed from DEC) and implementation techniques. I’ve already benefited from the 4k BASIC reverse engineering efforts and the recently published source code with its extensive comments.

Indeed. It also stresses your moderators, so if anyone is tempted to just throw in a quip or an aside about some personality, or a group of people, or a historical event, or even a current event, please think twice. Is it about retrocomputing, vintage computing, or computing history? Will it be of interest to the readership? Will it be controversial or divisive? Might it lead to a good conversation in which we learn new things and reinforce our enthusiasm for this forum and its topicality? Could it just as well have been a comment on social media to your friends and family?

And now back to our regularly scheduled program…

I was reading Reuben Harris’s comments about Micro-soft 4k BASIC (see altairbasic.org). I hadn’t realized it was designed for 4kB RAM total, and that the interpreter, editor, and math library take 78% of memory. This got me thinking… Texas Instruments just released the MSPM0C1104, “the world’s smallest MCU” the size of a grain of sand (1.38 mm^2). It sports 16kB of Flash memory and 1kB of SRAM. Originally I thought “1k of RAM is not much to work with,” but it turns out these are almost exactly the constraints targeted by 4k BASIC. One approach to preserving the 8080 code is to “reverse compile it” into a high level language like C, keeping all the algorithms and data structures. (4x code expansion is not unreasonable; error strings and other constants can also live in Flash). The extensive comments in the recently published listing should be enough to get the job done. The original assembly language code will not garner nearly the same interest (e.g. among students and historians) as would a more readable version.
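
The back-of-the-envelope numbers behind that comparison, as a small Python sketch. The 78% share, the 4x expansion factor, and the memory sizes are the figures quoted above; treating 1 kB as 1024 bytes and rounding to whole bytes are my assumptions, and the sketch ignores whatever RAM the interpreter's own variables would need on the MCU.

```python
RAM_4K = 4 * 1024          # Altair target: 4 kB of RAM in total
INTERPRETER_SHARE = 0.78   # interpreter, editor, and math library
FLASH = 16 * 1024          # MSPM0C1104 Flash
SRAM = 1 * 1024            # MSPM0C1104 SRAM
EXPANSION = 4              # assumed growth when rewriting 8080 code in C

interpreter_bytes = round(RAM_4K * INTERPRETER_SHARE)  # 3195
user_bytes = RAM_4K - interpreter_bytes                # 901
ported_bytes = interpreter_bytes * EXPANSION           # 12780

print(f"8080 interpreter:        {interpreter_bytes} bytes")
print(f"left for BASIC programs: {user_bytes} bytes (MCU SRAM: {SRAM})")
print(f"C port at {EXPANSION}x:           {ported_bytes} bytes (MCU Flash: {FLASH})")
```

So a 4x larger port would still leave roughly 3.5 kB of Flash spare, and the 1 kB of SRAM is slightly more than the roughly 900 bytes a 4k Altair had left over for the user's program and data.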

That really is quite remarkable!

One difference between the modern microcontroller type device and the old systems (in the mid 70’s) is that back then your program lived in RAM - today, in a microcontroller at least, your program lives in ROM (or Flash, etc.) and that “tiny” 1K RAM is just for data storage. More than enough to run a toaster, microwave, fridge, etc.

-Gordon