Translation of 8080 Code to 8086

crandylb · June 26, 2020, 9:04pm

Microsoft Translation of 8080 Code to 8086 and Other 16-Bit Processors

I originally started this post as a reply to “Microsoft Open-Sources GW-BASIC”
but then thought it deserved a separate post.

I was at Microsoft from 1979 to 1982 as approximately the nineteenth employee.
At that time Microsoft’s portfolio consisted of the Basic interpreter, Fortran
compiler with assembler and linker, and Cobol, all written in Intel 8080
assembler. I worked primarily with the Fortran product known as Fortran-80,
and its components, ASM-80, and Link-80.

I got the job of porting Fortran-80 to 8086 code because I already had more
than ten years of experience with several 16-bit minicomputers including
systems programming in assembler and scientific programming in Fortran. I hope
everyone interested knows that the 8088 is identical to the 8086 from the
programmer’s point of view. The only relevant difference is the 8088 is
physically packaged to interface with an 8-bit memory bus while the 8086 uses
a 16-bit bus.

In simplistic terms the 8086 instruction set architecture can be considered a
superset of the 8080 instructions. The correspondence is close but not
exact. Intel provided a conversion table of equivalent 8086 code for each 8080
instruction. Most instructions convert one to one in a straightforward way but
there is one unfortunate difference. Some 8080 instructions did not alter the
carry bit in the status register while the “almost” equivalent 8086
instruction did alter the carry bit. This meant that an ugly blob of extra
8086 instructions had to be inserted to preserve and restore the exact state
of the carry bit in case a sequence of 8080 code later depended on it. Or put
another way, if a straightforward one to one conversion was used that ignored
the delayed use of the carry bit, and the 8080 code later depended on that
bit, the translation would fail to work properly. Fortunately, 8080 code
sequences that required preserving the carry bit were rare. So the
straightforward translation worked most of the time, and suspicious sequences
could easily be flagged with warning comments for later examination and
rewrite if needed.

I used the Stage2 general purpose macro processor to implement my translation
process. Not a lot of people today know what Stage2, or a general purpose
macro processor, even is. Briefly, Stage2 was created by Prof William Waite at
the University of Colorado in the late sixties as a major component of his
mobile programming system, MPS. Stage2 uses a pattern matching algorithm to
match input lines of text against a set of templates. Each template is the
first line of a macro and when a match is recognized the code body of that
macro is processed to produce output text, error messages, or create a
constructed line that is submitted for further template matching. So the
process is fully recursive and quite powerful in its capabilities for text
transformation. In fact, it can be used to implement a programming language
compiler. See crandylb.github.io and my repositories on GitHub under crandylb (Charles Randyl Britten).
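
The template-matching loop described above can be illustrated with a minimal Python sketch. It is not Stage2 itself and does not use Stage2's real notation: the `*` parameter marker, the `$1`-style substitution, the `emit` and `rescan` action names, and the `MacroProcessor` class are all assumptions made up for this example. It only shows the general idea of matching a line against templates, then either producing output or building a new line that goes back through matching.

```python
import re

def compile_template(template):
    # Hypothetical syntax: '*' marks a parameter slot in the template.
    parts = [re.escape(p) for p in template.split("*")]
    return re.compile("^" + "(.+?)".join(parts) + "$")

class MacroProcessor:
    def __init__(self):
        self.macros = []  # list of (compiled template, body) pairs

    def define(self, template, body):
        # body is a list of (action, text) pairs; action is "emit" or "rescan".
        self.macros.append((compile_template(template), body))

    def process(self, line, out, depth=0):
        if depth > 50:
            raise RecursionError("macro expansion too deep: " + line)
        for pattern, body in self.macros:
            m = pattern.match(line)
            if m is None:
                continue
            for action, text in body:
                # Replace $1, $2, ... with the text matched by each '*'.
                filled = re.sub(r"\$(\d)", lambda g: m.group(int(g.group(1))), text)
                if action == "emit":
                    out.append(filled)                     # write output text
                else:
                    self.process(filled, out, depth + 1)   # resubmit constructed line
            return
        out.append(line)  # no template matched: copy the line through unchanged

mp = MacroProcessor()
mp.define("LET * = * + *", [("rescan", "LOAD $2"), ("rescan", "ADD $3"), ("rescan", "STORE $1")])
mp.define("LOAD *", [("emit", "    mov ax, $1")])
mp.define("ADD *", [("emit", "    add ax, $1")])
mp.define("STORE *", [("emit", "    mov $1, ax")])

out = []
mp.process("LET x = y + z", out)
print("\n".join(out))
```

The first macro produces no output of its own; it only constructs three new lines, each of which is matched again. That resubmission step is what makes the process recursive.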

The first major task porting Microsoft Fortran to 8086 was to modify
Fortran-80 to generate 8086 code producing a cross compiler, XFortran-86. Of
course it is still written in 8080 assembler and runs under an 8080 system
such as CP/M. Along with that the ASM-80 assembler was modified to assemble
8086 instructions producing a cross assembler, XASM-86. Object code produced
by these two tools could still be linked by Link-80 to produce an executable
file that could be run on an 8086 or 8088 processor.

The code generated by the 8086 compiler used the simple one to one equivalent
8086 instructions and had no concern for the carry bit anomaly. If I recall
correctly it was not possible for the compiler to generate code that ever
depended on delayed use of the carry bit status. So the generated code was
“clean,” relatively efficient and never required the extra blobs of code from
the Intel conversion table.

To convert the Fortran compiler to run on the 8086 architecture I used a
semiautomated hybrid approach. I wrote Stage2 macros that translated 8080
instructions one to one to the simple equivalent 8086 instructions. The carry
bit anomaly was not completely ignored since it was easy to insert a source
code flag in the comments to mark the places where the Intel code blobs would
have been inserted. As it turned out the one to one translations worked
fine in most cases, but in a small handful of cases the 8080 code made
delayed use of the carry bit and so the translated 8086 code would fail under
execution at those points. But, as I said, flags inserted in the translated
source made it relatively easy to find the points of failure and fix the
problems during testing and debugging sessions.
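
As a rough illustration of this one-to-one-plus-flag approach, here is a small Python sketch (the original work used Stage2 macros, not Python). The register mapping is the one commonly given for 8080 to 8086 translation. Everything else is an assumption for the example: the function name `translate_line`, the marker text, the handful of opcodes covered, and in particular the choice of `INX` and `DCX` as the flagged instructions, which stands in for whatever Intel's table actually listed. Labels are not handled.

```python
# 8080 register -> 8086 register, as commonly given for 8080-to-8086 translation.
REG8 = {"A": "AL", "B": "CH", "C": "CL", "D": "DH", "E": "DL", "H": "BH", "L": "BL"}
REG16 = {"B": "CX", "D": "DX", "H": "BX"}

# Assumption: opcodes whose 8086 counterpart treats the flags differently.
# This set is illustrative only, not copied from Intel's conversion table.
FLAG_RISK = {"INX", "DCX"}
FLAG_MARK = ";*FLAGS* check later use of flags"

def translate_line(line):
    """Translate one 8080 source line (no label) to a simple 8086 equivalent."""
    code = line.split(";", 1)[0]
    fields = code.split(None, 1)
    if not fields:
        return line.rstrip()  # blank or comment-only line passes through
    op = fields[0].upper()
    args = [a.strip().upper() for a in fields[1].split(",")] if len(fields) > 1 else []

    if op == "MOV" and len(args) == 2 and all(a in REG8 for a in args):
        out = f"MOV {REG8[args[0]]},{REG8[args[1]]}"
    elif op == "ADD" and len(args) == 1 and args[0] in REG8:
        out = f"ADD AL,{REG8[args[0]]}"
    elif op in ("INX", "DCX") and len(args) == 1 and args[0] in REG16:
        out = f"{'INC' if op == 'INX' else 'DEC'} {REG16[args[0]]}"
    else:
        return "; TODO untranslated: " + code.strip()

    if op in FLAG_RISK:
        # Mark the spot instead of inserting save/restore code.
        out = out.ljust(16) + FLAG_MARK
    return out

for src in ["MOV A,B", "INX H", "ADD C", "DAA"]:
    print(translate_line(src))
```

Searching the translated source for the marker then gives the short list of places to inspect when a test fails.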

The source code for the Fortran-80 compiler was a single monolithic file of
thousands of lines of 8080 instructions. It made a printed listing about two
inches thick. Microsoft at that time used a DEC PDP-10 for time sharing and
batch processing. Because of the large file size and the speed of computers of
that era, assembling and building the translated compiler had to be scheduled
overnight to avoid slowing down interactive work during the day.

Stage2 is written in an abstract language that is intended to be easy to
implement by various means. One method, provided as an example with the MPS
package, is to translate Stage2 source to Fortran using a simple program
written in Fortran, called SIMCMP. So on the PDP-10 I was using a Fortran
implementation of Stage2. I have implemented Stage2 on many different systems
in my career, almost always as a full bootstrap which, as a second stage
(ahem), uses Stage2 to translate the Stage2 source to the assembly language of
the target host system, using a variety of optimizations made possible by the
Stage2 macro processor. Such a full bootstrap implementation typically is
significantly faster, more efficient, and makes better use of resources on the
host system. Unfortunately time constraints did not permit a full bootstrap
implementation on the PDP-10. So the inefficient Fortran implementation of
Stage2 added to the overnight load on the PDP-10.

Using Stage2 to translate 8080 source to 8086 in this manner saved
considerable tedious labor, but the testing and debugging cycle and the
repeated overnight translation runs added to the time it took to achieve a
clean, working translation. The XFortran-86 compiler was translated in this
manner to result in a Fortran-86 compiler to run on 8086 family processors.
The XASM-86 assembler, the Link-80 linker, and the Fortran library were also
translated in the same manner. And it all worked astonishingly well.

However, the Microsoft linker, Link-86 in its 8086 version, turned out to be
an inconvenient bottleneck. Both the 8080 and the 8086 use 16-bit addressing
in programmer space. So the 8080 is limited to 64k as the maximum size of an
executable module including both code and data. Using these software tools on
the 8086 meant they inherited the same limitation. But the 8086 design
provides access to a megabyte, 1M, of addressable space by means of segment
registers. Segment registers are also 16-bit, but a segment address is shifted
left four bits in hardware and added to a 16-bit program address to form a
20-bit physical address to access a full 1M of memory, if available.
Programmers were free to structure segments in a variety of ways, each segment
limited to a maximum of 64k. But the Microsoft linker had no facility to
handle multiple segments.
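
The address arithmetic just described can be written out as a short Python sketch. The function name `physical_address` is made up for this example, and the masking to 20 bits reflects the 8086's 20 address lines.

```python
def physical_address(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Segment shifted left four bits, plus offset, kept to 20 bits.
    return ((segment << 4) + offset) & 0xFFFFF

print(hex(physical_address(0x1234, 0x0010)))  # 0x12350
print(hex(physical_address(0x1235, 0x0000)))  # 0x12350, same byte via another segment
```

Because segments start every 16 bytes and each spans up to 64k, many segment:offset pairs name the same physical byte, which is part of why programmers could lay out segments in so many ways.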

So the entire 8086 package consisting of Fortran compiler, assembler, linker
and library, made a disappointing product that could not make full use of 8086
capabilities. At a meeting, Marc McDonald, who was responsible for maintenance
of the Microsoft linker, announced that it would be impossible to add multiple
segment capability to the linker. Not to be deterred, I devised a hack that
could be used to stitch together multiple segments by means of indirect jump
tables the programmer had to construct, and documented how to do it. But as I
said, it was only a hack and only meant to be a proof of concept, not
seriously a marketable product.

I also used Stage2 to translate 8080 code to the Zilog 16-bit Z8000 and, I
think, some other 16-bit processors, but I no longer remember the details. As
far as I know other Microsoft products such as Basic and Cobol were translated
by hand, if at all. It is also worth noting that Digital Research made an 8080
to 8086 translation tool called XLT86, but it appeared later and was not used
at Microsoft. I have not been able to locate a copy of Intel’s conversion
table. However DR’s XLT86 User’s Guide can be found online at
s100computers.com. It contains an elaborately detailed conversion table that
makes use of many special features and options of DR’s assembler products.

elb · June 27, 2020, 12:53am

This is a fantastic history, thank you!

whartung · June 27, 2020, 4:09am

Yea, that’s a great story.

On the one hand, it’s not a disappointing product in that it let folks port their Fortran-80 readily and quickly to the PC, even if it could not fully take advantage of the PC.

Even with the 64K linker, don’t you think you could have created at least a “small” memory model version quickly? (i.e. 64K code, 64K data, 64K stack), vs the “tiny” model?

Was the Fortran ever rewritten to better support the x86? I know Microsoft Pascal was popular in the mid-to-late ’80s, but I honestly don’t know how well the other language products were doing as MS was shifting over towards applications.

Do you think they would have just kept working on the compiler you had, or would they have started from scratch?

EdS · June 27, 2020, 5:19am

Most excellent, thanks @crandylb!

I do like the spirit of making things work, even when the world you’re working in seems clumsy in some way. (Often enough it’s a clever hack at the time but seems clumsy with the benefit of hindsight. At the time, we build up the necessary arcane knowledge and we get things done.) And the whole bootstrapping story is always somewhat magical to me, however many times I come across it.

crandylb · June 27, 2020, 5:47am

You are welcome. I spent a lot of time on it trying to be even handed and get the details right, but there are still mistakes. For example, ASM-80 was actually known as MASM, a name I thought was ill chosen. Always struck me with acute miasma.

crandylb · June 27, 2020, 6:06am

As I said, or at least implied, the 64k model, which is essentially identical to the Intel tiny model, worked fine. After the IBM PC came out, Microsoft bought a license to Silicon Valley Software Fortran, which, I think, was mostly written in Pascal, and as far as I know still provides that product to this day.

dbanay · June 27, 2020, 5:46pm

Thanks for posting this, it was a fascinating read! Have you seen Schorre’s META II system? Not the same, but I was reminded of it reading your description of bootstrapping Stage2 to a faster version.

whartung · June 28, 2020, 12:00am

Why did you not break the file up into pieces that could be linked so that you didn’t have to rebuild the entire thing all the time?

crandylb · June 28, 2020, 12:55am

Well, let’s just say that might be obvious in light of better modular structure, but the large monolithic files were the way things were done before I got there and breaking it up might have bruised some fragile egos. Version control tools, anyone?

crandylb · June 29, 2020, 5:50am

Thanks for pointing me to META II. I had not looked at that before now. Looks like it and Stage2 have a lot of similar capabilities but go about it in different ways. I thought the name Schorre looked familiar. Prof Waite wrote a paper about WISP, sort of a predecessor to Stage2, with an H. Schorr as coauthor, but it’s a different guy.

Carlos_Camacho · October 18, 2022, 1:52am

Great history lesson. Thank you for your post. I’m a fan of the 6809. Do you or any colleagues have any materials related to this great CPU? Microsoft open sourced various software but it all seems 8080/8086/8088 related. I’m wondering if the 6809 source still exists?

oldben · October 18, 2022, 3:22am

The 8080 has nothing to do with the 6809 that I know of, other than that Microsoft
had a BASIC for the 6502 (Microsoft Basic 6502 – Retro Computing)
and the 680x. All the software, it seems, was written on a PDP-10, thus a native
6809 version is not around other than disassembled sources.

As a reminder, 6800 code did need to be reassembled for the 6809.
I just happen to be working on a version of META II. About 400 lines of ASM code for the
virtual machine and about 400 lines of ASM code for the program itself. Looks like about
4 KB of memory. It is at the stage where it prints ‘hello’ and crashes, so more debugging for tomorrow.

EdS · October 18, 2022, 10:05am

Welcome Carlos! I’ve made a new thread which relates to Microsoft’s Basic for 6809…