RIP: Kathleen Booth, the inventor of assembly language

sohkamyung · October 29, 2022, 2:12pm

Obituary Professor Kathleen Booth, one of the last of the early British computing pioneers, has died. She was 100.
[…]
In 1946, Britten and Booth collaborated at Birkbeck on a very early digital computer, the Automatic Relay Calculator (ARC), and in doing so founded what is now Birkbeck’s Department of Computer Science and Information Systems.
[…]
As well as building the hardware for the first machines, she wrote all the software for the ARC2 and SEC machines, in the process inventing what she called “Contracted Notation” and would later be known as assembly language.

drogon · October 29, 2022, 6:27pm

Out of curiosity; does anyone know who first coined the term “assembly language” and maybe when?

So: “Contracted Notation” above, and when I was at uni (1980+) I did a summer job (I was paid for it!) programming an (ancient…) Elliott 903 in SIR (Symbolic Input Routine), but the documentation talks about assembling the SIR into the machine language.

-Gordon

EdS · October 29, 2022, 7:15pm

So “assembling” was used as a verb? Google’s ngram viewer suggests assembly language comes in right at the beginning of the 60s, perhaps coined at the end of the 50s. (But dates in Google Books can be off.) This report from 1958 discusses LISA, possibly a Univac assembly language.

drogon · October 29, 2022, 7:20pm

Maybe as computers were hand assembled back then it made sense to assemble the instruction code into machine code?

So same concept/idea developed in different places at round about the same time, maybe?

-Gordon

NoLand · October 29, 2022, 10:00pm

There was SAP (Symbolic Assembly Program) for the IBM 704 (1956), written by Roy Nutt. And Stan Poley wrote SOAP or Symbolic Optimal Assembly Program for the IBM 650 in 1955. – So “assembly” had been an established term in the mid/late 1950s.

In the late 1950s, the idea of a symbolic assembler was apparently still a rather strange one. E.g., Ed Fredkin remembers from his time as an Air Force liaison at Lincoln Labs:

People argued that it was wrong to program in symbolic code. I tried to teach a course once there at Lincoln, where I was trying to explain to programmers why they should use symbolic code. “No”, they said, “We’re used to octal. We like it. It’s better. You’re closer to the machine.”

Wikipedia claims the term to be coined in 1951:

The term “assembler” is generally attributed to Wilkes, Wheeler and Gill in their 1951 book The Preparation of Programs for an Electronic Digital Computer, who, however, used the term to mean “a program that assembles another program consisting of several sections into a single program”.

So this would have been closer to what we know as a linker nowadays. Written in 1949 for the EDSAC, it used single-letter mnemonics for instructions. (I’m not sure whether this was just related to the “initial orders” in the first 31 registers, or more general. Sources vary and I haven’t read the book.)
This somewhat coincides with the practice of using mnemonic letters as the actual instruction codes, as seen on the UNIVAC I or the LGP-30 (and probably a number of other machines, as well.)

NoLand · October 29, 2022, 11:08pm

As for Wilkes et al, The Preparation of Programs for an Electronic Digital Computer, we find the terms “assemble” and “assembly”, but there’s no notion of an “assembler”.

“Assemble” means the generation of an order (instruction) in a register, ready to be transferred to a memory location where it will be executed in place. (This is specifically related to the Initial Orders program, reading input from tape for setup.)

The individual rows of holes specifying an order are read from the tape one by one, and the information they contain is assembled in the accumulator to form the complete order. This order is then transferred to the position it is destined to occupy in the store as part of the program.
(p. 29)

Similarly, the term is used once in the context of paper tape I/O, where a single instruction word is represented as multiple codes on tape. (Which is not what the EDSAC did.)

In many other machines, by contrast, the basic input operation is to read a number of rows of holes, and to assemble the resulting digits side by side to form a complete word;
(p. 73)

“Assembly” makes use of assembled instructions, but has a broader meaning, subsumed under “automatic programming”. Here, assembly describes the task of automatically linking and fixing up library subroutines.

A program is composed of a master routine and a number of subroutines and, in the ordinary way, the programmer must decide where these are to go in the store and provide the necessary cross referencing between them; for example, he must insert the correct addresses in those orders in the master routine which call in the subroutines. Use of terminal code letters enables him to defer until a later stage the decision as to where the various subroutines shall go, but he still has, ultimately, to make the decision himself.
The assembly subroutine given below allows the assembly of the master routine and the subroutines to be performed automatically by the machine.
(p. 127/128)

So I’m not sure I would attribute the term “assembler” to Wilkes et al. While they do describe the core operation, there is no notion of the broader concept of an “assembler language” or of a related program to process it.
Moreover “to assemble” is restricted to the context of the initial orders program (mentioned twice), while “assembly” (used 23 times) seems more related to a continuous unit produced from a collection of separate subunits, which are already available in binary form.

Based on this, I’d say that Wilkes et al probably gave the inspiration, and by the mid 1950s it was already an established term, with special programs dedicated to the task of translating symbolic source code to executable programs. It seems it must have happened sometime in between Wilkes and Poley. On the other hand, maybe the keyword in Poley’s SOAP isn’t “assembly” but “symbolic”, and the meaning was transferred from there. (IBM emphasized the word “symbolic” for assemblers for quite a while.)

jecel · October 31, 2022, 8:26pm

People love simple stories but actual history is pretty messy.

I have not investigated how “compile” has come to mean “translate” in computer science. The traditional meaning of the word is also closer to what we would use a linker for these days. You would say something like “this report compiles salary information from three different areas of our economy”.

oldben · November 1, 2022, 12:44am

I would say to compile is to link. I, too, am not sure how “compile” came to mean “translate”.
Macros and macro assemblers seemed to be all the rage back then. Did people think that simple string movement could give the illusion of a high-level language?
“f=a+b-c” becomes ld a;add b;sub c;st f;
Ben
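
A minimal sketch of the “simple string movement” idea from the post above, assuming a one-accumulator machine with the mnemonics ld, add, sub and st as written there. The function name `expand` and the strictly left-to-right scheme are illustrative assumptions, not a reconstruction of any historical macro assembler.

```python
import re

# Operator-to-mnemonic mapping; the mnemonics come from the post,
# the mapping itself is an assumption made for this sketch.
OPS = {"+": "add", "-": "sub"}

def expand(statement):
    """Translate e.g. 'f=a+b-c' into a one-accumulator instruction string.

    Works purely left to right: no precedence, no parentheses.
    """
    target, expr = statement.replace(" ", "").split("=", 1)
    tokens = re.findall(r"[A-Za-z_]\w*|[+-]", expr)
    out = ["ld " + tokens[0]]                      # load the first operand
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        out.append(OPS[op] + " " + operand)        # one instruction per operator
    out.append("st " + target)                     # store the result
    return ";".join(out) + ";"

print(expand("f=a+b-c"))  # ld a;add b;sub c;st f;
```

This holds up only while every operator can be applied in the order it is written. Something like “f=a+b*c” would already need a notion of precedence, which is where plain text substitution stops being enough.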

NoLand · November 1, 2022, 2:30am

I’d say, a symbolic assembler is in its most basic form adding up symbols and literals given on a line to form an instruction code (or to represent data). Then, macros and basic operations (like multiplying a value to form another entity on this line) became available. However, this is just a replacement logic and the basic operation is still adding up symbolic and literal values. (Some of these values may be auto-generated, as with addresses of assembler variables and allocated addresses for embedded constants, but the basic principle is still the same.) Even linking an external binary is in principle a replacement.

(This is pretty clear for machines with single-word instruction formats, and it remains the same principle with variable length instruction sets, like with MPUs. The byte order may be funny, but it’s still about binary values replacing symbolic representations.)
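
As a rough illustration of “adding up symbols and literals”, here is a minimal sketch for a made-up 18-bit, single-word machine. The opcode values, the indirect bit and the label address in `SYMBOLS` are all invented for the example, as is the function name `assemble_line`.

```python
# Hypothetical symbol table: every value here is an assumption.
SYMBOLS = {
    "lda": 0o200000,   # assumed opcode bits for "load accumulator"
    "add": 0o400000,   # assumed opcode bits for "add"
    "i":   0o010000,   # assumed indirect-addressing bit
    "buf": 0o000100,   # address assigned to a label
}

def assemble_line(line):
    """Sum every symbol and octal literal on the line into one word."""
    word = 0
    for field in line.split():
        # Known symbols are replaced by their value; anything else
        # is read as an octal literal.
        word += SYMBOLS[field] if field in SYMBOLS else int(field, 8)
    return word & 0o777777   # keep the result within 18 bits

print(oct(assemble_line("lda i buf 2")))  # 0o210102
```

Opcode, modifier bit, label and literal are all treated the same way: each is just a value that gets added into the word.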

Compilers, on the other hand, operate on syntax (as opposed to bare symbols). Thus you get things like the ‘magic’ of operator precedence by defining what is a term and what is a factor, and so on. The goal here is to generate a sequence of operations that replicates what is represented in that syntax. Notably, the integration of an external (binary) function requires some kind of interface here.

(So “a+b+c” means in the context of an assembler, “replace this expression by the sum of the values represented by a, b, and c”, while it would mean in the context of a compiler, “generate a sequence of instructions that, if run, produces the sum of the respective values, as currently in memory, in the accumulator.”)
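
The two readings of “a+b+c” can be sketched side by side. This is only an illustration: the mnemonics ld and add, the symbol values, and the tiny accumulator machine in `run` are assumptions made up for the example.

```python
def assembler_view(expr, symbols):
    """Assembly time: replace the expression by the sum of the symbol values."""
    return sum(symbols[name] for name in expr.split("+"))

def compiler_view(expr):
    """Compile time: emit instructions that compute the sum when run."""
    first, *rest = expr.split("+")
    return ["ld " + first] + ["add " + name for name in rest]

def run(program, memory):
    """Hypothetical accumulator machine that executes the emitted sequence."""
    acc = 0
    for instruction in program:
        op, name = instruction.split()
        acc = memory[name] if op == "ld" else acc + memory[name]
    return acc

symbols = {"a": 0o100, "b": 0o20, "c": 0o3}       # fixed when assembling
print(oct(assembler_view("a+b+c", symbols)))      # 0o123

program = compiler_view("a+b+c")
print(program)                                    # ['ld a', 'add b', 'add c']
print(run(program, {"a": 1, "b": 2, "c": 3}))     # 6
print(run(program, {"a": 10, "b": 20, "c": 30}))  # 60
```

The assembler reading yields one constant, once. The compiler reading yields a program whose result depends on whatever is in memory each time it runs.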

The words we’re using for them are quite arbitrary. Why isn’t either of them called a “collector” or a “replicator”?

oldben · November 1, 2022, 6:48pm

But syntax does not cover it all. BNF almost works. Playing with META II here, and wanting to add more, I noticed a lack of formal notation. At some point you need to consult parameters of said symbol. width of (s) might return the size of a symbol, not needed in parsing, but damn important to generate code. size of (s) might return the number of elements: 0 for a scalar, n for a vector, -1 for a dynamic vector.
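
One way to picture such per-symbol parameters is a small attribute table that the code generator consults after parsing. This is a sketch under assumptions: “width” is read here as storage units per element, the `size` convention (0, n, -1) follows the post, and the names `Symbol`, `TABLE` and `storage_words` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Symbol:
    name: str
    width: int   # storage units per element (assumed meaning of "width of (s)")
    size: int    # 0 = scalar, n = vector of n elements, -1 = dynamic vector

# Hypothetical symbol table, filled in by declarations rather than by the grammar.
TABLE = {s.name: s for s in (
    Symbol("x", width=1, size=0),
    Symbol("v", width=2, size=8),
    Symbol("d", width=1, size=-1),
)}

def storage_words(name):
    """Words to reserve statically, or None if only known at run time."""
    s = TABLE[name]
    if s.size == -1:
        return None                      # dynamic vector: allocate at run time
    return s.width * max(s.size, 1)      # a scalar still occupies one element

print(storage_words("x"), storage_words("v"), storage_words("d"))  # 1 16 None
```

None of this shows up in the parse. It is side information the generator needs in order to decide how much code or storage to emit.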

NoLand · November 1, 2022, 7:59pm

I totally agree, syntax is just half the rent, you need semantics, too – and this is generally a separate affair. You may introduce some hints, or smuggle in some loose meaning (say in EBNF, you may have either something more general like <identifier>"."<method-name>"("<parameter-list>")" or be more specific with things like <dimensional-object-identifier>".set("<width-scalar>")"), but this is still no rigid definition. Return values are even harder, and defining how they are to be generated is harder still. I haven’t seen a satisfying all-purpose approach combining both yet.

oldben · November 1, 2022, 9:30pm

I guess I need to read more on the Atlas Compiler Compiler and Atlas Autocode.
They did a lot back then, when men were men…