Playing with Fuzix on the Pico

The original Bell Labs UNIX v1 was written for a PDP-11 (so a 64K address space plus bank switching, which I’m not sure UNIX used), and had a K&R C compiler, enough to compile itself. This compiler was bootstrapped from an earlier language (“B”, if I recall), so it may not be trivial to port, but the point is, this provides an existence proof that you can fit a minimal C compiler in 64K of RAM. I wonder how big Stallman’s earliest versions of GCC were. I’m pretty sure he originally developed it on a VAX (because that’s what we had at 545 Tech Square), using a “self cleanroom” port of PCC. Which raises the question: how big was PCC? You have to remember how truly small C was in its original form. And optimizations? “Forget about it.”


PCC is still around, but it has been upgraded so much that you can’t tell its original size anymore. The 6809 OS-9 C compiler ran on OS-9 Level 1 in just 64K of total memory. OS-9 Level 2 had an MMU, but still only a 16-bit address space. Intel got lucky with the x86, as segments and/or bank selection never really worked on other architectures.

From what I read about GCC, it was based on a Pascal compiler that read all the source into memory at once. That I would say is cheating, as C was not developed with 256K DRAM chips and 386+ CPUs in mind.

Ben.

Just for background, I see

The Portable C Compiler (pcc) came from Bell Labs in the 1970s…

Looks like one early version of PCC was 8000 lines of code, spread over two passes. A multi-pass compiler is a good tactic for small-memory machines, and perhaps a good tactic for other reasons too.

As a crude measure of the degree of portability actually achieved, the Interdata 8/32 C compiler has roughly 600 machine dependent lines of source out of 4600 in Pass 1, and 1000 out of 3400 in Pass 2.

Oh, and now I notice

Although the compiler is divided into two passes, this represents historical accident more than deep necessity. In fact, the compiler can optionally be loaded so that both passes operate in the same program.

I was in Stallman’s “orbit” at MIT while he was writing GCC. I heard nothing about a Pascal compiler, only PCC (circa 1985-86).

I know a C compiler could fit in 64K of memory, because there were C compilers for 8-bit computers back in the day that worked without bank switching, on computers with a 64K main-memory limit. I worked with one on my Atari 8-bit. It was a bit tight getting all the files to fit on one floppy disk; it didn’t leave much room for saving source and object code on the same disk.


In the 8-bit C world, I may have mentioned this before, but it still seems relevant:

http://aztecmuseum.ca/

-Gordon


Interesting - I see the Manx Aztec C compiler for the Apple II offers overlays - that’s another way to get more out of 64K.


The rather fascinating thing for me was finding out that the typical 8-bit compiler in fact compiled to a kind of p-code; that’s what the documentation called it. I found out several years ago that the reason for this was that most of the 8-bit compilers were based on Tiny C, which was originally written for an Intel processor (I forget which now, perhaps the 8088). The way they were implemented on platforms like the Atari, Commodore, and Apple computers was that rather than compiling for the 6502, they compiled to Intel machine code, and the runtime that ran your program was really an Intel VM running on the 6502, which emulated a 16-bit stack. This was one reason compiled C on the 8-bits was faster than BASIC, but noticeably slower than assembly language.

The CC65 cross-compiler compiled to 6502 machine code. The way memory was organized with it, as I remember, was to have the stack grow from high to low memory and the heap from low to high. I remember at the time (early 1990s) seeing that there was no check in the runtime for whether the heap and the stack collided with each other… You just had to be careful, keeping track of how much free memory you had.
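
For illustration, here’s a minimal sketch in plain C (invented names, not the actual cc65 runtime) of the kind of collision check that was missing: the heap grows up, the software stack grows down, and an operation is refused once the two would meet:

```c
/* Sketch of a heap/stack collision check (hypothetical, not cc65 code).
 * The heap grows upward from low memory, the software stack grows
 * downward from high memory; each operation checks that they won't meet. */
#include <stdio.h>

#define MEM_SIZE 0x4000u              /* pretend this is the free RAM */

static unsigned char mem[MEM_SIZE];
static unsigned heap_top  = 0;        /* heap: grows upward */
static unsigned stack_ptr = MEM_SIZE; /* stack: grows downward */

static void *heap_alloc(unsigned n)   /* bump allocator with the check */
{
    if (heap_top + n > stack_ptr)     /* heap would run into the stack */
        return NULL;
    heap_top += n;
    return &mem[heap_top - n];
}

static int stack_push(unsigned n)     /* grow the stack frame by n bytes */
{
    if (n > stack_ptr || stack_ptr - n < heap_top)
        return -1;                    /* stack would run into the heap */
    stack_ptr -= n;
    return 0;
}

int main(void)
{
    if (heap_alloc(0x3000) != NULL && stack_push(0x0FFF) == 0)
        puts("fits");
    if (heap_alloc(0x2000) == NULL)
        puts("refused: heap would collide with the stack");
    return 0;
}
```

On a real 6502 even a check this small costs bytes and cycles on every allocation and every call, which is presumably why the runtime left it out.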


The reasons 8-bit Pascal compilers use a compiler-interpreter approach are (a minimal sketch of such an interpreter follows at the end of this post):

  • the idea and initial starting point of most compilers is the portable ETH Wirth Px compiler (P2 and P4) or Pascal-S
  • a compiler-interpreter makes porting and cross-compiling easy; only the interpreter is hardware (CPU and OS) dependent
  • memory is a severe constraint on 8-bit architectures, and the ‘VM’ with its compact p-code is much more efficient in memory usage
  • a high-level language such as Pascal (or C) requires a stack-based machine with stack frames; the 8-bitters are not designed for that, but the interpreter can be (the 6502 is the worst in that respect)

The effects are easy to predict: larger programs are possible (such as the compiler itself) but execution is slower. A good example is UCSD Pascal on the Apple II, sold as Apple Pascal.

The VM implemented in the interpreter is not an Intel emulator in the compilers I know. An 8080 is also not designed as a stack machine, so that would be even more inefficient. Do you have an example of a 6502 compiler generating Intel VM code?
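
For what it’s worth, here is roughly what such an interpreter’s inner loop looks like, sketched in C with a made-up four-opcode instruction set (not UCSD’s real one, and not any particular product’s); the point is how small the dispatch is, and that the VM gives the 8-bit host a proper 16-bit evaluation stack:

```c
/* Minimal p-code interpreter sketch: invented opcodes, not UCSD's
 * actual instruction set. Dispatch is one loop and one switch, and
 * the VM provides a 16-bit evaluation stack regardless of host CPU. */
#include <stdio.h>
#include <stdint.h>

enum { OP_LIT, OP_ADD, OP_PRINT, OP_HALT };

static void run(const uint8_t *pc)
{
    int16_t stack[64];            /* 16-bit eval stack */
    int sp = 0;

    for (;;) {
        switch (*pc++) {
        case OP_LIT:              /* push a 16-bit literal (little-endian) */
            stack[sp++] = (int16_t)(pc[0] | (pc[1] << 8));
            pc += 2;
            break;
        case OP_ADD:              /* pop two values, push their sum */
            sp--;
            stack[sp - 1] = (int16_t)(stack[sp - 1] + stack[sp]);
            break;
        case OP_PRINT:
            printf("%d\n", stack[--sp]);
            break;
        case OP_HALT:
            return;
        }
    }
}

int main(void)
{
    /* "2 + 40, print it" in nine bytes of p-code */
    static const uint8_t prog[] = {
        OP_LIT, 2, 0, OP_LIT, 40, 0, OP_ADD, OP_PRINT, OP_HALT
    };
    run(prog);
    return 0;
}
```

Nine bytes for the whole program is the memory argument in miniature; the price is the dispatch overhead on every operation.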


Virtual machines made sense back then, since virtual-machine order code could be rather compact compared to native code. OS-9 had BASIC and Pascal as virtual machines, but the 6809 CPU could compile C with no problem. PDP-8 BASIC and Algol 68 were VMs. UCSD, in their great wisdom, changed the opcode format around version 3(?), killing the only hardware version of Pascal, the Pascal MicroEngine. A few 2901 bit-slice Pascal CPUs were designed.

Pascal, being based on ALGOL, kept the messy DISPLAY data structure, making it slow even with hardware support. C at that time had just static and local stack variables, making it faster.

Let’s not forget the classic VM, threaded code, used by some versions of FORTH and FORTRAN (PDP-11). A slow Apple computer was better than an experimental fast computer time-shared from someplace far away.
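
For anyone who hasn’t seen it, the threaded-code idea is easy to sketch in portable C (invented names; real FORTHs do the dispatch in a few machine instructions): the “program” is just a list of routine addresses, and the interpreter’s entire job is to fetch the next address and jump to it:

```c
/* Threaded-code sketch in portable C: a program is an array of routine
 * addresses, and NEXT is just "fetch the next address and call it".
 * Names are invented; this mirrors FORTH-style threading, not any
 * specific implementation. */
#include <stdio.h>

typedef void (*word_fn)(void);

static int stack[32], sp = 0;

static void w_two(void)   { stack[sp++] = 2; }
static void w_forty(void) { stack[sp++] = 40; }
static void w_plus(void)  { sp--; stack[sp - 1] += stack[sp]; }
static void w_dot(void)   { printf("%d\n", stack[--sp]); }

int main(void)
{
    /* the "thread": one address per word, denser than inline native
     * code once routines are reused */
    static const word_fn thread[] = { w_two, w_forty, w_plus, w_dot, NULL };

    for (const word_fn *ip = thread; *ip != NULL; ip++)
        (*ip)();                      /* NEXT */
    return 0;
}
```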

It does seem more likely that the p-code offers useful things like 16-bit values and a 16-bit stack pointer, as well as perhaps some rather high-level operations, because why not. (The BCPL machine also offers some high-level operations.) But the virtual machine might look more like an Intel microprocessor than a Motorola one, from a distance.

It’s encouraging if the 6809 has a native-code C compiler running natively in 64K. It’s perhaps an indication of the expressiveness and density of 6809 machine code. (The 6502 is a long way behind, for all that it’s my favourite.)

The term “p-code” just means “portable code.” I wondered as well whether these C compilers used p-code from Pascal, but my research told me they didn’t. The reason it was called “p-code” seems to be that so many 8-bit compilers were based on Tiny C, all producing the same machine code (Intel), which you could build a VM against. So, theoretically, one 8-bit C compiler could produce code that would run on another Tiny-C-based VM, just as the UCSD P-machine scheme was theoretically able to do, with a different instruction set. (I say theoretically because once you got into the realm of using machine-specific features, such as producing graphics, that portion of the p-code became machine-specific.)

So, this is not to say that the VM for these compilers was stack-based, just that it emulated a 16-bit stack, which IIRC the 8088 had.

Once I looked at the 6502’s architecture, I could see why the implementors would go this route. The 6502 has an 8-bit stack pointer (so a 256-byte stack), which is pretty limiting for C. So, don’t bother compiling for it; just use the Intel machine model, with its 16-bit stack. Though I had the thought: why not take the approach CC65 did, compiling for the 6502 but implementing the stack in software, rather than treating it as a choice between the 6502’s stack and Intel’s? It seems to me that would have been the easier approach, rather than implementing a VM.
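
To make that concrete, here’s the software-stack idea in plain C standing in for what a 6502 compiler could emit (invented names, not cc65’s actual runtime): keep a 16-bit stack pointer in memory (zero page on a real 6502) and index through it, so the C stack isn’t confined to the CPU’s 256-byte hardware stack:

```c
/* Software-stack sketch (hypothetical, not cc65's real runtime): a
 * 16-bit stack pointer held in ordinary memory, pushing and popping
 * 16-bit values through a byte array, independent of the CPU's own
 * 256-byte hardware stack. */
#include <stdio.h>
#include <stdint.h>

static uint8_t  ram[0x8000];          /* pretend 32K of 6502 RAM */
static uint16_t swsp = 0x8000;        /* software SP, grows downward */

static void push16(uint16_t v)        /* a few 6502 instructions each */
{
    swsp -= 2;
    ram[swsp]     = (uint8_t)(v & 0xFF); /* low byte first, 6502-style */
    ram[swsp + 1] = (uint8_t)(v >> 8);
}

static uint16_t pop16(void)
{
    uint16_t v = (uint16_t)(ram[swsp] | (ram[swsp + 1] << 8));
    swsp += 2;
    return v;
}

int main(void)
{
    push16(0xBEEF);                   /* far beyond what 256 bytes allows */
    printf("%04X\n", pop16());
    return 0;
}
```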

I guess it’s possible the implementors had the idea that since C is seen as a cross-platform language, the 8-bit implementation should be that way as well, similar to what Wirth and the implementors at UCSD were shooting for with Pascal: taking some of what would be machine-specific features (such as outputting text to the screen and taking keyboard input) and generalizing them in the VM. That way you don’t have to create a different back end for the compiler; you just have to make a machine-specific version of the VM to handle those features.

The 6809 and especially the 68000 are fine targets for high-level languages. While the 6809 was still limited to a 64K address space, multiple stacks are available with less code overhead. I prefer the 6809 and 68000 over Intel for their much more orthogonal instruction sets; you can see much of the PDP-11 in there, the CPU on which much of the research on compilers was done.

I’m not in orbit, but I have heard about Stallman attempting to base GCC on a Pascal compiler.

I found this: The Short History of GCC development

Hoping to avoid the need to write the whole compiler myself, I obtained the source code for the Pastel compiler, which was a multi-platform compiler developed at Lawrence Livermore Lab. It supported, and was written in, an extended version of Pascal, designed to be a system-programming language. I added a C front end, and began porting it to the Motorola 68000 computer. But I had to give that up when I discovered that the compiler needed many megabytes of stack space, and the available 68000 Unix system would only allow 64k.

I then realized that the Pastel compiler functioned by parsing the entire input file into a syntax tree, converting the whole syntax tree into a chain of “instructions”, and then generating the whole output file, without ever freeing any storage. At this point, I concluded I would have to write a new compiler from scratch. That new compiler is now known as GCC; none of the Pastel compiler is used in it, but I managed to adapt and use the C front end that I had written.

The even earlier Unix “version 0” was written for a PDP-7.

V1 didn’t have a C compiler, much less K&R C. Yes, it is believed that the first C compiler, by Ritchie, was based on Thompson’s B compiler. But I think PCC was a separate code base.

The first compiler was “pre-K&R”, i.e. the syntax was even more archaic than true K&R as per the book.

Here is the first C compiler ported to run on Linux: GitHub - vegesm/first-cc-gcc: The first C compiler made to work under modern GCC


I was looking into early C questions in response to comments on this thread, accumulated a bunch of tabs, but never shared anything. Here are some findings:

Warren Toomey’s 2009 Usenix paper The Restoration of Early UNIX Artifacts (6 page PDF)

Collection of 1972 documents “Preliminary Unix Implementation Document” (Bell Labs) (281 page PDF) (See page 97 for the start of the text)

1974 C reference manual (Bell Labs, 33 page PDF)

C is a new computer language designed for both non-numerical and numerical applications. The fundamental types of objects with which it deals are characters, integers, and single- and double-precision numbers, but the language also provides multidimensional arrays, structures containing data of mixed type, and pointers to data of all types.

C is based on an earlier language B, from which it differs mainly in the introduction of the notions of types and of structures. This paper is a reference manual for the original implementation of C on the Digital Equipment Corporation PDP-11/45 under the UNIX time-sharing system. The language is also available on the HIS 6000 and IBM S/370

PDP11/20 Handbook 1972 (234 page PDF) (I think I was hoping to get clues about memory size, virtual memory, paging, but why the 11/20? I don’t know.)

Similarly


Lars, I have since read a bit more about GCC history. From memory… Stallman’s first attempt was to adapt a free Pascal compiler (Pastel, from LLNL), but it was too inefficient. I believe the second attempt was based on PCC. By this time RMS had established a long tradition of peeking at Symbolics Lisp machine code and rewriting it for MIT (and LMI and TI) to contain no proprietary code. You had to call this “pre clean room” (like the later BIOS clones) because Stallman’s mind was on both sides of the “wall.”

Here’s another B compiler. It’s written in B, created by modifying the old C compiler.


I am only about 50% certain, but I think the Sharp MZ series had a C compiler in the 1980s. I am 100% certain there was a Pascal compiler, possibly by HiSoft in the UK.

Here’s their old archived page mentioning products and some of the microcomputers: https://web.archive.org/web/20111029231706/http://www.hisoft.co.uk/about/about.html
[added: found a fresher page to the same info: HiSOFT | About Us | Website Design | Programming]

And here’s decompiled HiSoft C for ZX Spectrum: GitHub - oldcompcz/HiSoft-C: Disassembled code of HiSoft C compiler for ZX Spectrum


There were a bunch of 8-bit C compilers for the 8080/Z80 (BDS, Whitesmiths, etc.).

For Pascal there were also several, and they mostly followed Pascal-S, which was a reference small Pascal: Pascal-S – Pascal for small machines. The default behaviour is to generate and execute p-code, so if you’ve got a cross-compiler or write a small p-code interpreter it’s trivial to bootstrap on any platform.