Modern PDP-11 C Compilers

Hello all,

Ed suggested I start a thread about my adventures over the last week trying to find a C compiler that would produce PDP-11 machine code, for running on one of the BBC Micro’s PDP-11 Co Processors, of which there are two:

These are the Pi/ARM-based PiTubeDirect Co Pro and the FPGA-based Matchbox Co Pro.

In addition, the B-em emulator also contains a PDP-11 Co Pro emulation, so maybe that counts as three! This is actually a great development environment, as it saves having to mess around with BBC Micro SD Cards.

As an example, I set out to try to compile a version of the famous Pi Spigot.

This short C program prints the first 1000 digits of Pi:

#include <stdio.h>
#define N 3500
main() {
   short r[N + 1], i, k, b, c;
   long d;
   c = 0;
   for (i = 1; i <= N; i++)
      r[i] = 2000;
   for (k = N; k > 0; k -= 14) {
      d = 0;
      i = k;
      for(;;) {
         d += r[i]*10000L;
         b = i*2 - 1;
         r[i] = d%b;
         d /= b;
         i--;
         if (i == 0) break;
         d *= i;
      }
      printf("%.4d", (int)(c + d/10000));
      c = d%10000;
   }
}

For it to work, signed 32-bit arithmetic (longs in C on the PDP-11) needs to operate properly.
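To make the 32-bit requirement concrete, here is a minimal sketch that re-runs the same spigot with explicit-width types and records the largest value the accumulator d ever holds. The function name spigot_max_d and the use of int16_t/int32_t as stand-ins for the PDP-11's short and long are assumptions for illustration, not part of the original program.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define N 3500

/* Runs the spigot with int16_t standing in for the PDP-11 short and
   int32_t for its long. Appends the digits to out (if they fit) and
   returns the largest value the accumulator d ever held. */
int32_t spigot_max_d(char *out, size_t outsize)
{
    static int16_t r[N + 1];
    int16_t i, k, b, c = 0;
    int32_t d, max_d = 0;
    size_t used = 0;

    assert(out != NULL && outsize > 0);
    out[0] = '\0';
    for (i = 1; i <= N; i++)
        r[i] = 2000;
    for (k = N; k > 0; k -= 14) {
        d = 0;
        i = k;
        for (;;) {
            d += r[i] * INT32_C(10000);
            if (d > max_d)
                max_d = d;              /* peak is reached right after the add */
            b = (int16_t)(i * 2 - 1);
            r[i] = (int16_t)(d % b);
            d /= b;
            i--;
            if (i == 0)
                break;
            d *= i;
        }
        {
            char group[16];
            int n = snprintf(group, sizeof group, "%.4d", (int)(c + d / 10000));
            if (n > 0 && used + (size_t)n < outsize) {
                memcpy(out + used, group, (size_t)n + 1);
                used += (size_t)n;
            }
        }
        c = (int16_t)(d % 10000);
    }
    return max_d;
}
```

Since r[i] stays below 7000 and each pass roughly halves the carried value before adding r[i]*10000, d peaks somewhere below 140 million: far outside 16 bits, but comfortably inside a signed 32-bit long.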

Here are links to the three chapters of the adventure:

The first two approaches led to dead-ends for various reasons (detailed if you follow the links).

The third approach (using GCC) proved surprisingly challenging, as I rapidly hit bugs in the GCC PDP-11 target backend. Based on my experiences, I think it’s safe to say this probably isn’t getting much use these days.

Anyway, I was ultimately successful in my goal:

This has taken about a week, mostly trying to understand enough about how a GCC backend works to be able to fix the two bugs.

It would be nice to try to run some kind of a C test suite, but I suspect this will not be possible in an environment with only 62KB of RAM and no access to the standard libc.

Is anyone aware of any other compilers I could try?

Dave

DEC updated the PDP-11 to the VAX in the late 70’s. You might find a 32-bit compiler on that platform. Ben.

This is a bit of a stretch of an idea, but I note that the recent effort to build a 6502 backend for LLVM seems to be making good progress. LLVM is another massive codebase, but it is a much more modern one, and this is a worked example, so maybe useful.
6502.org • View topic - LLVM 6502 Codegen

Is it also possible that the PCC route isn’t too much of a dead end? I would hope that a compiler doesn’t demand too much of the OS, so the BSD vs Linux thing might be more minor than major…

The issue with PCC was that it is just a compiler. It calls out to external programs for assembling and linking. I was trying to assemble the output with the latest GNU as from binutils 2.37 and getting lots of syntax errors.

However, the documentation does suggest GNU as supports both DEC and BSD syntax for the PDP-11, so maybe there is just a flag I need to change.

I’ll have a play with this again later.

Regarding LLVM, I’ve also read that it’s easier to write a backend for than GCC. If I were going to do that, it might be worth doing it for the OPC-7 instruction set.

So the specific case the GNU assembler is failing to handle is extended branch instructions. Here’s a description of those from Dennis Ritchie’s UNIX Assembler Reference Manual:

i.e. they are effectively synthetic instructions which are not currently handled by the GNU assembler.

So I did a quick SED hack to replace these by short branch instructions.
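As a rough picture of what such a rewrite does, here is a minimal sketch in C rather than sed. The function name to_short_branch is hypothetical, and the mnemonic list is my reading of the Unix assembler's extended branches, so it may be incomplete. Note that the rewrite is only safe when every target is within short-branch range.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Rewrites an extended-branch mnemonic to its short form in place:
   "jbr" becomes "br"; "jne", "jeq", ... become "bne", "beq", ...
   Returns 1 if the mnemonic was rewritten, 0 if it was left alone
   (so real instructions such as jmp and jsr are never touched). */
int to_short_branch(char *mnemonic)
{
    /* Assumed list of conditional extended branches. */
    static const char *const conditional[] = {
        "jne", "jeq", "jpl", "jmi", "jhi", "jlos", "jlo", "jhis",
        "jvc", "jvs", "jge", "jlt", "jgt", "jle", "jcc", "jcs"
    };
    size_t i;

    assert(mnemonic != NULL);
    if (strcmp(mnemonic, "jbr") == 0) {
        /* Drop the leading 'j'; strlen bytes from +1 includes the NUL. */
        memmove(mnemonic, mnemonic + 1, strlen(mnemonic));
        return 1;
    }
    for (i = 0; i < sizeof conditional / sizeof conditional[0]; i++) {
        if (strcmp(mnemonic, conditional[i]) == 0) {
            mnemonic[0] = 'b';
            return 1;
        }
    }
    return 0;
}
```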

I then found GNU assembler failing to deal with embedded data, for example:

.data
.even
_pl:
~~pl:
35632 ; 145000
2765 ; 160400
230 ; 113200
17 ; 41100
1 ; 103240
0 ; 23420
0 ; 1750
0 ; 144
0 ; 12
0 ; 1
0 ; 0

And finally, it looks like the default base for constants is different.

For example, the start of the .s file produced by PCC includes:

_program:
~~program:
jsr r5,csv
sub $20,sp

where $20 here is an immediate octal constant (if it were decimal it would be terminated by a decimal point ‘.’).

If I assemble this .s file (using GNU assembler), and disassemble the result (using GNU objdump), what I see is:

0000010c <_program>:
 10c:   0977 0290       jsr r5, 3a0 <csv>
 110:   e5c6 0014       sub $24, sp

The value has now become 0x14, or 20 decimal or 24 octal.

According to the manual, GNU assembler is assuming constants are in decimal, unless they start with a ‘0’ digit:
https://ftp.gnu.org/old-gnu/Manuals/gas-2.9.1/html_node/as_36.html
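The two conventions can be put side by side with a small sketch. The function names are hypothetical, and the GNU side leans on strtol with base 0, which follows the same decimal-unless-leading-0 rule (it also accepts 0x for hex) but is not claimed to model every form gas accepts.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Old Unix as convention: digits are octal unless the constant
   ends in a decimal point, in which case it is decimal. */
long unix_as_constant(const char *text)
{
    size_t len;

    assert(text != NULL);
    len = strlen(text);
    if (len > 0 && text[len - 1] == '.')
        return strtol(text, NULL, 10);  /* strtol stops at the '.' */
    return strtol(text, NULL, 8);
}

/* GNU as convention: decimal unless the constant starts with '0'.
   strtol with base 0 applies the same prefix rule. */
long gnu_as_constant(const char *text)
{
    assert(text != NULL);
    return strtol(text, NULL, 0);
}
```

With these, the text "20" is 16 to the Unix assembler but 20 to GNU as, which is exactly the 0x14 seen in the disassembly; writing it as "020" makes GNU as agree.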

This is different to the old BSD Unix Assembler (see the earlier link). It’s not just immediate operands; it affects all constants in the file. So it affects accessing objects in the stack frame.

For example, 4 successive words in the stack frame (-12 -14 -16 -20):

mov     -16(r5),-(sp)
mov     -20(r5),-(sp)
mov     -12(r5),-(sp)
mov     -14(r5),-(sp)

become:

 12e:   1d66 fff0       mov -20(r5), -(sp)
 132:   1d66 ffec       mov -24(r5), -(sp)
 136:   1d66 fff4       mov -14(r5), -(sp)
 13a:   1d66 fff2       mov -16(r5), -(sp)

Not good!
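The displacement words in that listing can be reproduced with a few lines of C. This is only a worked check of the arithmetic; displacement_word is a hypothetical helper, not anything from PCC or binutils.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns the 16-bit two's-complement word an assembler would emit
   for a displacement written as `text`, read in the given base. */
uint16_t displacement_word(const char *text, int base)
{
    assert(text != NULL && (base == 8 || base == 10));
    return (uint16_t)strtol(text, NULL, base);
}
```

Read as octal (what PCC meant), -16(r5) is 0xfff2; read as decimal (what GNU as did), it is 0xfff0, the word at 12e above. Every frame offset is shifted in this way, so the code reads the wrong stack slots.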

I gave up in despair at this point and switched over to GCC.

Maybe I should try updating PCC to prefix octal values with a ‘0’.
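A minimal sketch of what such a change amounts to on the output side, assuming a helper (format_gas_octal is a made-up name, not a PCC function) that replaces a plain octal print with one that adds the leading zero and keeps the sign in front:

```c
#include <assert.h>
#include <stdio.h>

/* Formats value as an octal constant that GNU as will also read as
   octal: "%#lo" forces a leading '0' on non-zero values, and the
   sign is emitted separately so -14 becomes "-016". Returns the
   snprintf result (the length that was or would have been written). */
int format_gas_octal(char *buf, size_t size, long value)
{
    unsigned long magnitude;

    assert(buf != NULL && size > 0);
    magnitude = value < 0 ? 0UL - (unsigned long)value
                          : (unsigned long)value;
    return snprintf(buf, size, "%s%#lo", value < 0 ? "-" : "", magnitude);
}
```

The real work in PCC is finding every place that prints a constant, since each one needs the same treatment.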

I managed to do this today, and it was mostly a success, once I found all of the relevant printf statements (some were quite elusive).

I can now run the Pi Spigot and get 1,000 digits of Pi, then it crashes.

That turned out to be an out of range branch (remember those synthetic instructions like jne?). Somewhat surprisingly GNU Assembler doesn’t flag an error in this case.

I’m now wondering if I would have been better trying to rebuild the source of one of the old PDP-11 assemblers.

The version in V7 Unix is itself written in assembler, so there may be a bit of a bootstrapping issue there:
https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/as

But the version in V10 Unix is written in C, so it can run on the Vax:
https://www.tuhs.org/cgi-bin/utree.pl?file=V10/cmd/PDP11/11as

I’ll try to port this next…

Dave

Oh wow, well done! Persistence pays off. (I must admit, my thought was to try to use sed to adjust the integers. It might have become pretty messy.)

I’m not sure if this helps, but gcc can output macro11 assembly (see example-macro11), and a version of macro11 exists for modern systems (Rhialto/macro11).

The output of the assembler is an RT-11 compatible object file, which can be converted to absolute loader form using the dumpobj tool.

What about ack? I did successfully use apout (albeit with a small fix) and simh to test the v7 binaries (-mpdpv7 option). GitHub - davidgiven/ack: The Amsterdam Compiler Kit
A patched version of apout can be found at GitHub - tsupplis/apout: Changes and fixes to Apout, a PDP-11 binary translator (incorporate also changes from DoctorWikt/Apout).

I use both on macOS.

Thanks for this pointer @Dare - though I’m not yet sure I need to switch assemblers in the GCC tool chain, as all the bugs I’ve hit have been in the compiler itself.

Excellent - a third possible “Modern” PDP-11 Compiler to try.

Now this is interesting as it would allow me to directly use the V7 or V10 assembler on Linux without needing to try to recompile the source. I’ll definitely try that as an assembler for PCC output.

It also makes it easier to try some of the old compilers than running them in SIMH.

Thanks all…

I’ve extended the build script now to include the PCC compiler followed by the Unix V7 assembler (running with apout). You can see the new section here.

The Unix V7 assembler allows the JBR and JCC pseudo instructions to be treated correctly - something which is not currently possible when using the GNU assembler.
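As I understand the long form, an out-of-range jbr becomes a plain jmp, and an out-of-range conditional becomes the opposite branch hopping over a jmp. A minimal sketch of that expansion follows; the function names are made up, and the ".+6" assumes the jmp occupies two words, as it does with a label operand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Returns the branch with the opposite condition, or NULL if the
   mnemonic is not a known conditional branch. */
const char *inverse_branch(const char *branch)
{
    static const char *const pairs[][2] = {
        {"bne", "beq"}, {"bpl", "bmi"}, {"bvc", "bvs"}, {"bcc", "bcs"},
        {"bge", "blt"}, {"bgt", "ble"}, {"bhi", "blos"}, {"bhis", "blo"}
    };
    size_t i;

    assert(branch != NULL);
    for (i = 0; i < sizeof pairs / sizeof pairs[0]; i++) {
        if (strcmp(branch, pairs[i][0]) == 0)
            return pairs[i][1];
        if (strcmp(branch, pairs[i][1]) == 0)
            return pairs[i][0];
    }
    return NULL;
}

/* Writes the long form of an extended branch (e.g. "jne") into buf.
   Returns the snprintf result, or -1 for an unknown mnemonic. */
int expand_extended_branch(char *buf, size_t size,
                           const char *jxx, const char *label)
{
    char plain[8];
    const char *inverse;

    assert(buf != NULL && jxx != NULL && label != NULL);
    if (strcmp(jxx, "jbr") == 0)
        return snprintf(buf, size, "jmp %s\n", label);
    if (jxx[0] != 'j' || strlen(jxx) >= sizeof plain)
        return -1;
    plain[0] = 'b';
    strcpy(plain + 1, jxx + 1);         /* "jne" -> "bne" */
    inverse = inverse_branch(plain);
    if (inverse == NULL)
        return -1;
    /* Skip the two-word jmp when the condition does not hold. */
    return snprintf(buf, size, "%s .+6\njmp %s\n", inverse, label);
}
```

For example, expanding "jne" with label "L1" yields "beq .+6" followed by "jmp L1".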

I’m getting identical binaries with both assemblers, apart from where those instructions need the long form, which is encouraging.

I’m very impressed with apout - it’s working very well.

Dave

I do think you are better off with your twist-a-modern-compiler approach. Sounds like you are only one hurdle away from success. And… Maybe I’m thinking too much like a “functional programmer”, but can’t you insert (or “pipe” together) a simple translator that converts the synthetic instructions to real ones?

How does it fail? Does it work if you add .word before the data?

Possibly, but there were other problems as well that affected PCC output.

  1. the default number base is decimal not octal
  2. extended branch instructions (JBR/JCC) are not supported

I was able to address (1) by updating the code generator in PCC.

Unfortunately I was not able to address (2) - even with macros - so I stopped at that point.

In the end I was able to get code from all four compilers running.

I still need to update the wiki page with the details of the 4th option: ACK (Amsterdam Compiler Kit) but this seemed to be the best option.

There is a comparison here:

Dave

3 small remarks:

  1. There is a more actively maintained fork of ack (mainly focused on MS-DOS) at https://github.com/tkchia/ack. I did not have time to run a regression for PDP-11, but it is worth looking at.
  2. Also, for compilation on recent macOS, it is necessary to trick the build with something like a gcc wrapper that adds the option -Wno-error=implicit-function-declaration.
  3. It is worth noting that the compiler focuses on Unix V7 and not BSD 2.x.

Not to sidetrack the thread away from the PDP-11, but I thought I’d
mention some recent entertainment I’ve had with the B-em emulator with
another coprocessor emulator – the NS32000 “Tube” running
PanOS 1.4.

There was a VAX Fortran port of the University of Texas text
game “Super Star Trek” (itself an expansion of Mike Mayfield’s
original BASIC game) floating around on the Web a few years
back (with attributions “PRODUCED FOR YOUR ENJOYMENT BY
DAVID MATUSZEK AND PAUL REYNOLDS WITH MODIFICATIONS BY
DON SMITH AND M. KELLOGG”). I got this code back in 2016
from a link on a Web page authored by one Oleg Uryutin of
“Allwards Laboratory”, though that page doesn’t seem to exist
anymore (and was not archived by the Wayback Machine).

Anyway, this code builds and runs on a SimH VAX-11/780
(with VMS 4.7 and VAX Fortran V4.8-276) with no modifications
or difficulties, as might be expected. But after I got
PanOS running I thought it might be fun to try to port the
code using Acorn/PanOS Fortran 77. And – it worked, with
a few gotchas concerning the inter-operability of old-style
Hollerith constants and F77-style “CHARACTER” strings,
which I overcame by “cheating” a bit and linking to some
“helper” conversion routines written in C (with a small
assembly-language interface – the PanOS F77 manual describes the
subroutine calling convention and how to do inter-language calls,
though the examples they give are in Pascal).

You mentioned you found (and fixed) two bugs in the GCC back end. Did you report these? As the pdp11 back end maintainer, I’d like to get the fixes applied to GCC. You could create an issue for GCC, or if you prefer post the description and diffs here.
As for test suite, GCC comes with a very large test suite. All the “compile” tests should work since they don’t need a target system to run on. “execute” tests take more work, as you pointed out. I’ve been able to run a fair number of them using a quick & dirty “bare metal” pdp11 flavor of Newlib, but that isn’t fit to publish, it needs to be refined.

Sorry @pkoning, I didn’t get around to reporting these.

There is a decent write-up on this wiki page:

Here are the two changes:

The first fix is in: gcc/config/pdp11/pdp11.c

--- pdp11.cc	2022-06-22 18:23:09.856751320 +0100
+++ pdp11_fixed.cc	2022-06-22 18:34:08.685577564 +0100
@@ -517,6 +517,27 @@
 	  && REGNO (XEXP (XEXP (operands[0], 0), 0)) == STACK_POINTER_REGNUM
 	  && reg_overlap_mentioned_p (stack_pointer_rtx, operands[1]))
 	sameoff = true;
+
+      /* DMB - catch for source [1] is an indirect access via a register that
+         is also used as the destination [0] */
+      if (GET_CODE (operands[0]) == REG && GET_CODE (operands[1]) == MEM) {
+        int dstreg = REGNO (operands[0]);
+        int srcreg = -1;
+        if (GET_CODE (XEXP (operands[1], 0)) == REG) {
+          srcreg = REGNO (XEXP (operands[1], 0));
+        } else if (GET_CODE (XEXP (operands[1], 0)) == PLUS) {
+          if (GET_CODE (XEXP (XEXP (operands[1], 0), 0)) == REG) {
+            srcreg = REGNO (XEXP (XEXP (operands[1], 0), 0));
+          } else if (GET_CODE (XEXP (XEXP (operands[1], 0), 1)) == REG) {
+            srcreg = REGNO (XEXP (XEXP (operands[1], 0), 1));
+          }
+        }
+        /* If the source and destination registers are the same, force
+           little endian instruction order */
+        if (srcreg == dstreg) {
+          useorder = little;
+        }
+      }
     }
 
   /* If the caller didn't specify order, use the one we computed,
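The hazard this patch guards against can be shown with a toy model. This is not GCC code: the Machine struct and function names are invented, and it assumes (as I understand the pdp11 back end) that a long keeps its high word at the lower address and in the lower-numbered register of a pair.

```c
#include <assert.h>
#include <stdint.h>

enum { MEM_WORDS = 32 };

typedef struct {
    uint16_t reg[8];
    uint16_t mem[MEM_WORDS];    /* indexed by byte address / 2 */
} Machine;

static uint16_t load_word(const Machine *m, uint16_t byte_addr)
{
    assert((byte_addr & 1) == 0 && byte_addr / 2 < MEM_WORDS);
    return m->mem[byte_addr / 2];
}

/* Models: mov (rS),rD ; mov 2(rS),rD+1   (high word first).
   If rS == rD the pointer is overwritten before the second load. */
void load_pair_high_first(Machine *m, int dst, int src)
{
    m->reg[dst]     = load_word(m, m->reg[src]);
    m->reg[dst + 1] = load_word(m, (uint16_t)(m->reg[src] + 2));
}

/* Models: mov 2(rS),rD+1 ; mov (rS),rD   (low word first).
   The pointer in rS == rD is still intact for both loads. */
void load_pair_low_first(Machine *m, int dst, int src)
{
    m->reg[dst + 1] = load_word(m, (uint16_t)(m->reg[src] + 2));
    m->reg[dst]     = load_word(m, m->reg[src]);
}
```

With r0 pointing at a long and r0/r1 as the destination pair, the high-first order fetches the low word from whatever address the high word happens to spell, while the low-first order loads both halves correctly.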


The second fix is in: gcc/config/pdp11/pdp11.md

--- pdp11.md	2022-06-22 18:23:14.736815708 +0100
+++ pdp11_fixed.md	2022-06-22 18:26:21.043289924 +0100
@@ -271,6 +271,14 @@
    output_asm_insn ("tst\t%0", exops[1]);
   else
    output_asm_insn ("cmp\t%0,%1", exops[1]);
+
+  // Correct V/N flags so signed comparisons work
+  output_asm_insn ("cln", NULL);
+  if (!CONST_INT_P (exops[1][1]) || INTVAL (exops[1][1]) != 0) {
+   output_asm_insn ("clv", NULL);
+   output_asm_insn ("bcc\t%l0", lb);
+   output_asm_insn ("sen", NULL);
+  }
   output_asm_label (lb[0]);
   fputs (":\n", asm_out_file);
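As I read this patch, the point is that once the high words are equal, the low words must be compared as unsigned values, so the flags are patched up to make the following signed branch behave that way. The underlying rule can be sketched in plain C (the function names are hypothetical and this is not the code GCC generates):

```c
#include <assert.h>
#include <stdint.h>

/* Compares two 32-bit signed values given as 16-bit halves.
   Returns -1, 0 or 1 for a < b, a == b, a > b. */
int compare_long(int16_t a_hi, uint16_t a_lo, int16_t b_hi, uint16_t b_lo)
{
    if (a_hi != b_hi)
        return a_hi < b_hi ? -1 : 1;    /* high words: signed compare */
    if (a_lo != b_lo)
        return a_lo < b_lo ? -1 : 1;    /* low words: unsigned compare */
    return 0;
}

/* Splits two int32_t values into halves and compares them, checking
   the result against the native comparison. */
int compare_int32(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    /* The narrowing casts rely on the usual two's-complement wrap. */
    int result = compare_long((int16_t)(ua >> 16), (uint16_t)ua,
                              (int16_t)(ub >> 16), (uint16_t)ub);

    assert(result == (a < b ? -1 : (a > b ? 1 : 0)));
    return result;
}
```

The case that goes wrong with a signed low-word compare is something like 1 versus 0x8000: the high words are equal, and the low word 0x8000 looks negative as a 16-bit signed value even though it is the larger number.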

Dave