Help with c code (bit reversal)

mainframetom · March 6, 2023, 7:02pm

I read on the SBC6120 tools (Spare Time Gizmos) that the PDP-8 handles bits in reverse order. So I want to convert my files.
There’s a tool from 2000 by Robert Armstrong called pdp2hex; the binary works on 32-bit Windows, but I want to change it.
I can’t compile it under Linux (as usual with old code).

I want the 6 least significant bits reversed. The 2 most significant bits should always remain 0.
So the hex byte
3D (00 111101) should be converted to
2F (00 101111).

The relevant code snippet is

BYTE Reverse (BYTE x)
{
  if (fReverse) {
    BYTE b0 = (x   ) & 1;
    BYTE b1 = (x>>1) & 1;
    BYTE b2 = (x>>2) & 1;
    BYTE b3 = (x>>3) & 1;
    BYTE b4 = (x>>4) & 1;
    BYTE b5 = (x>>5) & 1;
    return (b0<<5) | (b1<<4) | (b2<<3) | (b3<<2) | (b4<<1) | b5;
  } else
    return x;
}

I only have basic knowledge of C programming. I know how to open and write to a binary file etc.
fReverse is when the option -r was enabled for it. BYTE is defined in the header file (unsigned char).
I got errors that b0 isn’t declared. I also haven’t found BYTE Reverse and not even x.

On stackoverflow there are several ways (not all are suitable for 6 bits).

I don’t need speed optimization and a look up table is too much work.
The best code for me there is

unsigned char   reverse_bits(unsigned char octet)
    {
        return  (((octet >> 0) & 1) << 7) | \
                (((octet >> 1) & 1) << 6) | \
                (((octet >> 2) & 1) << 5) | \
                (((octet >> 3) & 1) << 4) | \
                (((octet >> 4) & 1) << 3) | \
                (((octet >> 5) & 1) << 2) | \
                (((octet >> 6) & 1) << 1) | \
                (((octet >> 7) & 1) << 0);
    }

But what is reverse_bits (a standard function? a variable?) and what is octet and how to declare/use these? Nothing explained at all.

Later I also have to try swapping 2 bytes each. As the low byte has the 2nd byte first.

pdxjjb · March 6, 2023, 8:30pm

Hi. Perhaps I can help. Your second code snippet is the C language source code for the definition of the function reverse_bits. You can copy that and put it in your source code and it will compile since it makes no reference to anything else. The function takes one argument, type unsigned char, named octet. It returns a value composed of the entire math expression from the return to the semicolon.

As you observe, the function won’t do what you want. It will take 3D (00 111101) and convert it into hex value BC (10111100), because it will reverse the 2 most significant bits into the least significant bits. We’ll get to that in a minute.
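To check the numbers above, here is a minimal sketch. The first function is the quoted 8-bit reversal; the second, `reverse6_via_8`, is a made-up helper name for illustration. It assumes the two most significant bits of the input are 0, so shifting the 8-bit result right by two places leaves just the 6 reversed bits.

```c
/* 8-bit reversal, as in the snippet quoted in the first post. */
unsigned char reverse_bits(unsigned char octet)
{
    return (unsigned char)(
        (((octet >> 0) & 1) << 7) |
        (((octet >> 1) & 1) << 6) |
        (((octet >> 2) & 1) << 5) |
        (((octet >> 3) & 1) << 4) |
        (((octet >> 4) & 1) << 3) |
        (((octet >> 5) & 1) << 2) |
        (((octet >> 6) & 1) << 1) |
        (((octet >> 7) & 1) << 0));
}

/* Hypothetical helper: 6-bit reversal built on the 8-bit one.
   The mask forces the top 2 bits to 0 before reversing. */
unsigned char reverse6_via_8(unsigned char x)
{
    return (unsigned char)(reverse_bits(x & 0x3F) >> 2);
}
```

With this, `reverse_bits(0x3D)` gives `0xBC` and `reverse6_via_8(0x3D)` gives `0x2F`.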

If I understand your post, you can’t compile for Linux, but you can compile for Windows. The first function will do what you want, but there are two issues. First, it makes reference to an external flag variable fReverse. Second, you get a syntax error on BYTE.

Here’s a way to fix that. I’m not saying this is good style, only that it’s the simplest answer I can think of. First, add this line directly above the function:

#define BYTE unsigned char

Don’t put anything else on the line: no semicolon, no parentheses. Second, change the symbol fReverse to the digit 1. This is equivalent to “true” in C and using the number will avoid any possible issues with symbol definitions.
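Put together, the two changes described above might look like the sketch below. It is only the first snippet with the `#define` added and `fReverse` replaced by 1; nothing else from pdp2hex is needed for it to compile.

```c
#define BYTE unsigned char

/* Reverse the low 6 bits of x; the top 2 bits of the result are 0. */
BYTE Reverse (BYTE x)
{
  if (1) {                      /* was: if (fReverse) */
    BYTE b0 = (x   ) & 1;
    BYTE b1 = (x>>1) & 1;
    BYTE b2 = (x>>2) & 1;
    BYTE b3 = (x>>3) & 1;
    BYTE b4 = (x>>4) & 1;
    BYTE b5 = (x>>5) & 1;
    return (BYTE)((b0<<5) | (b1<<4) | (b2<<3) | (b3<<2) | (b4<<1) | b5);
  } else
    return x;
}
```

Calling `Reverse(0x3D)` should then return `0x2F`, the value asked for in the first post.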

Post back here and let us know how that goes. EDIT: @drogon’s question below is probably important too.

drogon · March 6, 2023, 8:32pm

reverse_bits is a function. The definition lies in-between the {}'s.

It takes a single parameter called octet of type unsigned char (typically an 8-bit value) and returns a value of unsigned char.

The function returns the bits in the passed in value swapped. It starts with bit 0:

(octet >> 0) & 1

then shifts this up 7 places.

The sequence repeats, masking each bit in-turn and shifting it up and ORing it into the result.

It doesn’t need any temporary variables as the compiler hides them all for you.

I’m a little confused though - I don’t recall doing any bit fiddling like this in my own PDP-8 emulators…

-Gordon

EdS · March 6, 2023, 8:43pm

I suspect the idea of the PDP-8 bits being in reverse order isn’t what it’s being assumed: they are, I think, conventionally numbered in reverse order, so that bit 0 is the MSB rather than the LSB, but it’s still the case that the MSB is at the left and the LSB at the right.

I’m not sure that any bit-twiddling is going to be needed. But I could be wrong. Anyhow, it’s an interesting challenge. The book Hacker’s Delight by Henry Warren is full of solutions to such challenges, including a handful for this kind of problem.

Some of the ideas in Hacker’s Delight appeared earlier in HAKMEM, a famous memo… and there seem to be three ideas in there credited to Schroeppel. Or maybe more. Here’s one - note the use of octal constants:

/* if argument is 6 bit quantity, return 6 bits reversed (Schroeppel) */
unsigned reverse_6bits(unsigned36 a) {
  return ((a * 02020202)        /* 4 adjacent copies shifted */
             & 0104422010)      /* where bits coincide with reverse repeated base 2^8 */
             % 255;             /* casting out 2^8 - 1's */
}
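Note that `unsigned36` is not a standard C type. A hedged sketch of the same trick with `unsigned long` swapped in (guaranteed to hold at least 32 bits, which is enough here) is below, together with a plain loop version and a made-up checker, `reverse_6bits_agree`, that compares the two over all 64 inputs.

```c
/* Schroeppel's multiply / mask / modulo trick, using unsigned long
   in place of the non-standard unsigned36. */
unsigned long reverse_6bits(unsigned long a)
{
    return ((a * 02020202UL) & 0104422010UL) % 255UL;
}

/* Plain loop version, used only as a reference for comparison. */
unsigned long reverse_6bits_loop(unsigned long a)
{
    unsigned long r = 0;
    int i;
    for (i = 0; i < 6; ++i)
        if (a & (1UL << i))
            r |= 1UL << (5 - i);
    return r;
}

/* Hypothetical checker: returns 1 if both versions agree on every
   possible 6-bit input, 0 otherwise. */
int reverse_6bits_agree(void)
{
    unsigned long a;
    for (a = 0; a < 64; ++a)
        if (reverse_6bits(a) != reverse_6bits_loop(a))
            return 0;
    return 1;
}
```

The trick relies on the argument really being a 6-bit quantity; larger values are not handled.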

drogon · March 6, 2023, 9:07pm

Re. Use of table look-up…

In my Ruby 6502 and 65816 boards, the 6502/816 talks to the ATmega host MCU via an 8-bit bus, however to make my life easier, the PCB layout is such that Bit 0 goes to Bit 7, 1 to 6 and so on.

I use a table on the ATmega side which I generate at boot time:

#ifdef  RUBY_REMAP
static uint8_t sramRemap [256] ;
#endif
...
#ifdef  RUBY_REMAP
  uint16_t i, map ;
  for (i = 0 ; i < 256 ; ++i)
  {
    map = 0 ;

    if ((i &  0b00000001) != 0) map |= 0b10000000 ;
    if ((i &  0b00000010) != 0) map |= 0b01000000 ;
    if ((i &  0b00000100) != 0) map |= 0b00100000 ;
    if ((i &  0b00001000) != 0) map |= 0b00010000 ;
    if ((i &  0b00010000) != 0) map |= 0b00001000 ;
    if ((i &  0b00100000) != 0) map |= 0b00000100 ;
    if ((i &  0b01000000) != 0) map |= 0b00000010 ;
    if ((i &  0b10000000) != 0) map |= 0b00000001 ;

    sramRemap [i] = map ;
  }
#endif

No shifting, masking, etc. just fixed constants. It needs 256 bytes of my precious 16KB RAM, but it’s worth it.
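For the 6-bit case in this thread, the same table idea needs only 64 entries. A minimal sketch in standard C follows; the names `reverse6_table`, `reverse6_table_init` and `reverse6_lookup` are made up for illustration, and the table is filled with a loop rather than binary literals so it does not depend on compiler extensions.

```c
#include <stdint.h>

static uint8_t reverse6_table[64];

/* Fill the table once at start-up. */
void reverse6_table_init(void)
{
    unsigned i, bit;
    for (i = 0; i < 64; ++i) {
        uint8_t map = 0;
        for (bit = 0; bit < 6; ++bit)
            if (i & (1u << bit))
                map |= (uint8_t)(1u << (5 - bit));
        reverse6_table[i] = map;
    }
}

/* Look up the 6-bit reversal; the top 2 bits of x are ignored. */
uint8_t reverse6_lookup(uint8_t x)
{
    return reverse6_table[x & 0x3F];
}
```

After calling `reverse6_table_init()` once, each conversion is a single array access.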

-Gordon

pdxjjb · March 6, 2023, 9:26pm

I know. I have a bit-reverser in some Arduino code that uses table lookup. I was just trying to help the OP accomplish exactly what he asked with minimum effort. Your bit reverser, and mine, both require a little work on the code, and the OP is not an experienced C programmer.

NoLand · March 6, 2023, 9:58pm

I second this. Digital (DEC), for their earlier machines, used to refer to bits starting with 0 for the most significant bit, counting up to the least significant one, from left to right.

E.g., if we have a 12-bit word, like octal 5423 (dec. 2835),
this would be referred to as (properly grouping bits into octal triplets):

    5   4   2   3      (octal)
   101 100 010 011     (binary)
   ^             ^
   |             |
 bit 0   ...   bit 11  (bit order according to DEC)

Still, bits are “lined up” in memory just the usual way.
The only difference being that nowadays we would start counting at the very last, least significant bit, the 12th from the left, while DEC used to refer to the very first one on the left as bit 0.

Importantly, this has no impact on the architecture of the machine, but is important for understanding the manuals, for example, when they refer to “bit 2” in an instruction word. (Which, for a 12-bit word, would be bit 9 in modern terms.)

Or, another example, the PDP-1 featured 18-bit words, with 5 instruction bits, an “i-bit” and 12 bits for the address/operand part. This looks like,

 IIIIIiAAAAAAAAAAAA
      ^
      i-bit

DEC referred to the “i-bit” as “bit 5”, while we would refer to it as “bit 12”. But this is just a matter of two ways of referring to the same thing. The bit-order in memory is still the same.
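The conversion between the two ways of numbering is just a subtraction from the word width minus one. A small sketch, with made-up function names, that reproduces the two examples above:

```c
/* Convert a DEC-style bit number (0 = most significant bit) to the
   modern convention (0 = least significant bit) for a word that is
   `width` bits wide. */
unsigned dec_to_modern_bit(unsigned width, unsigned dec_bit)
{
    return width - 1u - dec_bit;
}

/* Extract the bit that DEC documentation would call `dec_bit`. */
unsigned dec_bit_value(unsigned long word, unsigned width, unsigned dec_bit)
{
    return (unsigned)((word >> dec_to_modern_bit(width, dec_bit)) & 1UL);
}
```

So `dec_to_modern_bit(12, 2)` is 9 and `dec_to_modern_bit(18, 5)` is 12. The stored word itself is never rearranged; only the label of each bit changes.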

NoLand · March 6, 2023, 10:54pm

From a more philosophical perspective: This is really a matter of talking.
(And nothing like endianness.)
The modern terminology puts an emphasis on the powers of two and how they contribute to a numeric value in binary representation.
DEC’s terminology is maybe more abstract, emphasizing the serial nature of bits.

There’s something to be said in favor of DEC’s terminology, especially, if you are dealing with these machines low-level, bare-metal, as these early DEC machines were meant to be operated. Here, it’s always about groups of bits, or a certain bit (like, say, the sign bit), and important things are usually at the “left”, with the “right” parts often thought of as a rest, like an arbitrary operand. Numeric representation is more like an edge-case. From this perspective, DEC’s way of referring to individual bits in a word was probably more intuitive, at least for the time.

mainframetom · March 7, 2023, 9:27am

Thanks everyone for explaining! I still have problems.

Compiling: Doesn’t work. I hadn’t noticed that there’s a makefile for BSCMAKE including 2 other (independent) C files and some input files. I haven’t tried compiling under Windows as there’s already a binary. Under Linux there are some severe syntax errors like:
expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘attribute’ before ‘*’ token,
subscripted value is neither array nor pointer nor vector hpwPDP[nChecksumOffset+wBase] = wChecksum; (the [)
and several unknown type names.

Maybe the reversal is just for reading split EPROMs or under a special environment. Still unclear how the 6100 (in 6100 mode) handles this. The reversal option is for PDP-8. I wonder how a 6100 is set to use 6100 or PDP-8 code. Completely different results/values on emulator WINEIGHT.

C coding:

Not directly. I somehow have to use that function and combine it with my input stream. 2nd snippet:

When replacing octet with my byte stream and printing the byte stream, the result is a byte swap.
When replacing octet with my byte stream and printing the function, 3D turns into 59, plus the warning below.
When not replacing octet and printing out the function (warning: format ‘%X’ expects argument of type ‘unsigned int’, but argument 2 has type ‘unsigned char (*)(unsigned char)’), 3D turns into E5. 8 bit but still wrong.

So please be a bit more specific. I read from the input file.
fread(&i, sizeof(int), 1, fp);
What to replace in the function if any and what to print out?
printf("%X\n", reverse_bits);
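The warning suggests that the `printf` line passes the function itself rather than the result of calling it; a call needs an argument in parentheses. A minimal sketch of one way to wire this up follows. The names `reverse6` and `reverse_stream` are made up for illustration, and it reads one `unsigned char` at a time instead of an `int`, so that each call sees exactly one byte.

```c
#include <stdio.h>

/* Reverse the low 6 bits of x; the top 2 bits of the result are 0. */
unsigned char reverse6(unsigned char x)
{
    unsigned char r = 0;
    int i;
    for (i = 0; i < 6; ++i)
        if (x & (1u << i))
            r |= (unsigned char)(1u << (5 - i));
    return r;
}

/* Read `in` byte by byte, write each converted byte to `out`, and
   print the before/after values. Returns the number of bytes done. */
long reverse_stream(FILE *in, FILE *out)
{
    unsigned char b;
    long count = 0;
    while (fread(&b, 1, 1, in) == 1) {
        unsigned char r = reverse6(b);   /* the call: name plus (b) */
        printf("%02X -> %02X\n", (unsigned)b, (unsigned)r);
        if (fwrite(&r, 1, 1, out) != 1)
            break;
        ++count;
    }
    return count;
}
```

Both files would be opened in binary mode, for example with `fopen(name, "rb")` for the input and `fopen(name, "wb")` for the output, and then passed to `reverse_stream`.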

EdS · March 7, 2023, 9:44am

Just to be clear, are you still aiming to make a minor modification to the program pdp2hex, or are you writing your own program?

If it’s the former, are you able to compile the original successfully, and use it, before you make any modifications?

If it’s the latter, and all you want is a bit-twiddling program, that does just one thing, it might be that some language other than C will make for an easier journey.

mainframetom · March 7, 2023, 9:56am

No, I don’t want to use pdp2hex (as it has a different purpose, splitting up to 2 ROMs and write as HEX).
And I wasn’t able to compile the original under Linux. And I haven’t tested it under win. Probably won’t compile on 64bit win either.
I still could run it on 32bit. Should also write as BIN. And then I just have to swap bytes.

But I prefer writing my own code.
What other language would be better (but I don’t know much besides BASIC)

EdS · March 7, 2023, 3:30pm

I think an interpreted language that you already know would be a good choice. There’s a free BBC Basic for most platforms… perhaps that would suit you.

scruss · March 7, 2023, 4:59pm

The SBC6120 (I have one, or at least, the newer SBC6120-RBC edition) uses a slightly unusual hack to map the PDP-8’s 12-bit word into two octets (8-bit bytes). I thought it packed all the bits into the lower 12 bits of the 16-bit word, leaving the upper four bits unused. pdp2hex suggests that it stores 6 bits per octet. However it’s done, the GALs on the board decode the data into the format that the Intersil processor likes.
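If the 6-bits-per-octet reading is right, rebuilding a 12-bit word from two octets could look like the sketch below. This is an assumption about the layout, not something confirmed from the board; `join12` is a made-up name, and which octet comes first in a given file would still have to be checked.

```c
/* Combine two octets, each carrying 6 bits, into one 12-bit word.
   Assumption: `hi` holds the upper 6 bits, `lo` the lower 6 bits,
   and the top 2 bits of each octet are ignored. */
unsigned join12(unsigned char hi, unsigned char lo)
{
    return ((unsigned)(hi & 0x3F) << 6) | (unsigned)(lo & 0x3F);
}
```

For example, octets `0x2C` and `0x13` (octal 54 and 23) would combine to octal 5423.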

As to the pdp2hex code, I couldn’t get it to compile even with -std=gnu89. It’s very MSVC / DOS-centric code. As the old (unhelpful) joke says, if I were trying to get there, I wouldn’t start from here.

There may be arcane ways to do this involving xxd (which can dump bytes to binary digits) and rev (which reverses lines of text). The great thing about PDP-8 binaries is that they are quite small, so even the most heinous shell script won’t take any time to process them.

mainframetom · March 7, 2023, 7:59pm

In another SBC tool called bindump there’s a detailed description.

A paper tape is only 8 bits wide and the 12-bit word is split up into 2 “frames”.
The upper two bits of the first frame aren’t empty but describe the frame.
Frame type 0+1 (address) are normal 12 bits.
Frame type 2 leader trailer
Frame type 3 field settings.
Most tapes have only one segment:

<leader> <data> <trailer> <EOT>

but a few (e.g. FOCAL69 with the INIT segment) are
<leader> <data-1> <trailer-1>/<leader-2> <data-2> ... <trailer-n> <EOT>
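The frame types listed above can be told apart by looking at the upper two bits of each frame. A minimal sketch, assuming (as in the usual BIN loader layout) that type 0 is a data frame and type 1 marks an address; the enum and function names are made up for illustration.

```c
enum frame_type {
    FRAME_DATA   = 0,  /* top bits 00: 6 bits of a data word      */
    FRAME_ORIGIN = 1,  /* top bits 01: first frame of an address  */
    FRAME_LEADER = 2,  /* top bits 10: leader / trailer           */
    FRAME_FIELD  = 3   /* top bits 11: field setting              */
};

/* Classify one 8-bit paper tape frame by its upper two bits. */
enum frame_type classify_frame(unsigned char frame)
{
    return (enum frame_type)((frame >> 6) & 3);
}
```

A leader byte of `0x80`, for instance, comes out as `FRAME_LEADER`.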

I think I read about that on DEC manuals, too.
There are 3 different paper tape formats BIN, RIM and ASCII (all 8 ch.).
On emulators, the papertape BIN format is probably the most common one used (for single files). Leader and trailer with 16x80h and includes a checksum. I did it that way.
I think the SBC mainly or only uses disk images.

I don’t know all BASIC dialects and maybe there are issues with file import, RAM space etc.
Luckily, the pdp2hex works on win 32bit. I will tomorrow try to write as BIN.

pdxjjb · March 8, 2023, 4:22pm

I will tomorrow try to rewrite it as BIN

What is “BIN”?

drogon · March 8, 2023, 4:49pm

It’s a DEC PDP/8 paper tape format.

-Gordon

pdxjjb · March 8, 2023, 6:09pm

Got it. Thanks Gordon.

mainframetom · March 8, 2023, 6:30pm

I think BIN here means plain binary, to write to an EPROM, without leader and trailer.
I managed to reverse the bits with pdp2hex. I tried all my ROM combinations.
An 8K file was output as 2 files, surprisingly 32 KB each, with large gaps (different in some files, with a different number of bytes; some might be lost, illegal instructions?).
Then I merged the 2 output files; this needs word merging with low bytes first. Originally I assumed byte merging with high byte first.
I then read the files in a hex editor with my predefined, self-made character sets, found on different sites, mainly wiki, to obtain different sixbit encodings.

Very exciting when first opening a file and convert it.

The first ~2 files mainly showed numbers instead of characters, what was new. Also interesting.

Then I had a very good feeling, just when watching a new design of the character output. It’s different if you have spaces, @, lowercase characters etc.
I found 1 or 2 strings like MAY, but obviously just a coincidence.
I have ~8 files and ~8 different character sets. Very time consuming to check thousands of bytes for a 3 character mnemonic which could be aligned in any direction incl vertically.
Then I found many 3 character words next each other, separated with spaces looking like mnemonics, but didn’t make sense.

On one file (one of my recent combinations, 2 pairs of 4 ROMs, byte merged), I found many strings and values that can’t be coincidence:

_TEST, $GET, UHR (German for clock, it’s a German device), CAL (calendar or calculator?), TEX", <FLD<, K5=SUM, TYP, UPD, CRC, c@B@D@@@A (display ?), and the like. Unfortunately, my device doesn’t show texts in the display except E and F.
The beginning AC2/7FF (as text) could be a checksum. At the end also CR$ and LO and some bytes.

But most of interest again mainly 3 character strings next each other, this time making sense, but not my mnemonics.

PLU (plus), DUP, LFT (left), DIV, KIT, TMP, I N T, and several more.

I still have to do more research and check for PDP-8 code and text in 2 columns.

I have powered up my device with the EPROM board removed to check if the single EPROM puts out an error message, but there was just garbage. So obviously the 6100 code is on the 8 ROMs, as recently assumed due to the memory size and location.
But I assume most contents to be data. Maybe 90%?
The device has its own instructions, up to 14 bits. The main purpose is that one can enter lines of code. Although not executing these instructions itself, it can check for syntax or other errors or wrong values. It also accepts input as decimal and converts them to octal.
I also wonder why there are 2 different styles of EPROMs. I think the Intersil EPROM can directly read/write 12 bit words? At least completely different to program and also accessed within the device.

For more info about the device and my earlier attempts read this thread

mainframetom · March 12, 2023, 6:27pm

The final ROM with the many strings (bit reversed) turned out to have the most plausible PDP-8 code. The problem is, almost no instruction is really illegal, so nearly everything is accepted. Someone really has to check all PC and register contents. The strings are correct in emulators, too.
I did it with the file with the empty bytes removed, and I also want to try it with these bytes left in.

But I made some mistakes.
I wondered why the disassembly goes only up to 7577.
Most notably, I replaced the empty bytes (FFFF, not accepted) with NOPs (70 00) and inserted them as hex bytes instead of an octal word.
And by mistake I had reversed all bits again.
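To illustrate the NOP mix-up: the PDP-8 NOP is the octal word 7000, which is not the same as the hex bytes 70 00. A small sketch, assuming a layout with 6 bits per byte and using a made-up helper name `split12`:

```c
/* Split a 12-bit word into two 6-bit halves, one per byte.
   Assumption: 6 bits per byte, upper 2 bits of each byte left 0. */
void split12(unsigned word, unsigned char *hi, unsigned char *lo)
{
    *hi = (unsigned char)((word >> 6) & 0x3F);
    *lo = (unsigned char)(word & 0x3F);
}
```

Under that assumption, `split12(07000, &hi, &lo)` yields `hi == 0x38` (octal 70) and `lo == 0x00`, so the bytes to insert would be 38 00 in hex rather than 70 00.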

-I have one file mistitled, but it wasn’t correct anyway.
-pdp2hex needs as input file the bin file (with leader and trailer) but in 8.3 file format. I think uppercase characters and suffix .BN
-the output files must be in 8.3 style, too, otherwise some files were overwritten and there’s just 1 file instead of 2.

After pdp2hex and merging, a leader and trailer must be added and the checksum again corrected.
I used 3 computer systems. Win 32 bit, 64 bit and linux.

The empty bytes section (in the merged file now just 1 section) starting at 6000-6077 octal. I think a RAM section. So makes sense to be empty.

I also tried my recent file without bit reversal with the fixed NOPs. Both, starting at 0 and 200, has no plausible code like disk i/o.
I have checked my 2 earliest ROM combinations, too. Both files running soon into the empty bytes section. One did a HLT a bit earlier.
My second file, and the first byte merged one, is interesting. Directly after the empty bytes, there’s a loop of reading at 2000. Exactly where the 3 character words looking like mnemonics are located. Needs a further look and longer run.

Emulator WINEIGHT was not much of use on my recent files. Didn’t know many special instructions (incl 6120 mode). The MacOS emu has some more, like 7755, a combined instruction, elsewhere illegal. So I have to check all emulators. Best is still the commandline pdp8emu.

Here’s the beginning of a trace run with pdp8emu. The first words (at least 10-17, maybe 0-177) are auto indexing values, mainly jump addresses (+1) and not instructions. On left the PC, comments by pdp8emu

[0000] IRQ,DLY,IE=0,1,0 L/AC:0/0000 MQ:0000 IR:0103 AND 0103   ;AND operand with AC, ZP 0103
[0001] IRQ,DLY,IE=0,1,0 L/AC:0/0000 MQ:0000 IR:6257            ;
[0002] IRQ,DLY,IE=0,1,0 L/AC:0/0000 MQ:0000 IR:6706 LFGR       ;TM8-E: Load Function Register
[0003] IRQ,DLY,IE=0,1,0 L/AC:0/0000 MQ:0000 IR:0677 AND I @@77 ;AND operand with AC, Indexed Current page @@77
[0004] IRQ,DLY,IE=0,1,0 L/AC:0/0000 MQ:0000 IR:4601 JMS I @@01 ;Jump to subroutine Indexed Current page @@01
[6260] IRQ,DLY,IE=0,0,0 L/AC:0/0000 MQ:0000 IR:1564 TAD I 0164 ;Add operand to AC, Indexed ZP 0164
[6261] IRQ,DLY,IE=0,0,0 L/AC:0/4104 MQ:0000 IR:7303 CLL CLA IAC BSW;Clear L, set AC to 0100
[6262] IRQ,DLY,IE=0,0,0 L/AC:0/0100 MQ:0000 IR:1725 TAD I @@25 ;Add operand to AC, Indexed Current page @@25
[6263] IRQ,DLY,IE=0,0,0 L/AC:0/3010 MQ:0000 IR:0535 AND I 0135 ;AND operand with AC, Indexed ZP 0135
[6264] IRQ,DLY,IE=0,0,0 L/AC:0/0010 MQ:0000 IR:6437 SRCD       ;DP8-EAEB: Read Character Detected
..
[6270] IRQ,DLY,IE=0,0,0 L/AC:0/6317 MQ:0000 IR:6425 SGTT       ;DP8-EAEB: Transmit Go
..
[4621] IRQ,DLY,IE=0,0,0 L/AC:0/0000 MQ:0000 IR:6777 SDRD0      ;TD8-E: Load Data Register into AC, Unit 0
..
[0114] IRQ,DLY,IE=0,0,0 L/AC:0/0000 MQ:0000 IR:6200            ;
[0115] IRQ,DLY,IE=0,0,0 L/AC:0/0000 MQ:0000 IR:0113 AND 0113   ;AND operand with AC, ZP 0113
[0116] IRQ,DLY,IE=0,0,0 L/AC:0/0000 MQ:0000 IR:7755 UNKNOWN!   ;**Unknown**
..
[0106] IRQ,DLY,IE=0,0,0 L/AC:0/0000 MQ:0000 IR:6635 RCNO       ;CR8-E: Read Conditions out to Card Reader
..
[0112] IRQ,DLY,IE=0,0,0 L/AC:0/0000 MQ:0000 IR:7700 SMA CLA    ;Skip on AC < 0, Clear AC

following a long loop, probably waiting for input (key or import ?)

Instructions starting with 6 are for external communication (IOT, some aren’t included in many tools/emu, especially for the 6100, same for WINEIGHT)
Instructions starting with 7 are main instructions mainly to work with the values of the AC and MQ registers. As there are many combinations, not all are implemented everywhere.

0: 0103 is probably a jump address, also after an IRQ.
1: 6257+1=6260 is a jump address called at PC 4. The emu is running over it.
2: 6706, TM8-E is the magnetic tape
6261: A combined instruction including BSW (byte swap), which exchanges the two 6-bit halves of the AC; that is why the AC becomes 0100 here. It swaps the halves rather than reversing the order of the bits, so by itself it doesn’t show that bits are reversed. Maybe it is just used for strings, tables etc.?
6264: DP8-EAEB is the TTY full duplex communication.
0106: I don’t have a card reader, might be a different 6120 device?
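As a cross-check of the 6261 line in the trace, here is a hedged sketch of what octal 7303 does under the standard PDP-8/E group 1 definitions (CLA and CLL first, then IAC, then BSW exchanging the two 6-bit halves of the AC). The function names are made up, and the link bit is left out for brevity.

```c
/* BSW: exchange the upper and lower 6-bit halves of a 12-bit AC. */
unsigned bsw(unsigned ac)
{
    return ((ac & 077u) << 6) | ((ac >> 6) & 077u);
}

/* Resulting AC after octal 7303 (CLA CLL IAC BSW), ignoring the link:
   clear, then increment, then swap halves. */
unsigned exec_7303(void)
{
    unsigned ac = 0;            /* CLA */
    ac = (ac + 1u) & 07777u;    /* IAC */
    return bsw(ac);             /* BSW */
}
```

`exec_7303()` gives octal 0100, matching the AC shown on the following trace line, and `bsw(05423)` gives octal 2354: the halves change places, but the bit order inside each half stays the same.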

Maybe I’m completely wrong, but it’s surprising that the bit reversal makes so much sense. First, the longer plausible strings. And now also instructions, locations etc.

drogon · March 12, 2023, 8:00pm

In a PDP/8 the upper words of core are used for the bootloaders. The disk one can start at 7600, so ending an image at 7577 (ie 7600-1) may be prudent if you want to keep the disk bootstrap code in core.

-Gordon