Cramming a C64 BASIC program in 32 bytes

IsaacKuo · July 30, 2021, 12:26am

Here’s a video showing how someone crammed a BASIC program into 32 bytes:

0 PRINTMID$(“XXXX-|”,RAND(0)*6+1,1);:GOTO

Normally this would save to a 35 byte file, but certain tricks and a possibly dubious assumption (that works on emulators but not necessarily real hardware) shave off a few bytes.

Anyway, inspired by this I crammed my own version into only 23 bytes using machine code instead of BASIC. I had never written an autostarting prg so that was an interesting learning experience.

There are a few ways to autorun, but most require a lot of bytes of boiler plate. In order to cram my version into 23 bytes, I had to figure out a method that I eventually reduced to zero bytes of boiler plate (by reusing bytes that would become part of my lookup table). This stack overwrite method involves locating the program on the 6502 stack, such that it overwrites the pending JSR return address. I had to figure out how to ensure both of these bytes would be non-printing character codes. Furthermore, the program requires calling the CHROUT routine, meaning it would ALSO be pushing an address onto the stack. I needed this address to also consist of non-printing characters.

I also saved a byte by finding some place in zero page guaranteed to have a pointer I could use for an indirect table lookup (one byte zero page address indirect lookup rather than two byte address direct lookup). That was a crazy wild goose chase, but I did eventually find a couple bytes guaranteed to have 255, 0 in them. At that point, all I needed to do was refactor things to generate 248…255 randomly rather than 0…7 randomly.

So basically, I started off with something I fit in 27 bytes, with reasonably elegant comprehensible code. And then I shrunk it down to 23 bytes with increasingly obtuse hacks.

; Program: maze23byte.asm
; About: 23 byte print maze

; machine code version of 32 byte BASIC maze program:

; Size Optimizing Tricks in 10 PRINT Orthogonal for Commodore 64
; Jul 28, 2021
; https://www.youtube.com/watch?v=oPtjcSsS2MI

; autorun ref https://www.pagetable.com/?p=568

; target processor
  processor 6502

; code origin
  .org $01F8-1          ; Writing to stack for JSR return

table:                  ; PETSCII print code table at 255+248 ... 255+255
                        ; This is because I found PEEK(58)+PEEK(59)*256 = 255 in zero page.
                        ; Thus, letting me replace a 3 byte table lookup with a 2 byte indirect lookup

  .byte 96              ; horizontal bar - located here to align $01F8 JSR return address as needed

  .word autostart-1     ; JSR return will increment by 1
                        ; autostart-1 is non-printing chars
                        ; Also important is that JSR return address will also be non-printing chars

  .byte 171,177,178,179 ; T shapes
  .byte 125             ; vertical bar

CHROUT = $FFD2     ; KERNAL CHROUT
RASTER = $D012     ; Raster counter

loop:
  lda (58),y       ; shaves a byte compared to normal table lookup PEEK(58)=255, PEEK(59)=0

  jsr CHROUT       ; location of this JSR needs to end on address that will be non-printing chars

autostart:         ; autostart starts here, address chosen to be non-printing chars

  eor RASTER       ; generate kinda random byte
  ora #248         ; modulo 8 + 248
  tay
  bmi loop         ; branch shaves a byte compared to JMP

Source and prg here: Index of /stuff/maze27byte

EdS · July 30, 2021, 12:50pm

Quite the brain teaser - well done!

(I have to say I did, eventually, understand your commented assembly. I can’t say the same about the video - perhaps narration just isn’t the way for me to take on this kind of thing.)

IsaacKuo · July 31, 2021, 12:08am

Thanks! I’m guessing you are relatively familiar with 6502 code, and not at all familiar with the way Microsoft BASIC worked at a low level. Also, some familiarity with Commodore 1541 floppy disc formatting helps. I’m familiar with it because A) I was a USA C64 user and practically all US users had floppies, and B) I used Disk Doctor (or equivalent) back in the day to edit stuff on floppy at a low level.

NoLand · July 31, 2021, 6:12am

Regarding the video: The BASIC version uses a few “dirty” tricks to achieve its 32-byte claim, which do not work on real hardware.

One of the key tricks is about how the program is loaded. For this, we must risk a brief look into Commodore 8-bit history: On the PET, a BASIC program always started at $0401, and this address was also the first two bytes (in low-byte, high-byte order) in a stored program, which was then followed by the literal memory contents. Loading a program would always load a program at the address provided at the start of the file. This changed with the VIC-20, which had a variable BASIC start, depending on the presence of any memory extension, and the C64, which had the BASIC start at $0801. Commodore worked around this by introducing a secondary address for the load command. Without this, the program would be loaded to wherever the BASIC start was and would be relinked, with the secondary address “1”, the program would be loaded to the address given at the beginning the file as a binary program (or data).

This brings us to the linking business, and the anatomy of a Commodore 8-bit BASIC program in memory. This is essentially a linked list and looks like this:

+-- AL AH ... link, binary address of next line in memory (low,high)
|   LL LH ... BASIC line number in binary (low,high)
|   (...) ... "payload"
|   00    ... zero-byte as end-of-line (EOL) marker
+-> AL AH ... start of second line = link to next line

etc.

If the link is eventually 16-bit zero, this marks the end of the program, as far as BASIC is concerned. Meaning, every BASIC program ends in 3 zero-bytes, one for the EOL marker of the very last line and two for an empty next-line link.

While presumably a BASIC program, this one must be loaded as a binary program, and, if we try to edit it, it fails. So something must be at odds. Turns out, the program relies on an emulator’s memory to be initialized by zeros, which is never the case on real hardware. But, assuming so, it doesn’t include the trailing 3 zero bytes, neither the EOL marker, nor the empty link as the end-of-program marker. So BASIC has no information on how to relink this program, once we commit our edit by pressing return and runs away in search for the end-of-line.

Well, this is easily fixed by inserting the required zero bytes using a monitor program. – However, this isn’t enough. There are also some system pointers, which are controlling the memory layout of the machine. One marks the BASIC start (you may actually put this, where ever you want), One the first byte after the end-of-program, which is where variables start. (There’s also one of where arrays start and one for the top of memory, where strings start to grow down, but these are not of interest here.)
Since we didn’t load this as a BASIC program, the start of variables isn’t set accurately, rather pointing to the last byte loaded. (This is also an issue with the very first ROM for the PET 2001, when loading files from the IEEE bus. You have to adjust this pointer manually, otherwise any variables will corrupt the program. – I found this out the hard way and thanks to this, my PET emulator will fix this up automatically, when using ROM 1.0.)
With this issue fixed, the program may be actually edited.

But this isn’t all. If this is a BASIC program, why can’t it be loaded as a BASIC program? If we try to do so, it doesn’t list anything.

Inspecting the contents on disk, we discover that also the start address is off by one byte, pointing to $0802 rather than $0801. – Why on Earth would you do such a thing?
Well, it shoves off another byte. we may recall that the first two bytes provide the binary loading address, while the start of BASIC is set in that zero-page pointer. So, if we start at the second byte, effectively skipping the low-byte of the first line-link, we may “reduce” the load by this byte. (The program, as on disk starts with the high-byte of the link address and two zeros for the line number, followed by $99, the BASIC token for PRINT. The BASIC start is still set to one byte earlier, so we get the first, missing byte for free, if we run the program.) Loading this as a regular BASIC program simply doesn’t work, as we’re off by one.

All this actually works, because the program, when running, never hits the end of line. At the end of the line, it just jumps to the beginning of this very line, which is also the first one in memory, and BASIC never refers to the line-link to search for another line number. At the same time, it also defeats itself, as this last GOTO command is missing its line number as another optimization: as the interpreter encounters the end of line before any line number was found, it rather uses a zero preset, which is also our line number. However, there is no end-of-line found in memory on a real machine…

Facit: there are 4 essential bytes missing (one at the start and three at the end of the program).

EdS · July 31, 2021, 9:58am

Thanks, that’s much easier for me to digest (than the video was)! Is it the case that on real hardware, all would be well for loading and running, but not OK for editing? Or would even the running of the program require a zero-initialised memory?

(I’m reminded of the oddity that the Basic on the ZX Spectrum includes both ASCII and binary for numeric constants, so for example INT PI takes less space than 3, in a Basic program.)

NoLand · July 31, 2021, 10:33am

As said, it defeats itself. When loading it as a binary, it may not matter that the line-link is off (pointing to whatever is in $0801 and the value $08 in $8002, so on the machine seen in the video probably $08FF). So the programs starts, progresses over the PRINT statement and eventually arrives at the GOTO command. Now, in oder for the missing line-number-is-effectively-zero trick to work, it should hit the end of line. – And it actually does, I got this wrong, there is a single zero-byte (but the other two are missing). So that should work.
However, as shown in the segment starting at 4:36, if we load it normally, there is no program, since the binary source is a byte off, and if we load it as a binary, the memory initialization fails and the machine freezes, as the start of BASIC variables is just inside the program.

In the first case, Commodore BASIC proves more robustness, than expected. What it does read, is a line-link to $0008, a line number of binary $9900 (the token for PRINT and leading low-byte of zero), decimal 39168 (which is still in range) and $CA for “MID$(”. So our line 39168 starts with “MID$(”, which should throw a runtime syntax error. But this doesn’t matter, as it doesn’t process that line at all. It fails rather gracefully.
(I actually do not know why that is. – I’d expected it to process the first line and then to either run away on the second line, which is just random memory, or to stop on the next link being equal to the stored end-of-load address, if there is some run-away protection. But neither is the case)

NoLand · July 31, 2021, 2:12pm

Just out of personal interest, may I ask which assembler you were using?

(It assembles at virtual 6502 Assembler out of the box. That is, after the line selecting the target processor. If this is kind of a standard tool, it may be worth the extra bits required to ignore such a line.)

IsaacKuo · July 31, 2021, 4:14pm

I am using DASM (on Debian Linux, amd64 hardware). I do not know anything about other options. Some years ago, I did some very brief research to quickly decide on DASM for some VIC-20 development, and just used it ever since just because that’s what I was used to.

NoLand · July 31, 2021, 5:09pm

Great! Thank you.
(I think, the processor directive may be omitted as it defaults to 6502, but I may be wrong. It has been a bit, since I last used it. But it’s a great assembler.)