Commodore BASIC Renumbering & Program Representation

Commodore’s rather spartanic BASIC, like many other implementations on the more basic side of things, lacks a RENUMBER command. I recently added a renumbering utility to my version of Tom Scibo’s PET 2001 emulator and since this involves a few interesting things, like how BASIC is represented in memory and how variables are allocated, I wrote a blog post on this. (The algorithm described is in JavaScript, but could be also adapted for a 6502 machine language implementation.)

BTW, did you know that adding a simple variable during execution (or in direct mode) involves moving any arrays in the course of the respective memory allocation? Meaning, to optimize for this, be sure to have all variables defined before defining any arrays by DIM. (This is quite contrary to what was recommended back in the day.)

2 Likes

The BASIC in the C128 had renumbering. I remember using it.

I think, the later models, like the Plus series, had it as well. This is really owed to reusing the original BASIC license for the popular PET, VIC, C64 series. Extending BASIC would have required a new license, so Commodore BASIC was stuck in 1977 for quite a while.

Wow, that’s quite the observation about arrays being relocated!

Yes, there’s just a single pointer, ARYTAB, which is used to point both at the next available location for allocation of simple variables and to mark the beginning of the array space. Hence, when a variable is added, the entire block of previously allocated arrays has to be moved up in memory to provide the space required (7 bytes per simple variable).


PS: Maybe also of interest, string handling:
String variables are pointers themselves, pointing at the location of the stored text string, this is either a string literal in the BASIC text (no duplication required) or a sequence stored at the top of BASIC-RAM, growing from top to bottom.

E.g.,

 A$ = "TEST"
 B$ = A$

No string memory is allocated, both A$ and B$ are pointing to the string literal inside the BASIC text (the program).

However,

 C$ = "ANOTHER " + A$

A new string sequence “ANOTHER TEST” is generated and stored at the next available space below the last string stored in BASIC-RAM.

Quite counter-intuitively string operations, which access partial strings and could just reference an existing string, will also generate a new instance in string storage. E.g.,

 D$ = LEFT$(A$, 2)

There are several consequences of this:

  • Garbage collection and memory are not an issue, as long as you reuse strings, which have been already defined, be it as a composition or as a literal in the BASIC text.
  • However, string literals in the BASIC text may cause severe problems, if the BASIC text has been altered in any way in the meantime, e.g., by loading a (temporary) code overlay. (In this case, you may want to rather compose your strings from parts, in order to have them allocated separately.)
  • There may be performance issues, when using “+” for string concatenation in PRINT statements as opposed to using semicolons (“;”). Also, portable listings using CHR$() may be less efficient than using PETSCII screen characters embedded as string literals in the BASIC text.
  • Composing strings just for the sake of printing (as opposed to joining the output by semicolons) should be avoided.
1 Like

So, here is another blog post, going into details on variables, arrays, and strings, providing some detail on what was mentioned above.

1 Like

Interesting reading, as ever - thanks! I wonder, idly, what the smallest simplest program with unexpectedly terrible runtime might be - what pathological rearrangements of furniture can we convince the interpreter to do?

You mean, for example, something that would cause me to miss uploading an image?
That’s apparently easy to achieve, not that difficult at all.
(The missing image is now added to the blog post.)

More seriously speaking: I think, something, where this could be exploited productively, is modifying a given display string by POKEs. Which may be of use for, say, writing a video game for a BASIC 10-liner contest.If we put the string definition at the very beginning of our program, it’s rather easy to come up with the POKE address.

E.g.,

10A$="12345678":REM <- INSERT SOMETHING MEANINGFUL HERE

On a C64, BASIC starts at $0801.
Add 2 bytes for the link to the next line in memory, and another two for the line number (stored always in two bytes as a binary value). Thus, A$ starts at $0805. From here, we may count to $0809 (dec. 2057) as the address of the first character of the string (as stored in the BASIC text and referenced by the string variable A$).
Now, at some point in the program, we could replace the “2” by another byte, e.g., “POKE 2058,INT(RND(0)*5)+65” and print it again…

Or, the other way round, since this is the very first variable defined, it’s also easy to come up with the variable address in memory, just PEEK the value in pointer VARTAB. Now the 3rd byte contains the length and the next two bytes the storage address. By modifying one or both of them, we may readjust some padding on the left or right.
Suppose this represents a line of the screen for printing, but it’s more than 40 characters long. By setting the length to 40 and modifying the pointer, we may now scroll this line horizontally. If we increase the value in the pointer by one, we will lose padding on the left of the string and the line, possibly representing a character somewhere within, will also to the left.

(I may actually add this to the blog post. … Update: and I did so.)

Proof of concept, PET 2001, ROM 2.0 (“new ROM”):

cbm-variables-exploit

Obviously, we could put lines 40-70 into a single line as a subroutine.

The shortest one would seem to me to be to DIM an array (perhaps multidimensional) taking most of the heap, and then start creating scalar variables, forcing that large array to be moved again and again.

However, you might be able to destroy performance even worse (at the cost of a larger program) by allocating a huge number of small strings, enough to fill the heap, and then deallocating and reallocating strings to force lots of GCs.

Hmm. Once you’re in that world you could make BASIC programs do some pretty mysterious things. They’re saved as binary images of the text area, which means that it’s easy enough to extend the length of the save to cover the variable and array areas as well. (It’s not clear to me if the SAVE command can be used to do this—I see SAVE "name",8,1, which seems as if it may take a third parameter for the address, but nothing about how to specifiy the length—or if one would need to write a machine-language routine to call into the KERNAL directly.)

So these would then be restored on load, not visible to the user unless he printed out variable/array values before running the program. Typing RUN would make these variables go away, but typing GOTO 10 or similar would leave them in place, which could dramatically change the program behaviour.

Or you might even be able to restore the hidden data when RUN using appropriate pokes and other magic. That might actually be real-life useful for something like an adventure game, where you want to hide a lot of the data from a LIST command so as to avoid easily spoilers.

1 Like

I was thinking more along the productive, generative side of things… :wink:

cbm-variables-exploit-poc

(Here, we apply a window of length 40 into the string literal defined for A$ at a randomly sliding position and print this repeatedly, providing the basis for what might be a tiny, 10-liner cañon game. Addresses are for the PET 2001 with ROM 2.0.)

You have to POKE the addresses for the range. I do not remember positively (as nowadays most is done by cross-assembling), but these are probably the pointers TXTTAB (start of BASIC text) and VARTAB (end of BASIC text / start of variable space). Doing so, it’ll be just like saving a normal BASIC program.

1 Like

I found this example for saving $C000-$CFFF absolutely on a C64:

POKE 43,0:POKE 44,192
POKE 45,0:POKE 46,208
POKE 55,255:POKE 56,255
SAVE "ML",8,1

43/44 … TXTTAB, start of BASIC text
45/46 … VARTAB, end of BASIC text + 1
55/56 … MEMSIZ, top of BASIC RAM (normally $A000)

Setting MEMSIZ (here to $FFFF) is required only, when saving a memory slice above address $A000 (dec. 40960), in order to prevent an out of memory error. (Obviously, these pointers should be reset to sane/previous values after the operation.)

1 Like

Finally, as a finale to this mini series, here’s another blog post on implementing a 10-line video game in BASIC, for this putting some exploits of the string mechanism to viable use:

The most noteworthy trick used is using RIGHT$() to implement a fast FIFO queue for byte-sized values, while resetting pointer FRETOP to temporarily freeze memory allocation in order to avoid running into garbage collection.

The resulting game can be played online here.

2 Likes