Commodore (MS) BASIC and the peculiar use (or lack thereof) of the X-register

Yesterday, I was updating some code of my PET 2001 emulation and had a hard time recovering the secondary address for the LOAD command. While the original value, stored in the zeropage, had been overwritten by some subsequent routines, eventually, I found it could be recovered from the X-register! And this, after the flow of control had traversed quite a number of instructions!

Actually, this isn’t a rare case. If you have a closer look at the BASIC routines, the Y-register is heavily used, as is the stack, but the X-register only rarely. It is used more like a store; some zeropage addresses even see more frequent use. And then, when you think that a value has been ultimately lost, Commodore BASIC produces that value out of thin air, like a magician, right out of the sleeve that is the X-register.

Given that the 6502 features just 3 usable registers (AC, X, Y), this appears to be a rather peculiar use of a resource as rare and crucial as a processor register. Is there a prehistory to this (as in other processors)? How does this compare to other flavors of MS BASIC, like the Z80 code (which I’m not familiar with)? Is there something similar to be observed?

And here’s the full story, including relevant bits of the ROM listing:


Interesting… one might be able to run some sort of lifetime analysis of values in A, X, Y, and C. It would be interesting to see if, for example, the MS-Basic family had a different signature of lifetimes for values, compared to, say, Acorn’s BBC Basic for 6502. Or an Atari Basic. In other words, does authorship make much difference?
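To make the idea a bit more concrete, here is a minimal sketch of such a lifetime count in Python. It is only an illustration under loud assumptions: the `WRITES` table covers a small subset of the 6502 instruction set, the trace is a made-up list of mnemonics rather than anything taken from a real ROM, and "lifetime" is simply taken to mean the number of instructions between one write to a register and the next (or the end of the trace).

```python
# Hypothetical sketch: which register each mnemonic overwrites.
# Deliberately a small subset of the 6502 instruction set.
WRITES = {
    "LDA": "A", "LDX": "X", "LDY": "Y",
    "TAX": "X", "TAY": "Y", "TXA": "A", "TYA": "A", "TSX": "X",
    "INX": "X", "DEX": "X", "INY": "Y", "DEY": "Y",
    "ADC": "A", "SBC": "A", "AND": "A", "ORA": "A", "EOR": "A", "PLA": "A",
}

def register_lifetimes(trace):
    """Return {register: [lifetimes]}, counted in instructions between writes."""
    last_write = {}
    lifetimes = {"A": [], "X": [], "Y": []}
    for i, mnemonic in enumerate(trace):
        reg = WRITES.get(mnemonic)
        if reg is None:
            continue  # stores, branches etc. leave the registers alone
        if reg in last_write:
            lifetimes[reg].append(i - last_write[reg])
        last_write[reg] = i
    # Values still held when the trace ends count up to the end of the trace.
    for reg, start in last_write.items():
        lifetimes[reg].append(len(trace) - start)
    return lifetimes

# Made-up trace: X is loaded once and then left untouched.
trace = ["LDX", "LDA", "STA", "LDY", "LDA", "STA", "INY", "LDA", "STA", "STX"]
result = register_lifetimes(trace)
print(result)
```

Run over a real execution trace (or a disassembly walked in program order), the distribution of these numbers per register would be the "signature" to compare between BASICs. A proper analysis would also have to track reads, so that a value that is written but never used again does not count as long-lived.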

And then, yes, for other microprocessors, how might they differ? The more registers they offer, the more likely that some of them have very long lifetimes, especially if they are more or less special-purpose registers.

To be precise, there’s a certain pattern to this. If we have a look at the bits of ROM listing reproduced in that blog post, the general flow usually uses just the A register (accumulator). Then, there are specialized routines, which use mostly or solely the X-register, like the routine collecting the parameters and the subroutine it calls. And then there are service routines, which will use just the Y-register, like the ones servicing the IEEE bus.

As a result, there are related sections of the code which are either X- or Y-heavy. As the code follows the steps of a certain task, it’s quite likely that it visits just one “family” of service routines. If it’s about BASIC parameters, it’s all about the X-register; if it’s about servicing low-level IEEE stuff, it’s all about the Y-register. Typically, that register is also used to pass parameters, ready to be used in the next step. So what may seem like a wasteful use of resources at first sight is actually a bit of clever coding and code discipline. And as a side effect, like in this example, when the code diverts back from low-level negotiations to BASIC parameters, some of the previous state is still intact.

Apparently, the Y-register is used more for low-level stuff, so it’s small wonder that the X-register should be more long-lived once a certain task is established. And, of course, the decision on which register is used is crucially informed by the specific indexed indirect and indirect indexed zeropage addressing modes. E.g., are there tables to be looked up or are there blocks to be moved around? And some of these tasks are more likely to be found at a certain level in the code hierarchy than at others.

You may be right, thinking of different kinds of code. But I’m thinking instead about different authorship…

I imagine Shakespeare (or, in the case of the 6502, Bruce Clark) would use all the 6502 registers, like a juggler with only three and a bit hands.

There is an asymmetry in the 6502, which is the two indirect modes, needed when dereferencing a pointer (necessarily in page zero) which use either X, or Y, depending on whether pre-indexing or post-indexing. The pre-indexed mode is very rarely used, so (personal opinion ahead) that weakly pushes a certain usage of Y to the fore, and in turn that makes X very slightly more general purpose.

Here’s a quote from Atari Roots:

it’s not too likely that you’ll ever find much use for indexed indirect addressing.
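For anyone not fluent in the two indirect modes, the asymmetry can be sketched by computing the effective address each one produces. This is just a small Python model, not emulator code: the function names and the memory contents are invented for the illustration, and only the address calculation is modelled (no cycle counts or page-crossing penalties).

```python
def indexed_indirect(mem, zp, x):
    """(zp,X), pre-indexed: add X to the zero-page address, then fetch the pointer."""
    ptr = (zp + x) & 0xFF                       # index wraps within page zero
    return mem[ptr] | (mem[(ptr + 1) & 0xFF] << 8)

def indirect_indexed(mem, zp, y):
    """(zp),Y, post-indexed: fetch the pointer from page zero, then add Y."""
    base = mem[zp] | (mem[(zp + 1) & 0xFF] << 8)
    return (base + y) & 0xFFFF

# Made-up memory: a pointer to $4000 at $20/$21, another to $1234 at $24/$25.
mem = bytearray(65536)
mem[0x20], mem[0x21] = 0x00, 0x40
mem[0x24], mem[0x25] = 0x34, 0x12

pre = indexed_indirect(mem, 0x20, 4)    # X selects WHICH pointer is used
post = indirect_indexed(mem, 0x20, 4)   # Y selects an OFFSET from one pointer
print(hex(pre), hex(post))
```

With the same operand and index, the pre-indexed form picks a different pointer out of a table in page zero, while the post-indexed form walks along the block the pointer refers to. The second is what block moves and string handling want, which fits the observation that Y gets pulled into that role.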

One oddity with MSBasic - the core of it, as written by Microsoft - is that AIUI the programmers were not greatly experienced 6502 programmers. BBC Basic is the opposite, in this respect. And then, the Commodore extensions, like the IEEE interfacing, would have been written by different people, I think?

BTW there’s a lovely diagram and table by Bob Sander-Cederlof which says a lot about the 6502 (but little about addressing modes):

Regarding programming styles, this is an aspect I initially tried to follow in my investigations of the Spacewar! code (episode 2 is actually a who-done-it), AKA “source code as text”. But this one may also be due to general discipline and code optimization in search of extra bytes in the limited ROM space. (I guess, when you’ve done a number of BASIC ROMs, you’ve learned a bit along the way.)

Edit: When I started this thread, I wasn’t that sure about this and was still wondering why the code was done this way, but now I really do think that this is the cause.

Regarding 6502 architecture, something I never investigated: is there actually proper microprogramming, meaning, is there an ALU to the ALU managing micro-steps and the internal bus? The die shots look nearly too simple in organization to provide for this.


No, I would not personally describe the 6502 as microcoded. In fact, I believe it’s an error, although near-universal, to label the large structured part of the decoder as the microcode ROM. It is, however, a rather regular machine, and I’d say that’s how it manages to deliver the combination of low transistor count and high usefulness. There is a timing controller, a sequencer of sorts, but it’s not precisely a counter. The combination of these things is why the RMW instructions follow the same steps, except when they don’t.