Why did WDC bother with emulation mode?

Most people tend to avoid 6502 emulation mode on the 65c816 unless they’re trying to port 6502 code to it. Why did WDC even bother to include the emulation mode bit? All 256 opcodes work in both native and emulation mode, so there aren’t many advantages to staying in emulation mode (and even then you still have to worry about zeroing the direct page register and the bank registers for maximum compatibility).

The biggest reason I can think of is: do you really want to take the risk of your 6502 code accidentally changing the size of the accumulator/index registers? That would surely mess things up. At least in emulation mode they’re forced to 8 bits.

I find emulation mode to be quite handy: not only can it run 6502 code, but one gets access to addressing modes which can use the full 24 bit address space. Data access is easy, code access just a little bit tricky - one must take into account how to return from an interrupt. And in emulation mode, the '816 is not too tricky to use in an '02 system.
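To make the data-access point concrete: even in emulation mode, the long addressing modes form a full 24-bit effective address from a bank byte and a 16-bit offset. A minimal Python sketch of that address calculation (my own illustration; assembler syntax for the long forms varies):

```python
def long_address(bank: int, offset: int) -> int:
    """Form the 24-bit effective address used by the '816's
    absolute-long addressing modes.  The calculation is the same
    in emulation and native mode, which is why data anywhere in
    the 16MB space is reachable from emulation mode."""
    assert 0 <= bank <= 0xFF and 0 <= offset <= 0xFFFF
    return (bank << 16) | offset

# e.g. a long LDA from bank $12, offset $3456:
print(hex(long_address(0x12, 0x3456)))  # 0x123456
```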

So what they offered was something really rather compatible with 65c02 and with extra features. Backward compatibility can be a bit of a horrorshow but it does help in the market.

But more to the point, perhaps, WDC needed to sell into Apple for the IIGS, and I believe they needed a very cycle-accurate 6502-like mode to operate the disk drive. Having said which, I don’t know how the IIGS deals with native vs emulation mode.

Note that the interrupts in native mode work differently and use different vectors - to use original '02 ROMs with '02 interrupt handlers, you’d need emulation mode.
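For reference, the two sets of hardware vectors can be written out as a simple lookup table (addresses from the W65C816 datasheet; the Python dictionary is just my way of presenting them):

```python
# 65C816 hardware vector addresses (all in bank 0).
# In emulation mode, BRK and IRQ share a vector, as on the
# 6502/65C02; in native mode they are separate.  There is no
# native RESET vector because reset always enters emulation mode.
VECTORS = {
    # (event, emulation_mode): address
    ("COP",   True):  0xFFF4,
    ("ABORT", True):  0xFFF8,
    ("NMI",   True):  0xFFFA,
    ("RESET", True):  0xFFFC,
    ("IRQ",   True):  0xFFFE,
    ("BRK",   True):  0xFFFE,  # shared with IRQ
    ("COP",   False): 0xFFE4,
    ("BRK",   False): 0xFFE6,  # separate vector in native mode
    ("ABORT", False): 0xFFE8,
    ("NMI",   False): 0xFFEA,
    ("IRQ",   False): 0xFFEE,
}
```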

Edit: some relevant threads over on the 6502 forum

Page 9 of the datasheet calls it “W65C02 emulation mode”, so it is quite surprising that it actually has NMOS timing (see 6502.org: Tutorial: 65C816 Opcodes). Maybe they were trying to make it emulate the NMOS version and simply allowed the 65c02’s new opcodes to work as well.

It doesn’t seem wise to rely too much on exact timing, but I’m sure some software relies on it anyway, especially if it’s known that every machine has the same hardware, which is true of many of these older machines. Have failures of this kind been observed in machines using the 65xx series?

Yes indeed, the '816 needed to undo some of the cycle-count improvements of the ‘C02, in order for Apple’s software-timed disk interface to work. Later Apple created the IWM chip, the Integrated Woz Machine, to do this work for Macintosh. Woz’ original routines were of course written for the NMOS 6502.

It’s not unusual for games, and modern demos, to rely on precise cycle timings and memory access patterns, and it becomes a test of the fidelity of emulators and recreations. I think this is true for Commodore, Apple, Atari and Acorn 6502-based systems. You can run most software without such accuracy, but to run all software, you need enormous attention to detail.

WDC would not have been concerned too much about all that, but Apple was an utterly crucial customer at the time.


Another difference between the 'C02 and the '816 is the hardware vectors.

The '816 has more of them, and these can interfere with 'C02 code’s expectations. In particular, the BRK and IRQ vectors are separate on the '816, while they share the same address on the 'C02 (and the original '02).

It’s probably not an issue for most, though. However, I wrote my own Acorn-compatible OS for the 'C02 - compatible enough to run a few Acornsoft ROMs unchanged (e.g. BBC Basic, BCPL, Comal) - but on the '816 I had to change some of the software vector locations because they clashed with the hardware ones, which meant patching the ROM images to make them work.

So emulation mode there was beneficial for me, at least…

-Gordon


The IIc (65c02) and the '816-equipped IIGS used the IWM (aka ‘Liron’), so the timing difference should have gone away.

Hmm, interesting. One thing which isn’t obvious at first glance is that WDC is a truly tiny outfit. Really just one man’s dream, everything done on the cheap. As such, I believe they struggle to make revisions or bugfixes. So any development is minimal, and late. It’s a company quite unlike any other semiconductor company you’ve heard of, with an outsized reputation in 6502-centric circles.

I have remembered another quirk of trying to run “old” 'C02 code in '816 8-bit mode, and that’s the TXS instruction. The high byte of the X register, hidden from the system in 8-bit mode, is also transferred into the 16-bit stack register…

This comment is the key:

The width of the accumulator is based on the m flag, and the width of the X and Y registers is based on the x flag, but the S register is always considered 16 bits wide. However, when the e flag is 1, SH is forced to $01, so in effect, TXS is an 8-bit transfer in this case since XL is transferred to SL and SH remains $01. Note that when the e flag is 0 and the x flag is 1 (i.e. 8-bit native mode), that XH is forced to zero, so after a TXS, SH will be $00, rather than $01. This is an important difference that must be accounted for if you want to run emulation mode code in (8-bit) native mode.
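The behaviour described in that quote can be modelled in a few lines of Python (my own sketch, not WDC’s internal logic):

```python
def txs(x: int, e: int, xflag: int) -> int:
    """Model TXS on the '816: transfer X into S.
    e     = emulation flag (1 = emulation mode)
    xflag = index-width flag (1 = 8-bit X/Y; only meaningful when e=0)
    """
    if e:
        # Emulation mode: SH is forced to $01, only XL transfers.
        return 0x0100 | (x & 0xFF)
    if xflag:
        # 8-bit native mode: XH is forced to zero, so SH becomes $00.
        return x & 0xFF
    # 16-bit native mode: full 16-bit transfer.
    return x & 0xFFFF

print(hex(txs(0x34, e=1, xflag=1)))  # 0x134  (stack stays in page 1)
print(hex(txs(0x34, e=0, xflag=1)))  # 0x34   (stack lands in page 0!)
```

The last two lines show exactly the trap: the same TXS with the same low byte of X leaves the stack in page 1 under emulation mode but in page 0 under 8-bit native mode.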

-Gordon

A couple of relevant resources from enthusiasts:
The 65816 microprocessor: Common Misunderstandings, Plus Attractions with lots of links
65C816 Opcodes by Bruce Clark with lots of corner cases and unexpected details

But also, to address the topic, I would say that an 816 without emulation mode would surely be a reasonable microprocessor, but adding in emulation mode surely makes it much more attractive in the market as an upgrade. It’s not just that it’s technically an upgrade in an existing socket, but also that a person’s or organisation’s existing experience is still valuable.


As I recall, this is part of the origin story of ARM; when Sophie Wilson and Steve Furber (if I have my players right?) visited WDC, they basically said, “wait, we can do that!”, and thus ARM was born.


So we can add “so we don’t accidentally lose track of the stack” to the list of reasons to use emulation mode. That will almost surely crash.

I don’t know if TXS behaving this way is an oversight or by design but it does seem unusual.


I think it sort of makes sense - in emulation mode it does what you expect, but in native mode it does what it expects - which is to transfer the X register to the stack pointer register - although in 8-bit native mode it still transfers all 16 bits…

That caught me out when trying to be clever and run my entire '816 SBC in native mode - any software (like most Basics) that regularly fiddles with the stack was breaking. Sometimes setting X to $0100 in 16-bit mode would make them run in 8-bit native mode, but it was a fiddle.

-Gordon

Those were the good old days, when one could craft a CPU, or CPU support chips, with only a few people and low start-up costs, as the technology was adequate to do the job for a reasonable price per chip.

They could have had TXS examine only x and not e. That appears to be what TXY/TYX do.

Or, maybe not. What isn’t clear to me at this point is whether any instructions are doing more work than they have to. For example, do TXY/TYX actually examine x and then decide how many bytes to transfer, or do they always attempt to transfer all 16 bits but simply fail to transfer the high byte when it’s being forced to zero? Maybe - the internal special bus is 16 bits wide, but I guess we won’t know until the whole thing is reverse-engineered. There could be a lot of things like that related to emulation mode and the m and x bits.
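Whatever the internal mechanism, the visible behaviour of TXY is easy to pin down: with x=1 the high bytes of X and Y are forced to zero, so only the low byte ever shows up. A sketch of the programmer-visible result (how many bits actually move internally is exactly the open question above):

```python
def txy(x: int, xflag: int) -> int:
    """Model TXY (native mode): transfer X into Y.
    With x=1, XH and YH are forced to zero, so the result looks
    like an 8-bit transfer regardless of the internal bus width."""
    if xflag:
        return x & 0xFF   # high byte reads back as zero
    return x & 0xFFFF
```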

The 816 was in development before the IWM shipped, but even then the compatibility requirement for those drives and controllers would’ve remained because of the existing installed base. Remember it was supposed to ship with the Apple IIx in 1984 but the 816’s delays were one reason that project was canned.

The later IIe has a 65C02.

There are basically no bugs in the 65C816. In fact I believe the only bugs were the timing constraint bugs on REP and SEP that limited the very first revs to 3.5MHz.

Even the 1 clock timing shift on the 65C02 was considered a bug by WDC however.

The emulation mode and the ability to run 'C02 code in '816 mode both have essential reasons to exist (especially when you remember there was once a near drop-in part without the extra addressing features):

  • '816 in 8-bit native mode is needed to run legacy code within a newer environment (e.g. IIgs)
  • 'C02 emulation mode is needed to run unmodified existing code, because of things like interrupt stacking and the extra vectors.
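The interrupt-stacking difference in the second point, spelled out (my sketch; the list shows bytes in push order):

```python
def interrupt_frame(e: int, pbr: int, pc: int, p: int, brk: bool) -> list:
    """Bytes pushed on interrupt entry, in push order.
    e=1: 6502-style 3-byte frame; the B flag (bit 4) in the pushed
         P distinguishes BRK from IRQ, since they share a vector.
    e=0: 4-byte frame including the program bank register; BRK has
         its own vector, so no B flag is needed (bit 4 is the x flag)."""
    pch, pcl = (pc >> 8) & 0xFF, pc & 0xFF
    if e:
        pushed_p = (p | 0x10) if brk else (p & ~0x10 & 0xFF)
        return [pch, pcl, pushed_p]
    return [pbr, pch, pcl, p]
```

An '02 interrupt handler expects the 3-byte frame, which is one concrete reason unmodified code needs emulation mode.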

It also turns out that it’s really useful as a sneaky way to keep the gate count down and page out a boot ROM without an extra GPIO 8)

What does that affect?

That’s the extra clock for crossing page boundaries.

It’s unfortunate that it’s seen that way - it’s perfectly reasonable to expect adding 16-bit addresses to be harder than adding 8-bit addresses.

Of course a processor with fewer cycle penalties is easier to understand, at least from a programming point of view, and if there is a way to design it to eliminate the penalty, maybe it’s worth it. For that reason it’s nice that the 65c816 eliminated the decimal-mode cycle penalty from ADC/SBC.
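A rough sketch of the two cycle-count rules discussed in this thread (heavily simplified - the real rules are per-opcode and per-chip; see Bruce Clark’s opcode tutorial for the full tables):

```python
def lda_abs_x_cycles(base: int, x: int) -> int:
    """Cycles for LDA abs,X with an 8-bit index (simplified):
    4 cycles, plus 1 if base+X crosses a page boundary."""
    cycles = 4
    if (base & 0xFF00) != ((base + x) & 0xFF00):
        cycles += 1  # extra clock for crossing the page boundary
    return cycles

def adc_imm_cycles(cpu: str, decimal: bool) -> int:
    """ADC #imm: 2 cycles, except the 65C02 adds one in decimal
    mode.  The NMOS 6502 and the 65C816 take no decimal penalty."""
    return 2 + (1 if cpu == "65C02" and decimal else 0)
```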