Tom Harte's (et al) JSON CPU Tests

This may be of interest to anyone writing an emulator: a number of test suites for various processors, consisting of one JSON file per instruction code, each with various test samples including code, start and end states, cycles, and the memory involved in the operation.

(See also the links at the bottom of that page for further tests.)

Via this article, advertising a new 8088 test suite, obtained by running an Arduino against a real 8088 for hardware validation:

Via HN,


Edit: I see this is more than usually thorough!

That’s all you need to validate the functional aspects of a CPU instruction - if you’re not interested in emulating a cycle-accurate 8088 CPU, you can stop there. But if you want that extra accuracy, you can parse the ‘cycles’ array and get a cycle-by-cycle readout of the CPU’s status and bus lines, comparing your emulator to how the real thing performed. Did your reads and writes occur on the same cycle as hardware? Is your prefetch up to snuff? Now you can know for sure.
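As a rough illustration of the functional check described in that quote, here is a minimal JavaScript sketch. The field names (`initial`, `final`, `ram` as address/value pairs, `cycles` as bus records) are modelled loosely on the 6502 flavour of these files and should be treated as assumptions; check the README of the suite you actually use. The toy CPU (`createToyCpu`) is hypothetical, knows a single opcode, and is only there to make the example self-contained. The sample test is hand-written, not taken from the suite.

```javascript
// Hypothetical toy CPU: understands only 6502 LDA #imm (opcode 0xA9).
function createToyCpu() {
  const ram = new Uint8Array(65536);
  const regs = { pc: 0, a: 0, p: 0 };
  const bus = []; // log of [address, value, "read"] records
  const read = (addr) => {
    bus.push([addr, ram[addr], "read"]);
    return ram[addr];
  };
  return {
    ram, regs, bus,
    step() {
      const opcode = read(regs.pc);
      regs.pc = (regs.pc + 1) & 0xffff;
      if (opcode !== 0xa9) throw new Error("toy CPU: unsupported opcode " + opcode);
      regs.a = read(regs.pc);
      regs.pc = (regs.pc + 1) & 0xffff;
      // update Z (bit 1) and N (bit 7) from the loaded value
      regs.p = (regs.p & ~0x82) | (regs.a === 0 ? 0x02 : 0) | (regs.a & 0x80);
    }
  };
}

// Runs one test case; returns a list of mismatches (empty list = pass).
function runTest(test, checkCycles) {
  const cpu = createToyCpu();
  for (const [addr, value] of test.initial.ram) cpu.ram[addr] = value;
  for (const name of ["pc", "a", "p"]) cpu.regs[name] = test.initial[name];
  cpu.step();
  const errors = [];
  for (const name of ["pc", "a", "p"]) {
    if (cpu.regs[name] !== test.final[name]) {
      errors.push(`${name}: got ${cpu.regs[name]}, expected ${test.final[name]}`);
    }
  }
  for (const [addr, value] of test.final.ram) {
    if (cpu.ram[addr] !== value) errors.push(`ram[${addr}]: got ${cpu.ram[addr]}, expected ${value}`);
  }
  // Optional stricter check: compare bus activity cycle by cycle.
  if (checkCycles && JSON.stringify(cpu.bus) !== JSON.stringify(test.cycles)) {
    errors.push("bus activity differs from the recorded cycles");
  }
  return errors;
}

// Hand-written sample in the assumed shape.
const sample = {
  name: "a9 80",
  initial: { pc: 0x0200, a: 0, p: 0x20, ram: [[0x0200, 0xa9], [0x0201, 0x80]] },
  final: { pc: 0x0202, a: 0x80, p: 0xa0, ram: [[0x0200, 0xa9], [0x0201, 0x80]] },
  cycles: [[0x0200, 0xa9, "read"], [0x0201, 0x80, "read"]]
};
const result = runTest(sample, true);
console.log(result.length === 0 ? "pass" : result);
```

Passing `false` for `checkCycles` corresponds to stopping at the functional check; passing `true` adds the cycle-by-cycle comparison.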

It’s a funny thing, that it’s easy to check a validation suite against real hardware, and in some sense it tells you that it’s right. But on the other hand, it doesn’t tell you how likely it is to find the sorts of errors that people make, or indeed what the formal level of coverage is. Emulators, in particular, can fail in ways which hardware is unlikely to, because of the different substrate. Probably vice-versa, too. And yet, for example, Klaus Dormann’s suite for 6502 emulators has also helped one or two people debug their actual systems - presumably because it tests situations not usually encountered.

Interesting read, indeed.

True enough, this is how I began writing my emulator. Having gone through a lot with it, I can’t stress enough how vital such tests are. To date I have over 1000 unit tests that mainly check the correct operation of each instruction with all the combinations and situations I was able to think of, but also correct assembly translation, memory states and so on.

Having them in place has helped me numerous times when I refactored something and broke discrete functionality in a way that wasn’t apparent and that I didn’t immediately think I would break.
Instructions aren’t the only things I am testing; basically I test every function I create, even if it’s a utility function, IO, network or general debug.

What I do lack is testing for cycle timings. And that’s somewhat by design. While I did design the machine to emulate a 30 MHz CPU, I cannot guarantee that the instructions are executed to the microsecond in a specified frame of time. However, given the low frequency of the CPU and the fact that I am exposing the machine time to the user with sufficient granularity, I am thereby encouraging a way of programming that is agnostic to timing-by-instruction.

And, since I designed a fantasy computer/CPU which is entitled to its own particular quirks, I might just get away with it. :slight_smile:

This is what I love to see, all greens.


Given modern operating systems (and web browsers) and their aggressive rescheduling of tasks (in order to compress them into tightly packed frames), this is a somewhat illusionary premise. :slight_smile:

(E.g., if your emulator fails in a web browser, you may see a dozen or so animationFrames canceled, while these are only requested at the end of such a frame. Meaning, frames are massively executed in advance, only to be thrown away when there is any interaction. Or, if you have a background task rendering sound, you may observe this drifting off, while a demo runs for half a minute without any interaction. On a modern system, time is very much an illusion, as far as any specific task is concerned. Very likely, you can only guarantee granularity internally to your emulation, but not to the outside. Moreover, there may even be an absurd mismatch in granularity: if you are emulating an 8-bit machine, your emulation will probably be processing input faster than the host OS does via the USB driver chain.)

Indeed. Operating systems or “wrappers” of any kind (read browsers) tend to get in the way of a proper emulator. Initially I had the naive idea that maybe I could get around this by writing an exclusive barebone OS for x86/64, Raspberry Pi, others… Or write Continuum as an exclusive OS directly. I quickly ran away from that when I realized what it really entails. Writing a minimal USB driver alone would probably have taken me a year, if not more. Plus I needed networking as well, otherwise the debugger could not have existed. Not to mention process handlers and so on.
Again, my job is way easier and I realize that. Whoever tries to write an emulator of actual hardware faces strong challenges.
But hey, who knows. Maybe in the future a community will start writing barebone and really take control of hardware, invent new things and maybe prompt CPU manufacturers to obfuscate their instructions so they only work with licensed developers…

But, getting too far ahead. :slight_smile:

There are a couple of senses in which an emulation can be speed accurate, I think. One is about internal consistency: if the timers and counters, and the video output, and the sound effects, all match up with the instructions being executed on the CPU, then the emulated software will behave as it would on a real machine, even if its real-time (wall-clock time) progress isn’t quite accurate, or is even slightly variable. But the other sense is more strict: that a 50Hz or 60Hz frame rate is accurately maintained, and the pixel clock is accurate such that beam-racing software will act as the original.

I think both are possible, provided there’s enough care and ingenuity, and provided the host platform has the excess performance to deliver the accuracy. For example, jsbeeb is, I think, accurate in the looser sense and pretty good in the stricter sense… although I might be wrong because I didn’t know about frame dropping. And emulating a 50Hz frame rate on a modern display which isn’t 50Hz is bound to result in some kind of mismatch.

But not worrying too much about this level of exactness is a fine choice, especially for a platform which doesn’t already have a body of realtime software (aka video games) written for it. It’s not uncommon, I think, to batch up 1000 instructions, or similar, and then wait for the real milliseconds to elapse, to approximate good time keeping.
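A minimal sketch of that batching idea in JavaScript, under stated assumptions: `step` is a hypothetical callback that executes one instruction and returns its cycle count, and `clockHz` is the emulated clock rate. The pacing rule (sleep whenever emulated time is ahead of wall-clock time) is one simple choice among several, not a description of any particular emulator.

```javascript
// How long to wait (in ms) so that emulated time does not run ahead of
// wall-clock time. totalCycles counts cycles since the emulation started.
function delayAfterBatch(totalCycles, clockHz, elapsedMs) {
  const emulatedMs = (totalCycles * 1000) / clockHz;
  return Math.max(0, emulatedMs - elapsedMs);
}

// Hypothetical driver loop: run a batch of instructions, then sleep off
// whatever lead the emulation has built up over real time.
async function run(step, clockHz, batchSize, batches, now = () => Date.now()) {
  const start = now();
  let cycles = 0;
  for (let b = 0; b < batches; b++) {
    for (let i = 0; i < batchSize; i++) cycles += step();
    const wait = delayAfterBatch(cycles, clockHz, now() - start);
    if (wait > 0) await new Promise((resolve) => setTimeout(resolve, wait));
  }
  return cycles;
}
```

Because the wait is computed from the total since start rather than per batch, timer jitter in one batch is compensated in the next. Keep in mind that `setTimeout` gives no hard guarantee on a host OS or in a browser.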

For a tangible example, take my PET online emulator: I’m pretty happy with internal consistency (the memory accesses of the CPU T-phases are consistent with screen memory access – thus it runs the newest “racing the PETSCII character” high-res demos and even shows memory interference on the screen, so this is good enough for me :slight_smile: ), but externally – I actually got frightened. Here’s the story:

As you may know, the PET doesn’t have a sound chip or even a speaker, but you can set a timer in “free running” mode, which will shift a bit pattern to an exposed output pin (on the user port). This will produce pulse sound at 1 MHz. No problem with down-sampling this with internal consistency on a modern machine. So we end up with samples in high-res digital audio at 48 kHz, which we must provide to a playback callback of the browser.
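For illustration, a down-sampler along these lines can be sketched as below. This is an assumption-laden sketch and not the emulator's actual code: it uses a plain box filter (averaging the pin level over each output period), and the name `createDownsampler` is made up for the example. An integer phase accumulator avoids drift, since 1,000,000 / 48,000 is not a whole number.

```javascript
// Averages a 1-bit signal sampled at inputHz down to outputHz.
// The phase accumulator works in integers, so no rounding error builds up.
function createDownsampler(inputHz, outputHz) {
  let phase = 0; // progress towards the next output sample
  let sum = 0;   // sum of input levels in the current output period
  let count = 0; // number of input ticks in the current output period
  const out = [];
  return {
    // Call once per emulated clock tick with the pin level (0 or 1).
    push(level) {
      sum += level;
      count++;
      phase += outputHz;
      if (phase >= inputHz) {
        phase -= inputHz;
        out.push(sum / count); // box filter: mean level over the period
        sum = 0;
        count = 0;
      }
    },
    // Hands over all finished samples and empties the internal list.
    take() {
      return out.splice(0, out.length);
    }
  };
}

// One 60 Hz PET frame is about 16667 ticks at 1 MHz.
const ds = createDownsampler(1000000, 48000);
for (let tick = 0; tick < 16667; tick++) ds.push(tick % 2);
console.log(ds.take().length); // number of audio samples for one frame
```

A box filter is crude; a real implementation may want a proper low-pass filter to reduce aliasing.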

Fine. So far, so good. The way to do this is to write the samples to a ring buffer and to expose a portion of this to the audio callback. So, how long should that ring buffer be? One PET frame samples down to 800 modern audio samples, so 2K to be on the safe side? I wrote and tested this first on an older (actually now rather old) machine and an older OS, because, if it works there, it will work everywhere, right? It worked well, so I eventually moved to a new machine, with a current OS and a current browser – and the ring buffer immediately ran into a tail crash. Well, funny. So, double the buffer size? Triple it? With 32K it still ran into crashes with a demo (midway into “Back To The PET”). We may keep in mind that this is a demo running on its own deterministically, without any user interaction. It turns out we need 64K to avoid any crashes.

Mind that 32 x 1024 / 48,000 = 0.68…! At this point, the internal scheduling was more than half a second ahead! I checked what other emulators were doing, e.g., Beeb.js has the same buffer size, probably for a reason.
This really changed how I think about these things.
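To make the buffer arithmetic concrete, here is a small ring buffer sketch in JavaScript. It is a generic illustration, not the code of either emulator; the names and the policy of refusing a write that would overrun the reader are assumptions made for the example.

```javascript
// Fixed-size ring buffer for audio samples.
function createRingBuffer(size) {
  const data = new Float32Array(size);
  let readPos = 0;
  let writePos = 0;
  let filled = 0; // samples written but not yet read
  return {
    // Returns false (and writes nothing) if the writer would catch up
    // with the reader, which is the "tail crash" case.
    write(samples) {
      if (samples.length > size - filled) return false;
      for (const s of samples) {
        data[writePos] = s;
        writePos = (writePos + 1) % size;
      }
      filled += samples.length;
      return true;
    },
    // Fills `out` for the playback callback; pads with silence on underrun.
    read(out) {
      const n = Math.min(out.length, filled);
      for (let i = 0; i < n; i++) {
        out[i] = data[readPos];
        readPos = (readPos + 1) % size;
      }
      out.fill(0, n);
      filled -= n;
      return n;
    },
    get filled() {
      return filled;
    }
  };
}

// How much audio (in seconds) a given number of buffered samples represents.
const bufferedSeconds = (samples, sampleRate) => samples / sampleRate;

console.log(bufferedSeconds(2 * 1024, 48000).toFixed(3));  // the first guess
console.log(bufferedSeconds(32 * 1024, 48000).toFixed(3)); // still not enough
```

Read this way, a 32K buffer overflowing means the emulation had produced about 0.68 s of audio that the playback side had not yet consumed.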

PS: Resistance is futile, the Borgs have already won. :wink:


So, another horror story: Said emulator has a paste functionality. This employs the mechanism that is used whenever there is a need to type something programmatically into the BASIC editor, which I call “auto-type” mode.
Basically, we explode the string into character codes and write them into a queue, then, just before vertical blank, where the keyboard scanning routine of the emulated PET is initiated by an interrupt, a routine is called, which

  1. Checks if the state is auto-type mode and there is actually a queue of characters to type
  2. It sets the keyboard matrix to the next character and counts down from an initial value of 3
  3. On the call after next, as we reach 1, we reset the keyboard matrix to all keys up
  4. On zero, we check if there is more to type, and, if there is, we continue at (2).

Notably, the counter is only advanced, if there is a blinking cursor, indicating that the emulated PET is ready to receive input. This generally works well.

However, rarely, every now and then, a character is missed, but only, if this was initiated by an on-paste callback of the browser. This should never happen!
(A key is pressed and scanned for 2 full PET frames and then all keys are up for an intermediate frame. There is no way that a character could be missed. Even keyboard debouncing can’t be an issue, as this is, as far as the emulated PET is concerned, an atomic operation.)
I just had this happen twice, and this can only be caused by some kind of scheduling mismatch.

I really preferred it when computers were deterministic.

Without looking at the code, it sounds like there is a frame of time when the next queued character is being processed but ignored by the emulator without a mechanism to prevent that.

Also, how can you be sure this does not happen anyway when typing something really-really fast?

The input queue is assembled in advance from the supplied text string, any “auto-typing” happens just before V-BLANK and shifts a character at a time from this queue. There is no human interaction involved (the keyboard input of the emulator is actually locked for these operations) and the execution of the call is atomic to the emulator. There’s no external timing or input involved. Notably, any “auto-type” step is triggered only, if there is a flag set in the emulated PET for a blinking cursor, meaning it is ready to accept input.

(Meanwhile, I added a delay for another frame to this. This slows down the entire operations somewhat, but will hopefully remedy the issue.)

If interested, here’s the code:

// called at V-BLANK, key codes in array autoTypeQueue
// autoTypeCntr starts at 4
// sync-action is only called, if autoTypeCntr > 0

function autoTypeSyncAction() {
   if (autoTypeCntr === 2) {
      // release pressed key & sync matrix
      // (elided here; presumably also steps the counter)
   }
   else if (autoTypeCntr === 1) {
      // check, if there's anything left to type
      if (autoTypeQueue.length === 0) { // no
         autoTypeCntr = 0;
         // reset state
         // execute callback, if any
      }
      else if (controller.isInteractive()) { // PET in interactive state
         var key = autoTypeQueue.shift();
         if (key) {
            // apply the key and sync matrix
         }
         autoTypeCntr = autoTypeDelay; // delay is 4
      }
   }
   else autoTypeCntr--;
}

Ah, I see, the emulator registers a keypress by comparing a previous keyboard state with a current state so ‘KeyTyped’ would mean prev=pressed & current=released which is what this JavaScript method seems to do.
The code looks fine, but maybe there is a problem with the emulator registering the correct states. As such, a delay is a good idea between the press and the release. I think this is also what you mentioned earlier.

But in any case, debugging is a very good idea next. :slight_smile:

Notably, this is only used for inserting pasted text and things like automatically loading and starting a program, where we want to transfer an entire string of arbitrary length to the PET in order to execute it!
(I had only rare issues in the context of pasting.)

Normal keyboard handling is an entirely different affair (this uses the same methods for applying a key and syncing the keyboard matrix, though).

So, we start with a string, split this into a queue of key-codes to apply, which is then executed by this method in individual steps over individual frames (at least 4 per key to press). There should be no room for any context that may interfere and the code should be executed deterministically.
The only thing, I can think of, is some lazy execution on the browser’s side.

This will result in entirely different timing, though. There’s actually no way to debug this.

You don’t need to output the logging data in realtime, just collect it in a memory buffer so it doesn’t take a performance toll and dump it at a later time to a file or console.
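A minimal sketch of such an in-memory trace, in JavaScript. Everything here is hypothetical (the name `createTraceLog`, the choice of three numeric fields per entry); the point is only that recording is a few array writes into preallocated storage, and formatting happens later in `dump()`.

```javascript
// Preallocated trace log: keeps the most recent `capacity` entries.
function createTraceLog(capacity) {
  const frames = new Uint32Array(capacity); // frame number of the event
  const codes = new Uint16Array(capacity);  // event code (meaning is up to you)
  const values = new Int32Array(capacity);  // one payload value per event
  let next = 0;  // slot for the next entry
  let total = 0; // number of entries recorded so far
  return {
    // Cheap enough to call from inside the per-frame routine.
    record(frame, code, value) {
      frames[next] = frame;
      codes[next] = code;
      values[next] = value;
      next = (next + 1) % capacity;
      total++;
    },
    // Formats the retained entries, oldest first. Call after the run.
    dump() {
      const count = Math.min(total, capacity);
      const start = total > capacity ? next : 0;
      const lines = [];
      for (let i = 0; i < count; i++) {
        const j = (start + i) % capacity;
        lines.push(`${frames[j]} ${codes[j]} ${values[j]}`);
      }
      return lines;
    }
  };
}

// Usage: record during the run, print once something went wrong.
const trace = createTraceLog(4096);
trace.record(1, 2, 65); // e.g. frame 1, "key applied", key code 65
trace.record(3, 3, 0);  // e.g. frame 3, "key released"
console.log(trace.dump().join("\n"));
```

This still perturbs timing a little, but far less than writing to the console on every frame.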


Effectively, it’s the emulated PET which pulls the next key from the queue, and only when it is ready for input, just before the interrupt that triggers the scan routine fires. It is then exposed to this key-press for several frames, followed by an intermediate frame with no key pressed. I can’t see how any key may be dropped from the queue. There are notably no timeouts involved, it’s all code and counters.
I’ve a hard time seeing anything else than nondeterministic behavior, especially, as the rare lapses seem to occur stochastically.

PS: For the PET-side of things, see <Now Go Bang!> PET Keys — Series 2001 Edition

PS: Not mentioned in this writeup: The PET features a keyboard input buffer for 10 characters.
This means for any loads:

  1. At the time of typing, there is no other task for the PET than putting the next character from the buffer onto the screen and advancing the cursor. (This should be easily accomplished in less than a frame. :wink: In actuality, it has several frames to do so.)
  2. Thanks to the intermediate frame exposing an empty keyboard matrix, there is no chance for any collisions to occur (this is needed only for separating equal character codes, anyways.)
  3. If a line is submitted by RETURN, (a) the auto-type routine goes into an extra long delay (mostly for aesthetic reasons) and (b) the PET sets its cursor-on flag to off while processing the input, thus blocking the auto-type routine. There should be no chance for the PET’s keyboard buffer to ever fill completely.
  4. The emulated PET’s 60Hz V-BLANK interrupt that triggers the keyboard scan routine and the auto-type routine are initiated by the same counter (the latter just before the former).
  5. There is no async code and there are no Promises used in the emulation. This should be strictly single-threaded and synchronous.

Provided that everything is preprocessed in advance, and that the PET executes deterministically and single-threaded, and that the auto-type routine patches in at the very same moment of any frame execution with what is to the PET constant timing, there is no sensible reason, why pasting the same text once misses a character and, then, repeating the same procedure with the same clipboard content, succeeds.


Just a short update on the matter: The issue kept recurring every now and then, but was not reproducible (pasting the same clipboard content after a failure would usually work as expected, but after some repetitions, another failure on a completely different character would occur), while the internal behavior should be by all means deterministic. So I gave up on this scheme. I would have preferred operating the emulated keyboard, since it would have been agnostic of any internals (which wasn’t really true anymore, as we were listening into internal state more and more to make sure that this wasn’t caused by an overflow of the keyboard buffer, etc.). Instead, I’m now injecting any pasted text directly into the keyboard buffer of the emulated machine (a character a frame, provided the emulated PET is accepting input and there’s room in the keyboard buffer). This works reliably and is also much faster.
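For readers who want to try the same approach, a per-frame injection step might look like the sketch below. This is not the emulator's actual code. The addresses are assumptions based on the commonly documented zero-page layout of the later PET ROMs (buffer at $026F, character count at $9E) and differ for the original ROM set, so verify them against the ROM in use. `isInteractive` stands for whatever "cursor is blinking" test the emulator provides.

```javascript
// Assumed addresses (later PET ROMs); verify against the ROM actually in use.
const KEYBUF_START = 0x026f; // first byte of the 10-character keyboard buffer
const KEYBUF_COUNT = 0x009e; // number of characters currently in the buffer
const KEYBUF_SIZE = 10;

// Call once per frame. Moves at most one character from `queue` into the
// emulated keyboard buffer; returns true if a character was injected.
function injectNextChar(ram, queue, isInteractive) {
  if (queue.length === 0 || !isInteractive) return false;
  const count = ram[KEYBUF_COUNT];
  if (count >= KEYBUF_SIZE) return false; // buffer full, try again next frame
  ram[KEYBUF_START + count] = queue.shift();
  ram[KEYBUF_COUNT] = count + 1;
  return true;
}

// Usage with a bare RAM array standing in for the emulated machine.
const ram = new Uint8Array(65536);
const queue = [72, 73, 13]; // character codes for "HI" followed by RETURN
while (queue.length > 0) injectNextChar(ram, queue, true);
console.log(ram[KEYBUF_COUNT]); // characters waiting in the buffer
```

In a real run the emulated machine drains the buffer between frames; the loop above only shows the write side.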

I still have no idea what could have caused the issue. Notably, the frame-based trigger and the checks are still the same. Also no idea, why and how this only occurred in the context of paste operations and not with other cases where the emulator types in the emulated PET (which is more often about shorter texts, though). – If you’re writing a browser-based emulator, hic sunt dracones.

Moreover, as we’re at it, the emulator now also features, besides the (in)famous chiclet/graphics keyboard, the business keyboard and related ROM sets, being thus a “CBM Series 2001/B Business Computer”, as well. :slight_smile: