WAI instruction behavior

I believe this explanation has problems:

How would it best be fixed? Maybe we need an erratum.

Mod edit: to include this snippet from that book…

Description: The processor stops until an external interrupt occurs. You can use this feature to reduce interrupt latency by putting the WAI instruction at the beginning of the interrupt service routine and setting the interrupt disable bit I. When an interrupt occurs, the instruction after the WAI will be executed.

Having built and written a lot of code for the '816 including extensive use of the WAI instruction, I’m fairly sure the explanation is technically correct.

How do you think it could be fixed?

-Gordon

As someone who hasn’t used the WAI instruction much, or the '816 at all, I find that description a little confusing.

If the WAI instruction is in the ISR, doesn’t that mean we have to wait for at least one interrupt before we can WAIt for the next one? Doesn’t that mean the first interrupt source won’t be serviced?


Hm. Yeah, maybe confusion.

What I think it means in this context is that, rather than just waiting for an interrupt at some arbitrary point in your code, you JMP to the ISR, disable interrupts, then WAI. You don’t RTI.
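A minimal sketch of that arrangement, in case it helps. It assumes ca65-style syntax, a level-sensitive IRQ line, and a hypothetical `DEVICE_ACK` register (the name and address are made up for illustration) whose read makes the device release its interrupt request:

```asm
; Sketch only. DEVICE_ACK is a hypothetical address, not from the thread.
DEVICE_ACK = $C000      ; assumed: reading it makes the device release IRQB

service:
        SEI             ; I = 1: an IRQ ends the WAI but is not vectored
wait:
        WAI             ; halt until IRQB (or NMIB) is asserted
        LDA DEVICE_ACK  ; the "ISR" body starts on the instruction after WAI
        ; ... handle the event here ...
        BRA wait        ; no RTI: the wake-up pushed nothing onto the stack
```

If the request were not cleared before looping, the next WAI would fall straight through, since the IRQ line would still be asserted. An NMI still goes through its own vector first and only then returns to the instruction after the WAI.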

This only works in certain systems though - what one other OS might call a “tickless” system or if you really need a fast response to an interrupt coming in. (7 cycles - can you poll faster? who knows)

I never used it like that - I had another reason to use it which was to invoke a co-processor, so the RDY output pin was handy to me.

There are race conditions surrounding it - things you can/should be able to work out if you think and read as much as you can on the '816 (as I did), but I don’t think they’re obvious - e.g. if an interrupt comes in while the WAI is setting up then it will cancel the WAI and carry on - that may be fine for you but it wasn’t for me. NMIs will cancel the WAI too.

Personally I’d suggest you avoid the '816 and stick to the '02, but maybe you have a reason to use it. My reason was to scratch that “what if…” itch, which I’ve done: I implemented a multi-tasking OS that runs BCPL with it. I’ve found there was a reason (many reasons) why it wasn’t that popular back then, with only 3 commercial systems using it to my knowledge (Acorn Communicator, Apple IIgs and the SNES), but again, those are my own personal reasons.

-Gordon


I do (now) see the potential confusion. It’s not exactly an ISR, but it is very like one. Edit: or rather, it’s not the ISR but it is an ISR.

Is this not like the wait on the 6509, err, the 6809? The TRS-80 Color Computer used it, I think, for a disk I/O DMA loop:

back: wait;
      ld a [disk];
      st a x+;
      bra back;

Might wait also be used for task switching on a slave CPU, where you sleep until needed?
Ben.

It could be used like that - if there was only one source of interrupts, for example. For multiple sources you’d need to poll the sources to see which one fired, as usual. The CPU uses less power while it’s effectively halted, so that’s one reason to use it. And from what I gather, with interrupts disabled the 7-cycle interrupt sequence is avoided and the CPU starts on the very next cycle after the interrupt, so it can effectively respond faster.
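For the multiple-source case, the polling after the wake-up might look roughly like this. It is a sketch under assumptions: two hypothetical devices whose status registers show a pending request in bit 7, data registers whose read clears that request, an 8-bit accumulator, and ca65-style syntax. None of the names or addresses come from real hardware:

```asm
; Sketch only: all device names and addresses are hypothetical.
DEV_A_STATUS = $C000    ; assumed: bit 7 = 1 while device A requests service
DEV_A_DATA   = $C001    ; assumed: reading clears device A's request
DEV_B_STATUS = $C010    ; same layout assumed for device B
DEV_B_DATA   = $C011

        SEI                 ; I = 1: wake on IRQ without vectoring
wait:
        WAI                 ; halt until some interrupt line is asserted
        BIT DEV_A_STATUS    ; N := bit 7 of the status byte (8-bit A assumed)
        BMI serve_a
        BIT DEV_B_STATUS
        BMI serve_b
        BRA wait            ; neither device: woken by something else

serve_a:
        LDA DEV_A_DATA      ; fetch the data and clear A's request
        BRA wait
serve_b:
        LDA DEV_B_DATA      ; fetch the data and clear B's request
        BRA wait
```

If both devices are pending, A is served first; the next WAI then falls through at once because the shared IRQ line is still asserted, and B is served on that pass.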

Another feature is that executing WAI pulls the RDY signal low, so an external “something” can detect this - a “slave CPU”, for example - but this isn’t without its issues, sadly, as I’ve found when using it.

-Gordon

Feels to me like WAI works - or has benefit - when the alternative would have been a tight loop: test and branch back. It’ll be faster than such a loop, but it fits into a program in the same way. You’d be saving a few cycles of latency.

I’m also reminded of how video synchronization worked on the VCS/Atari 2600 (with the processor clock under external control to implement wait), or on the Atari 8-bits (with the 6502C “Sally” and its additional halt pin used in a similar manner). It feels to me much like an attempt to implement these features more “cleanly” and without special purpose hardware.

Now, if you want to wait for the next raster line, just put the CPU into wait by a WAI instruction and await an interrupt fired by the video hardware. Similarly, this should be useful everywhere we want to synchronize with an external device, like waiting for it to finish a given task or waiting for an instrument to send the next data point, etc.

Hmmm … it makes me wonder which would be more useful:

  1. a WAI that pushes PC and processor status, then high-Zs until an interrupt occurs, which can vector and proceed like a conventional IRQ/RTI, but without the initial stack activity, so lower latency in most cases.
  2. a WAI that simply high-Zs until an interrupt occurs and proceeds at the next instruction after the WAI, with no stack activity at all.

It would be nice to have both available, but what to name them (in a strange world in which I have total control over a design) … I think WAI for option 1 and HLT for option 2 (in the spirit of HLT on the PDP-8), yes? I was thinking about PAUSE instead of HALT, but how do I shorten PAUSE to a three-letter mnemonic? How about HESitate or SYNchronize?

Would not a branch and then a wait for input work?
PC = BRANCH, then WAIT.
Note the whole point of this stuff seems to be meeting I/O timing using a cheap standard CPU rather than a faster version.

In the SNES scene, people giggle at you if you do 16-bit operations in 8-bit modes, especially if you stay in emulation mode.

I do think a code example of WAI would clarify a lot.

Eyes & Lichty seems to be clearer:

https://archive.org/details/0893037893ProgrammingThe65816/page/n548

But I checked the 65C02 datasheet, and WAI is supported.


So it’s either “get ready to react to the next interrupt immediately” (without having to wait for any currently executed instruction to finish) or, with interrupts disabled (the i flag set), it’s “sleep until the next interrupt, at which we will proceed with the next instruction”.
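The first of those two cases (interrupts enabled) could be sketched as follows. This assumes ca65-style syntax, a hypothetical `DEVICE_ACK` register whose read clears the request, and that `irq_handler` has been installed at the IRQ vector elsewhere:

```asm
; Sketch only. DEVICE_ACK and both labels are hypothetical.
DEVICE_ACK = $C000      ; assumed: reading it clears the device's request

main:
        CLI             ; I = 0: the IRQ will be vectored as usual
        WAI             ; CPU is already idle, so no instruction has to finish
        ; execution resumes here after irq_handler's RTI
        BRA main

irq_handler:            ; assumed to be installed at the IRQ vector
        PHA             ; preserve A
        LDA DEVICE_ACK  ; acknowledge the device
        PLA             ; restore A
        RTI             ; return to the instruction after the WAI
```

Here the normal interrupt sequence still runs, so the gain is only that the CPU isn't in the middle of an instruction when the interrupt arrives. The second case (I set) skips the vectoring and the RTI altogether.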


Very well said.

I just noticed another imperfection in Eyes & Lichty too. The pages on SEP/REP say that they are the only ways to modify the x and m flags directly, and that instructions that pull the status register P can also do it indirectly.

I believe PLP is the only opcode that can pull into P, so that sentence shouldn’t be plural.

RTI


-Gordon


Ah, RTI does also. Well, if those are the only two, “RTI and PLP” is still less typing than “Instructions that”, so it kind of sounds like they worded it this way because they were too lazy to go and check them all. But maybe they somehow know that, at the hardware level, pulling from the stack is the only other thing connected to that status reg that is able to write to it. Perhaps.

I guess, it depends on where you are coming from. What they were really trying to say is, “SEP and REP are the only means of directly setting or resetting the m and x flags. There are no dedicated instructions like the classic SEC or CLC instructions. And, well, BTW, any other mechanism pulling the processor status from the stack will also affect these flags, as should be evident.”
(Otherwise, a reader may form the idea of these flags being masked out, when the P register is retrieved in any other way.)

In their defense, it’s quite a challenge to keep an instruction description short and crisp while still providing all the information and avoiding potential misinterpretations. Even more so for a multi-mode processor like the 65816.

In other words, SEX, CLX, SEM, CLM don’t exist.
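To make the direct and indirect routes concrete, here is a short sketch, assuming ca65-style syntax and a CPU that starts in emulation mode. In the P register, m is bit 5 ($20) and x is bit 4 ($10):

```asm
; Sketch only: shows the two ways m and x get changed.
        CLC
        XCE             ; e = 0: native mode, where m and x are real flags
        REP #$30        ; clear m and x directly: 16-bit A, 16-bit X/Y
        SEP #$20        ; set m directly: 8-bit A, index registers still 16-bit
        PHP             ; save P, including the current m = 1, x = 0
        REP #$20        ; m = 0 again
        PLP             ; pull P: m = 1, x = 0 come back as a side effect
```

RTI restores m and x the same way PLP does, since it also pulls P from the stack. The snippet avoids immediate loads of A, X or Y on purpose: most assemblers need a directive telling them the current register width before they can size those operands.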