Reading the 6502 Break Mark – And, how fast was the 6502 back in the day?

NoLand · May 28, 2022, 5:43pm

The “break mark” is a somewhat underexposed feature of the 6502, and many don’t know about it at all. So, as a recap: BRK is effectively a 2-byte instruction, although it takes no operand. The return address pushed on the stack skips the byte immediately after the opcode, which is reserved for the break mark. This is meant to provide a means of distinguishing multiple sources of breaks.

Well, but there is a problem: namely, this only makes sense if there are multiple sources of interrupts and we have to distinguish between them. For this, we need to inspect the status register for the break flag, to determine whether this is a software interrupt and whether the break mark applies at all. ~~However, we can’t simply pull the status register from the stack, as this automatically clears the break flag in the course of the procedure.~~ So we have to work around this by reading the stack “manually”…

Which gives us a procedure like this:

1000   48        EVALBRK    PHA              ;(3) save A
1001   8A                   TXA              ;(2) X into A
1002   48                   PHA              ;(3) save X
1003   BA                   TSX              ;(2) SP into X, now rewind...
1004   E8                   INX              ;(2) at freshly pushed X
1005   E8                   INX              ;(2) at A
1006   E8                   INX              ;(2) status, finally
1007   BD 00 01             LDA $0100,X      ;(4) read it
100A   29 10                AND #$10         ;(2) isolate B-flag
100C   D0 03                BNE ISBREAK      ;(3) is it set?
100E   4C 00 20             JMP IRQOTHR      ;    not a BRK, handle others
1011   E8         ISBREAK   INX              ;(2) rewind to PCL
1012   BD 00 01             LDA $0100,X      ;(4) read it
1015   38                   SEC              ;(2) set carry (for SBC)
1016   E9 01                SBC #$01         ;(2) subtract 1
1018   85 BC                STA SCRATCH      ;(3) store addr LO (zeropage)
101A   E8                   INX              ;(2) rewind to PCH
101B   BD 00 01             LDA $0100,X      ;(4) read it
101E   E9 00                SBC #$00         ;(2) decrement, if required
1020   85 BD                STA SCRATCH+1    ;(3) store addr HI
1022   A2 00                LDX #$00         ;(2) clear X
1024   A1 BC                LDA (SCRATCH,X)  ;(6) read break mark

57 cycles in total – and the BRK instruction takes another 7 cycles. That’s 64 cycles until we have the break mark ready for further inspection!
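
As a cross-check on what the listing above computes, here is a minimal Python model (a sketch, not an emulator). It assumes BRK pushes the address of the BRK opcode plus 2, followed by the status byte with the B bit set, and that the handler has pushed nothing of its own yet, so the offsets are SP+1 to SP+3 rather than SP+3 to SP+5 as in the listing. The names `brk_push` and `read_break_mark`, the addresses and the mark value $42 are made up for illustration.

```python
# Hypothetical model of the stack frame left by BRK on a 6502.
B_FLAG = 0x10       # bit 4 of the stacked status byte
UNUSED_FLAG = 0x20  # bit 5, always pushed as 1

def brk_push(mem, sp, pc, status):
    """Push what a BRK at address pc pushes: PCH, PCL (of pc + 2), then P with B set."""
    ret = (pc + 2) & 0xFFFF
    for value in (ret >> 8, ret & 0xFF, status | B_FLAG | UNUSED_FLAG):
        mem[0x0100 + sp] = value
        sp = (sp - 1) & 0xFF
    return sp

def read_break_mark(mem, sp):
    """Return the byte after the BRK opcode, or None if the stacked B flag is clear."""
    status = mem[0x0100 + ((sp + 1) & 0xFF)]
    if not status & B_FLAG:
        return None                      # hardware IRQ, no break mark
    pcl = mem[0x0100 + ((sp + 2) & 0xFF)]
    pch = mem[0x0100 + ((sp + 3) & 0xFF)]
    mark_addr = (((pch << 8) | pcl) - 1) & 0xFFFF   # stacked PC minus 1
    return mem[mark_addr]

mem = bytearray(0x10000)
mem[0x2000] = 0x00   # BRK opcode
mem[0x2001] = 0x42   # break mark
sp = brk_push(mem, 0xFF, 0x2000, status=0x00)
mark = read_break_mark(mem, sp)
print(hex(mark))
```

The 16-bit decrement is the part that costs the SEC/SBC pair and the zero-page pointer in the assembly version; in the model it is a single masked subtraction.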

I recall that this was a somewhat prohibitive and costly procedure, even in the 1980s. Either there is something missing in the instruction set that would have made the break mark a somewhat useful feature (I can’t think of anything, but maybe the unused flag in the status register could have switched the pull behavior for the status register), or – much more probable – the 6502 was considered to be lightning fast and cycles didn’t matter anyway. (Notably, this is a CPU feature that requires 60+ cycles to be used at all!) Given that a 6502 probably wouldn’t have been running at 1 MHz in a realistic setup in 1975, but at a more moderate pace, this begs the question, “lightning fast in comparison to what?”

Does anyone recall how this felt, when the 6502 was still new?

drogon · May 28, 2022, 6:41pm

My early 6502 experience is the Apple II, which didn’t really use a “mark”: on BRK you dropped into the monitor with a dump of the registers. The BBC Micro was different, as a BRK was used to indicate a program error with a code and a text message: BRK, then a single byte with the code, then a zero-terminated text string with the message.
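
The BBC-style error block described above can be modelled in a few lines of Python. This is only a sketch of that layout (BRK, one code byte, zero-terminated text); the function name `parse_error_block`, the address $3000, the code 42 and the message text are invented for the example and are not taken from any Acorn ROM.

```python
def parse_error_block(mem, stacked_pc):
    """Decode the bytes following a BRK, given the return address BRK pushed.

    BRK pushes its own address + 2, so the code byte sits at stacked_pc - 1
    and the message text starts at stacked_pc itself.
    """
    code = mem[(stacked_pc - 1) & 0xFFFF]
    end = mem.index(0, stacked_pc)           # position of the terminating zero
    return code, mem[stacked_pc:end].decode("ascii")

mem = bytearray(0x10000)
block = bytes([0x00, 42]) + b"Example error" + b"\x00"   # BRK, code, text, 0
mem[0x3000:0x3000 + len(block)] = block
error = parse_error_block(mem, 0x3000 + 2)
print(error)
```

Note that the stacked return address already points at the first character of the message, so only the code byte needs the "minus one" step.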

However why can’t you pull the status register from the stack?

The detection was fairly simple - this is the mostly BBC Micro compatible code I have in my Ruby system:

.proc   rubyIrqE
        sta     irqA            ; save A
        pla                     ; Get and save P
        pha
        and     #$10            ; Test for BRK bit
        bne     jRubyBrkE

That’s not many cycles at all before you branch into the BRK handler, or carry on in the IRQ handler. The location irqA is in zero page, which saves a cycle. The Apple II IRQ/BRK handler is almost identical to that point. (Woz used 3 x ASLs rather than an AND #$10.)
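
The two ways of testing the B bit mentioned above can be checked against each other exhaustively. The sketch below assumes the three ASLs are followed by a branch on the sign bit (bit 4 shifted left three times lands in bit 7); the function names are hypothetical.

```python
B_FLAG = 0x10

def is_brk_and(p):
    """AND #$10 followed by BNE: branch if bit 4 survives the mask."""
    return (p & B_FLAG) != 0

def is_brk_asl(p):
    """Three ASLs followed by a branch on the sign bit (bit 7)."""
    a = p
    for _ in range(3):
        a = (a << 1) & 0xFF      # ASL A on an 8-bit accumulator
    return (a & 0x80) != 0

# Compare both tests for every possible stacked status byte.
agree = all(is_brk_and(p) == is_brk_asl(p) for p in range(256))
print(agree)
```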

As for how it felt? I don’t recall thinking it was an issue, and the BBC Micro used several IRQ sources and had a 100 Hz timer running to do things like scan the keyboard and update the sound hardware. It did run at 2 MHz though.

Did any systems actually use some sort of switch on BRK other than the BBC Micro?

-Gordon

NoLand · May 28, 2022, 6:57pm

Well, you caught me there, I was thinking of pulling it into the status register again, which would clear the b-flag. If we pull it into the accumulator, however…

Still, I recall this to be a rather prohibitive procedure (we still have to somehow preserve the current stack and stack pointer and clean up behind us for any other interrupt handlers, meaning we have to transfer the stack pointer and deposit it somewhere in memory in order to restore it later, if we want to read a sequence of bytes from the stack…) Mind that the “fun” actually begins just after we determined the break condition, as we have to read the next two bytes from the stack for the return address and decrement it in order to get the address of the break mark, which we’ll have to load indirectly, in one way or the other.

[Edit] So, in all fairness, using pulls, it’s something like,

PHA             ;(3)
TXA             ;(2)
PHA             ;(3)
TSX             ;(2)
STX TMP         ;(3)
INX             ;(2)
INX             ;(2)
INX             ;(2)
TXS             ;(2)
PLA             ;(4)
SEC             ;(2)
SBC #1          ;(2)
STA SCRATCH     ;(3)
PLA             ;(4)
SBC #0          ;(2)
STA SCRATCH+1   ;(3)
LDX TMP         ;(3)
TXS             ;(2)
LDX #0          ;(2)
LDA (SCRATCH,X) ;(6)

54 cycles total + BRK (7) => 61, and we haven’t even checked the break flag
(If we reorder the code to restore SP before we store the high-address, we could use SCRATCH+1 for TMP.)

NoLand · May 28, 2022, 9:34pm

BTW, regarding cycle counts, I’ve just added an option to generate them to my online-disassembler for this (see the options below the input for an optional symbol table).

(It even “understands” extra cycles for branches to different pages.)

Michael_Barry · May 29, 2022, 8:15am

The NMOS handler code is a bit awkward no matter how you attack the problem, but you can gain some efficiency by assuming that your stack never wraps during normal operation, which should be a safe assumption for any system except WOZMON:

irqhandler:
    pha             ;(3)
    txa             ;(2)
    pha             ;(3)
    tya             ;(2)
    pha             ;(3)
    tsx             ;(2)
    lda $0104,x     ;(4)
    and #$10        ;(2)
    beq notbrk      ;(2*) regular IRQ
    lda #0          ;(2)  BRK handler
    sta ptr         ;(3)
    ldy $0105,x     ;(4)
    cpy #1          ;(2)
    dey             ;(2)
    lda $0106,x     ;(4)
    sbc #0          ;(2)
    sta ptr+1       ;(3)
    lda (ptr),y     ;(5) brk signature byte
    ...


    ...
    pla             ;(4)
    tay             ;(2)
    pla             ;(4)
    tax             ;(2)
    pla             ;(4)
    rti             ;(6)

(code is untested, but should give you the general idea)
[Edit: fixed “off-by-one” coding error]
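
The `cpy #1` / `dey` / `sbc #0` sequence in the handler above is a compact 16-bit decrement: the compare sets carry only when the low byte is at least 1, and the later `sbc #0` borrows from the high byte only when carry is clear. A small Python model (a sketch with a hypothetical function name, not tested on hardware) confirms that this matches a plain 16-bit "minus one" for every possible return address:

```python
def decrement_via_carry(pcl, pch):
    """Model cpy #1 / dey / sbc #0 on the stacked return address bytes."""
    carry = 1 if pcl >= 1 else 0          # cpy #1: carry set when Y >= 1
    y = (pcl - 1) & 0xFF                  # dey: wraps to $FF, carry untouched
    hi = (pch - (1 - carry)) & 0xFF       # sbc #0: subtracts 1 only on borrow
    return (hi << 8) + y                  # address formed by lda (ptr),y with ptr low = 0

# Check against a straightforward 16-bit decrement for all 65536 addresses.
matches = all(
    decrement_via_carry(pc & 0xFF, pc >> 8) == ((pc - 1) & 0xFFFF)
    for pc in range(0x10000)
)
print(matches)
```

Keeping the low byte in Y and storing 0 in the pointer's low byte also means the indexed read never crosses a page, which is why the final load is a flat 5 cycles.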

NoLand · May 29, 2022, 1:07pm

Yes, adding offsets to lda $100,x is a great improvement, as it saves a few inxs. Still, I wonder. We shouldn’t have to jump through hoops like this to access a CPU feature, not on a processor that is known for its simplicity. (The second worst thing on the 6502 is having to prepare the carry flag for addition and subtraction. Arguably, there is some room between this and the procedures to access the break mark.) I still wonder about the rationale behind this.

NoLand · May 29, 2022, 1:52pm

Wow, I just had my first case of bit-rot on a modern system (and this is a machine with ECC memory). A bit flipped and a lower-case “o” flipped to upper-case in a section of the code unaffected by the changes. As an effect, it stopped working with the option for “illegal opcodes” active. – Fixed.

(We learn: you cannot test your code too much, even if you did nothing to it.)

Michael_Barry · May 29, 2022, 4:03pm

I think the most plausible explanation is that the design team was under considerable pressure in space and time. With limited resources taken into consideration, they did an outstanding job. The 65c02 was a significant improvement, but could have benefited even more with the addition of true stack relative addressing, a feature that didn’t appear until the 65c802.

NoLand · May 29, 2022, 4:48pm

I can see some traces of this (e.g., that ‘zeropage,Y’ addressing is missing for the accumulator, there isn’t even a dedicated slot for this in the instruction layout, there is a full implementation for the X register only and ‘zeropage,Y’ for LDX and STX piggybacks on this), but I fail to see by what provisions the break mark would become manageable at all, as the access to it is buried on the stack.

The only explanation I could come up with is that the interrupt triggers just before the final increment of PC (I’d need to look up some timing diagrams) and adds this increment on its own before pushing PC to the stack. And for some reason, BRK couldn’t be implemented using what was already around without triggering this increment on its own, already, thus there are two increments and an extra offset in the return address, which, well, becomes explained as the “break mark”. But, even if this was the case, I’d fail to see, why they wouldn’t have been able to implement BRK without this increment, which suggests that this was indeed intentional. Or this was just an oversight that wasn’t worth a redesign.

(Mind that there’s already some incongruence regarding return addresses, where it’s the exact return address with interrupts and the return address -1 with subroutines, so RTS has to add an extra offset in order to transfer control to the appropriate program location and RTI has not. But even this wouldn’t explain the extra offset in the return address of BRK, rather to the contrary.)
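
The incongruence between the return-address conventions can be written down as a tiny Python model. It is a sketch with hypothetical function names: JSR stacks the address of its own last byte and RTS adds 1, a hardware interrupt stacks the exact address to resume at, and BRK stacks its own address plus 2, which RTI uses unchanged.

```python
def jsr_pushed(pc):
    """JSR at pc is 3 bytes long and pushes the address of its last byte."""
    return (pc + 2) & 0xFFFF

def rts_target(pushed):
    """RTS pulls the address and adds 1 before continuing."""
    return (pushed + 1) & 0xFFFF

def irq_pushed(next_pc):
    """A hardware interrupt pushes the address of the next instruction as-is."""
    return next_pc & 0xFFFF

def brk_pushed(pc):
    """BRK at pc pushes pc + 2, skipping the byte after the opcode."""
    return (pc + 2) & 0xFFFF

def rti_target(pushed):
    """RTI pulls the address and uses it unchanged."""
    return pushed

after_jsr = rts_target(jsr_pushed(0x1000))   # instruction after the 3-byte JSR
after_irq = rti_target(irq_pushed(0x1000))   # interrupted instruction itself
after_brk = rti_target(brk_pushed(0x1000))   # one byte past the break mark
print(hex(after_jsr), hex(after_irq), hex(after_brk))
```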

oldben · May 29, 2022, 10:14pm

Perhaps the designers never thought people would need to read the break mark.

EdS · May 30, 2022, 6:43am

(I’m away from keyboard otherwise I’d be all over this topic!)

EdS · June 2, 2022, 4:17pm

Not me, I’m afraid!

We might ask two kinds of questions:

what were the MOS designers thinking of and intending when they designed BRK
how did people make use of BRK

and we might also ask

which aspects of 6502 behaviour are carefully and intentionally crafted, and which are more or less unintended consequences of a minimal (cheap) implementation of a simple machine?

Note that the handling of BRK - as a software interrupt - will not (usually?) be time-critical. So, distinguishing BRK from IRQ needs to be cheap, in order that IRQ is never too slow for some peripheral purpose, but decoding the byte after BRK does not need to be quick.

Gordon’s already given his approach, and more or less said what I’m about to, but let me go back to Acorn’s MOS, to this bit in TobyLobster’s excellent cross-referenced and annotated disassembly:

.irqEntryPoint = $dc1c
    STA .interruptAccumulator                           save A
    PLA                                                 read flags
    PHA                                                 store flags again
    AND #%00010000                                      check BRK flag
    BNE .brkRoutine                                     if (BRK flag set) then branch (to
                                                        BRK handler)
    JMP (.vectorIRQ1V)                                  jump to the IRQ1 vector (by default
                                                        this will jump to .irq1Handler)

and we see that just 4 instructions separate BRK from IRQ: PLA, PHA, AND, BNE. That’s the extra cost of BRK handling, from the point of view of cost incurred by IRQ handling.
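
To put a number on that: using the standard NMOS 6502 timings (PLA 4, PHA 3, AND immediate 2, BNE not taken 2), the IRQ path pays 11 cycles for the BRK test. The sketch below just does that arithmetic; the 2 MHz clock is the BBC Micro figure mentioned earlier in the thread, and the dictionary name is arbitrary.

```python
# Cycle counts for the four instructions on the path where the branch
# to the BRK routine is NOT taken (i.e. an ordinary hardware IRQ).
BRK_TEST_CYCLES = {
    "PLA": 4,
    "PHA": 3,
    "AND #imm": 2,
    "BNE (not taken)": 2,
}

extra_cycles = sum(BRK_TEST_CYCLES.values())
clock_mhz = 2.0                                  # assumed BBC Micro clock
extra_microseconds = extra_cycles / clock_mhz    # cycles / (cycles per microsecond)
print(extra_cycles, extra_microseconds)
```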

In a system which can’t afford this cost, perhaps it can afford not to use BRK and avoid it - BRK will appear as an unexpected IRQ, left over after all the usual IRQ sources have been checked.

Note that the initial store of A makes this handler non-reentrant, which is fine (if it’s well-documented!) and can in any case be worked around (LDA .interruptAccumulator, CLI, SEI will I think act as a yield to any pending IRQ.). Preserving A is also the least we can do: no need to worry about X or Y at this stage, and maybe the highest priority IRQ handler doesn’t need both anyway.

In some uses, the handler software might not even care about the signature byte, in which case it needn’t be found. Acorn’s OS uses it as an error code, and needs to retrieve and capture not only the stacked PC but also the currently selected sideways ROM, which together can locate the byte.

There are a few potentially relevant pages in the visual6502 wiki, which I’ll link to because they are only found now at the Wayback Machine:
6502 increment PC control (Note the ONEBYTE input which is an early decode of bytes which lack an operand.)
6502 BRK and B bit
6502 Interrupt Recognition Stages and Tolerances
6502 Timing of Interrupt Handling
650X Schematic Notes
6502 State Machine
6502 Timing States

Another use of BRK, which isn’t quite like a software interrupt, is as a breakpoint, either to be poked into a loaded program to help diagnose it, or to be used as a single-stepping mechanism. In this case, although one might plausibly also use the signature byte as an identifier, the value of the PC already tells the handler which BRK was hit, so the signature byte is likely to be just whatever program byte came next.

(I think writing a zero byte to an EEPROM is easy - no need to erase and reprogram. This might have been a consideration. Although the detection of all-zero in the Instruction Register which is used to set off the interrupt sequence is about the simplest test in NMOS logic.)

As speculation, it feels to me likely that

it took less mechanism to make BRK two bytes
it’s only two bytes in the sense that the PC increment isn’t suppressed
BRK was firstly intended as a breakpoint or an error trap and only secondarily as a kind of software interrupt

I’ll stop there - hope it’s interesting and not too provocative, and sorry this post isn’t more coherent!

NoLand · June 3, 2022, 4:13am

Just a short reply (I’m currently entertaining a headache, but it doesn’t really return the favor of entertaining me):

What’s interesting about this is that the 6502 was designed, when microprocessors finally became successful. Still, it was probably meant more as a micro-controller than as an MPU for a system with an OS (like BASIC). Which is probably important.
If we’re running the bare machine, most of this isn’t dramatic. However, if there is an OS, most of the zero-page is probably occupied by this already. I think, on Commodore 8-bits, there are just two safe zero-page addresses left, which we may have to reserve for any indirect memory address by pointers.
Secondly, this is an interrupt, meaning there are no safe assumptions about what state the machine may be in and what resources may be in use or not (compared to the practice of reusing buffers, like in a “normal” machine language program). So, on a CBM machine, saving the accumulator to a zero-page address is already ruled out; we have to save it on the stack.
Thirdly, this is an interrupt. We know nothing about its source or nature until we have inspected the break flag, in case this may matter for any of the paths. Even if there’s just a single break in the code, we’ll have to do this for every hardware interrupt, etc. And whatever amount of total memory is attached, we are running on scarce system resources (as for available zero-page addresses).

On the other hand, if this is a micro-controller application, there’s ample space in the zero-page to be used as an entire page of slow registers. This is a totally different machine!

Still, I think, the procedure for accessing the break mark is quite a luxurious one for a processor, which is known for its simple operations. It really doesn’t fit into the picture.
(Personally, I think it was probably more an afterthought, or even an is-it-a-bug-or-a-feature thing. The latter, because it owes its existence merely to an extra increment of PC as stored on the stack, which may have been an oversight, as well.)

You’re right that there is already PC as some sort of an identifier (I’ve thought so, as well). – What would be the use case for a more general identifier, for classes of break sources? Well, if this is a break point, on older, bigger machines it was not uncommon to have a facility to select break points, meaning to stop only on certain ones while ignoring others. This seems to be the most probable use case for this. But in this case we would have to jump through those hoops even while running normal operations and not intending to stop on those “silent” break points. Which renders this a bit odd, again.

Regarding the instruction code for BRK, I think this makes total sense: if you initialize your memory, this will stop any runaway code pretty soon and trigger any installed trap. (Also, on paper tape, “erase” is all bits set / holes punched, which is also negative zero. And as there is no sensible decoding for any xF opcodes, as in FF, 00 is probably the next best thing.)

– Well, this was not that short… –

EdS · June 3, 2022, 4:07pm

Indeed it was… from Chuck Peddle’s oral history:

I wasn’t trying to make a minicomputer. I was trying to make something that was a logic replacement. And I understood that, and the guys at Intel didn’t understand that, and the guys at Motorola didn’t understand that. I’d taken all the guys that understood it at Motorola to work for me. So nobody understood it, except the customers understood it.

That’s kind of interesting… there is of course a distinction between OS use and application use, where the initial application tends to be Basic. Looks like the Apple II OS uses less than half zero page, but with Basic much of the rest is used - maybe 20 bytes in odd places are free. Acorn’s MOS takes half the zero page, with the rest for the application, and when that’s Basic, it in turn leaves 32 bytes for user application.

Or in general memory: it’s only one byte and one cycle extra to put the A value somewhere other than zero page.

It does seem possible that BRK is a kind of afterthought… although the 6800 has an SWI, and it seems the 6502 instruction set came from a place of trying to minify the 6800 in order to cost-reduce.

I looked for early documentation, and I found Synertek’s… on page 123 of their 1978 catalogue, we see that BRK is described as “BRK COMMAND” and is tabulated as being a single byte. (This makes sense to me: one only changes a single byte to insert a BRK. It just so happens that PC is incremented a second time, so an RTI will land a little further on, and it’s for this reason we get the free signature byte.) In their 1976 Programming manual, which also bears a copyright 1975 by MOS, we see the B bit of the status register P described on page 38 as

The break command flag is … used to determine during an interrupt service sequence whether or not the interrupt was caused by BRK command or by a real interrupt… This bit should be considered to have meaning only during an analysis of a normal interrupt sequence.

Hang on, let me swap to an OCR version as found here…

On page 147 we see the interrupt sequence of the 6502 described with Cycle 2 being Force a BRK instruction.

On page 159 we see “There is also a software technique which allows the user to simulate an interrupt with a microprocessor command, BRK. It is primarily used for causing the microprocessor to go to a halt condition or stop condition during program debugging.”

On page 159 we get a section on “BRK - The Break Command” where we see “The BRK is a single byte instruction and its addressing mode is Implied.” and then

As is indicated, the most typical use for the break instruction is during program debugging. When the user decides that the particular program is not operating correctly, he may decide to patch in the break instruction over some code that already exists and halt the program when it gets to that point. In order to minimize the hardware cost of the break which is applicable only for debugging, the microprocessor makes use of the interrupt vector point to allow the user to trap out that a break has occurred. In order to know whether the vector was fetched in response to an interrupt or in response to a BRK instruction, the B flag is stored on the stack, at stack pointer plus 1, containing a one in the break bit position, indicating the interrupt was caused by a BRK instruction.

and

Once the user has determined that the break is on, a second analysis and correction must be made. It does not operate in a normal manner of holding the program counter pointing at the next location in memory during the BRK instruction. Because of this, the value on the stack for the program counter is at the break instruction plus two. If the break had been patched over an instruction, this is usually of no significant consequence to the user. However, if it is desired to process the next byte after the break instruction, the use of decrement memory instructions in the stack must be used.

Also

An interesting characteristic about the break instruction is that it’s OP CODE is all zero’s (0), therefore, BRK coding can be used to patch fusable link PROMS through a break to an E-ROM routine which inserts patch coding.

Their sample code includes the replacement instruction and a JMP - so it leaves three bytes on the stack. Oops.

The modern view, which is simpler to understand, is to see BRK as a two byte instruction with a signature byte, but these early documents show the thinking of the designers.

NoLand · June 3, 2022, 7:48pm

That’s some great research you did on this! Thank you for looking into all these sources.

Michael_Barry · June 4, 2022, 2:52pm

Ed is the best I know when it comes to sifting through the search engine chaff and digging up appropriate references.

EdS · June 4, 2022, 3:15pm

thanks for the kind words!

What I didn’t quite get around to doing was checking the visual6800 to see how it goes about detecting an interrupt - does it, for example, inject the SWI opcode into the Instruction Register… no, not so far as I can see. The databus is taken straight to the IR. There’s nothing like the 6502’s ONEBYTE early decode.

Cumbayah · June 5, 2022, 6:47am

It is interesting to note that as I understand Michael Steil’s analysis over at Pagetable, IRQ and NMI (and even reset) handling is in fact implemented in terms of BRK internally in the CPU.

EtchedPixels · June 8, 2022, 12:56am

BRK really only became useful on the CMOS part. On the NMOS part BRK and interrupts are a bit of a car crash, and a simultaneous BRK and interrupt result in the BRK simply forgetting to execute.

EdS · June 8, 2022, 11:19am

Oh, good point, I’d forgotten. There’s certainly stuff there to worry about, or study, depending on your inclination… here’s the visual6502 wiki which links to various other sources and discussions:
6502 BRK and B bit - VisualChips

(Linked to the archived version because the live version is borked.)