Virtual 6502 Suite Update

NoLand · May 20, 2021, 7:13am

Over the last few days, I updated the “Virtual 6502” suite of 6502-related tools. They have been around for a while (they may even be the oldest set of 6502 tools on the web still in existence), so they earned a bit of TLC.

The assembler in particular has been extended to support a variety of syntax flavors and may therefore still be of some use, especially when it comes to legacy code. (E.g., it should be fully compatible with the original MOS cross-assembler.) However, it’s still a simple assembler, without any support for macros or complex value expressions. But when it comes to a quick session with some trivial code of mixed or obscure origin (like bits of old and bits of modern code), there may still be a niche.

All the tools now support file uploads and input via drag&drop, which may be especially useful with the disassembler (as it now supports binary files as a source).

Do two tools already make a suite? Well, there is also the emulator…

(And, BTW, a 6502 Instruction Set Sheet, which is a bit better known.)

For details and links, see this blog post:

EdS · May 20, 2021, 10:07am

Excellent! I see I mailed you about this back in 2009 - it almost feels like the first time I reached out to someone on the internet. And look where we are now.

NoLand · May 22, 2021, 1:00am

And now with nicer buttons!
And PET ROMs.
(That is, the emulator.)

NoLand · May 26, 2021, 1:47am

Another update to the emulator: there’s now a watchdog to optionally monitor breakpoints and other conditions.

Moreover, mostly on the cosmetic side, all the apps now use nicer custom dialogs instead of the generic ones.

NoLand · May 31, 2021, 7:28pm

An Update to the Documentation of the 6502 Instruction Set:

Something I had wanted to do for quite some time: I finally added, along with some other explanations, documentation of illegal opcodes.

Illegal opcodes can now optionally be included in the table view (listing the opcodes by instruction code) and there is a new section listing them in detail.
Moreover, there are ~~two~~ update: three more tables in the section on the decoding scheme, which illustrate how and where these illegal opcodes fit in and provide deeper insights into the internal operations.

(The mnemonics are generally those used by the ACME and DASM assemblers, but differ in a few instances, where these are ambiguous, i.e. ANC and SBC, for which we use here the alternative labels “ANC2” for $2B and “USBC” for $EB, respectively.)

I think, the documentation is now quite complete…

NoLand · June 5, 2021, 7:18am

And another related blog post, about “illegal” opcodes, what they are and what might be learned from this:

NoLand · June 24, 2021, 8:36am

And yet another round of updates: the emulator in particular got a major rewrite, and the assembler has been substantially improved (and now supports complex value expressions). Since there are actually quite a number of features to discuss, here’s a blog post on the subject:

NoLand · June 25, 2021, 7:02pm

And yet another update: The disassembler now supports symbol tables (with support for complex expressions, similar to the assembler, but a bit more permissive regarding formatting). Symbol tables may be bare lists (as emitted by the assembler), label-style definitions, or assignments.
This way, you may now have perfect round trips between assembler and disassembler. There are also multiple output formats available, including verbose listings and bare assembler code.
Of course, besides editing them manually, you may load symbol tables / header files from disk or by drag&drop.

The three apps should now really be on par regarding their feature sets and supported formats.
The blog post has been updated accordingly, as well.

NoLand · June 27, 2021, 7:45am

And yet another (small) one:
The emulator now has an option to load the ROM of the KIM-1 (main ROM with TTY, keyboard monitor and display only, not the optional cassette extension, which doesn’t make much sense in this context).

And what fun would a KIM-1 be without the original ROR-bug, which haunted the pre-June 1976 production series? Well, we now optionally emulate this as well.

(Officially, the instruction was “not yet implemented”, while in actuality there was a bug in the PLA decoding lines, causing ROR to behave like ASL, but always shifting in zero and preserving the carry flag as it had been before.)
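
To illustrate the difference, here is a minimal Python sketch of a regular ROR next to the behavior described above. It is not taken from the emulator's source; the function names are made up for this example, and only the result byte and the carry flag are modeled (the N and Z flags are left out).

```python
def ror(value, carry):
    """Regular ROR: rotate right through carry. Returns (result, new_carry)."""
    result = ((value >> 1) | (carry << 7)) & 0xFF
    return result, value & 1


def ror_pre_june_1976(value, carry):
    """Buggy ROR as described above (hypothetical helper name):
    shifts left like ASL, bit 0 becomes zero, carry is left untouched."""
    result = (value << 1) & 0xFF
    return result, carry


# Same input, different outcome:
normal = ror(0x81, 1)               # bit 0 goes to carry, old carry enters bit 7
buggy = ror_pre_june_1976(0x81, 1)  # shifted left, carry unchanged
```

With the optional bug enabled, an emulator would simply dispatch to the second variant for the ROR opcodes.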

NoLand · July 5, 2021, 7:00am

As another update, the disassembler now supports a few new features regarding symbol tables:

Addresses immediately following a declared symbol will (optionally) be translated to a labeled “+1” address. This may be useful with any code setting up pointers, etc., where only the low-byte address is declared as a symbol (as is often the case with zero-page system addresses).
E.g.

Object code (start address 0x0800):

 A9 01 85 B8 A9 40 85 B9

Symbol table:

 SETPTR   = $0800
 POINTER  = $B8

Disassembly:
                             * = $0800
 0800   A9 01      SETPTR    LDA #$01
 0802   85 B8                STA POINTER
 0804   A9 40                LDA #$40
 0806   85 B9                STA POINTER+1
                             .END
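
The lookup behind such “+1” labels might be sketched like this in Python. This is an illustration only, not the disassembler's actual code; `operand_label` and the address-to-name dictionary are hypothetical.

```python
def operand_label(address, symbols, plus_one=True):
    """Return a symbolic operand for `address`, or a hex literal if no symbol fits.

    `symbols` maps addresses to names (an assumed structure for this sketch).
    """
    if address in symbols:
        return symbols[address]
    # Optional feature: the address right after a declared symbol becomes NAME+1.
    if plus_one and (address - 1) in symbols:
        return symbols[address - 1] + "+1"
    return "$%02X" % address if address < 0x100 else "$%04X" % address


symbols = {0x0800: "SETPTR", 0x00B8: "POINTER"}
labels = [operand_label(a, symbols) for a in (0xB8, 0xB9, 0xBA)]
```

Here `labels` holds the operands for `$B8`, `$B9` and `$BA`, so only the first two are replaced by symbolic names.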

Symbols may be optionally declared specifically for write access only by a “w” suffix (case-insensitive). Think of the ATARI VCS/2600 and TIA read and write registers sharing the same address.
(The assembler ignores such a suffix in order to provide compatibility with such symbol tables.)
E.g.,
```
CXP0FB = $02   ;read register
WSYNC  = $02 w ;write access only
```
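
How a disassembler might pick between the two names can be sketched in Python as follows. The parsing regex, the two-dictionary layout and the function names are assumptions made for this example, and only plain `$…` hex values are handled (no expressions).

```python
import re

# NAME = $hex, optionally followed by a "w" flag and a ";" comment.
LINE = re.compile(
    r"^\s*([A-Za-z_]\w*)\s*=\s*\$([0-9A-Fa-f]+)(?:\s+([Ww]))?\s*(?:;.*)?$"
)


def parse_symbols(text):
    """Split a symbol table into general symbols and write-only symbols."""
    general, write_only = {}, {}
    for line in text.splitlines():
        match = LINE.match(line)
        if not match:
            continue  # not a simple assignment, skipped in this sketch
        name, value, flag = match.group(1), int(match.group(2), 16), match.group(3)
        (write_only if flag else general)[value] = name
    return general, write_only


def symbol_for(address, is_write, general, write_only):
    """Prefer a write-only symbol for write accesses, else use the general one."""
    if is_write and address in write_only:
        return write_only[address]
    return general.get(address)


table = """\
CXP0FB = $02   ;read register
WSYNC  = $02 w ;write access only
"""
general, write_only = parse_symbols(table)
```

An `STA $02` would then resolve to `WSYNC`, while an `LDA $02` would still resolve to `CXP0FB`.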
Predefined symbol tables for the C64, the PET 2001 and the Atari VCS may be loaded directly from the UI.

If you have symbol tables ready for other popular systems (BBC Micro, Apple ][, etc.) and think this may be useful and should be made available as well, let me know! (For contact details, see mass:werk <contact>.)

NoLand · July 5, 2021, 10:20am

Finally, the disassembler now supports a .DATA pseudo-instruction in its symbol tables to declare an address or a range of addresses as data to be excluded from the disassembly (".BYTE" pseudo-instructions will be emitted instead).

E.g.,

.DATA $2040           ;exclude a single address
.DATA $2040 ... $240F ;exclude the range $2040 ... $240F (inclusive)
.DATA $2040, $240F    ;as above
.DATA D1 ... D1+4     ;expressions are allowed

This should be useful for drilling down on some code: just keep adding “.DATA” statements to your symbol table as you identify a data section and keep on disassembling.
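
As a rough Python sketch of the idea (not the disassembler's implementation), the `.DATA` statements can be collected into inclusive ranges that are consulted before decoding an address. The regex and function names are made up for this example, and only plain `$…` hex operands are parsed, so the expression form (`D1 ... D1+4`) is skipped here.

```python
import re

# .DATA $start, optionally followed by "..." or "," and $end, then a comment.
DATA = re.compile(
    r"^\s*\.DATA\s+\$([0-9A-F]+)(?:\s*(?:\.\.\.|,)\s*\$([0-9A-F]+))?\s*(?:;.*)?$",
    re.IGNORECASE,
)


def parse_data_ranges(text):
    """Collect inclusive (start, end) ranges from .DATA statements."""
    ranges = []
    for line in text.splitlines():
        match = DATA.match(line)
        if match:
            start = int(match.group(1), 16)
            end = int(match.group(2), 16) if match.group(2) else start
            ranges.append((start, end))
    return ranges


def is_data(address, ranges):
    """True if the address should be emitted as .BYTE instead of an instruction."""
    return any(start <= address <= end for start, end in ranges)


ranges = parse_data_ranges("""\
.DATA $2040           ;exclude a single address
.DATA $2040 ... $240F ;exclude the range
.DATA $2040, $240F    ;as above
""")
```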

By this, I think, the disassembler is somewhat complete and final. (Well,… )

NoLand · July 5, 2021, 12:05pm

Never say never, etc.: @EdS has contributed a source for a symbol table for the BBC Micro/Acorn OS ABI.

Incidentally, the apps now support ‘&…’ hex notation for the comfort of our poor friends who were tragically misled by the BBC (while MOS made it quite clear that it is ‘$…’).

scruss · July 5, 2021, 4:33pm

I’m not sure that’s the BBC’s fault:

Acorn’s Atom BASIC, produced without BBC involvement, uses # and & to convert between decimal and hex. Like BBC BASIC, the Atom’s interpreter included an assembler, but it was a very idiosyncratic language. The sense of the Atom’s & operator is quite the opposite from what one might expect, though, since it converted decimal to hex.
The outline specification for BBC BASIC requested syntax compatibility with MS BASIC 5.0. MS BASIC (at least BASIC-80 5.21) used & to denote octal constants and &h for hex. Earlier MS variants for the 6502 mostly didn’t, though AppleSoft supported $ notation.

Some of Atom BASIC’s weirdnesses made it through to BBC BASIC, but mostly compatibility prevailed.

NoLand · July 5, 2021, 5:13pm

It’s really a bit confusing – at least to me.
(I guess, Sophie is to blame – and the BBC BASIC specifications.)

Previously, I had “&…” for octal notation, because I faintly remembered this being used by some software roughly from this period. (Some BASIC? A 6502 assembler?) However, MOS assemblers used “@…” for octal and I really couldn’t find a reference for “&” used for this by any assembler for the 6502 family.

I hope, it’s unambiguous.
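
For reference, the resulting prefix rules as mentioned in this thread (“$…” and “&…” for hex, “@…” for octal) could be sketched in Python like this. `parse_number` is a hypothetical helper, and any other notations the apps may accept are left out.

```python
def parse_number(token):
    """Parse '$..' or '&..' as hex, '@..' as octal, anything else as decimal."""
    if token[0] in "$&":
        return int(token[1:], 16)
    if token[0] == "@":
        return int(token[1:], 8)
    return int(token, 10)


values = [parse_number(t) for t in ("$800", "&800", "@17", "42")]
```

Since each prefix maps to exactly one base, a token never needs more than its first character to be classified.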

P.S.: As I never managed to obtain a Beeb and thus never became intimate with its ways, if there are any idiosyncrasies of note that a compatible assembler should be aware of, any advice is welcome!

NoLand · July 6, 2021, 10:41am

So, as a first step, the assembler gained a special “BBC compatibility mode” (as well as a dedicated checkbox to activate this).

This switches some of the syntax and provides the following features:

BBC-style listing with “&…” hex notation
Generally “P%” may be used for the program counter, anywhere “*” may occur.
Symbols may (optionally) end with “%” (BASIC style).
Statement labels start with a dot.
Bare pragmas without a leading dot.
Assembly blocks using “[ … ]”.
Multiple statements per line using “:” as separator.
Comments with backslash ("\…").
Enhanced features, like grouping expressions with “[ … ]”, will still be available.
Mind that expressions are still evaluated without precedence (I actually do not know how the Beeb does it), on the other hand, you may still use grouping with “[…]” inside expressions. Generally, mixed syntax is allowed to a certain extent. (So the syntax is somewhat loose.)

E.g., this should compile as intended:

\FILL A PAGE WITH BYTES,
\PRESERVE PROGRAM

P% = &800
[
.START
       LDX #OFFSET
.LOOP  TXA
       STA START,X
       INX:BNE LOOP
       BRK
TEXT 'FOO BAR

\INSERT BYTES HERE
OFFSET = P%-START
]

(On a real BBC micro, you probably have no comments outside […], but this is – mumble – a dedicated assembler.)
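
As for expressions being evaluated without precedence, here is a small Python sketch of what strict left-to-right evaluation with “[…]” grouping amounts to. It is an assumption-laden illustration, not the assembler's parser: it only knows `+ - * /` (integer division), decimal numbers and `$…`/`&…` hex, and has no symbols or unary operators.

```python
import re

# Tokens: "$.."/"&.." hex numbers, decimal numbers, operators and brackets.
TOKEN = re.compile(r"\s*(\$[0-9A-Fa-f]+|&[0-9A-Fa-f]+|\d+|[-+*/\[\]])")


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError("unexpected input at position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text):
    """Evaluate strictly left to right; '[...]' groups a sub-expression."""
    tokens = tokenize(text)

    def operand():
        if not tokens:
            raise ValueError("operand expected")
        token = tokens.pop(0)
        if token == "[":
            value = expression()
            if not tokens or tokens.pop(0) != "]":
                raise ValueError("missing ']'")
            return value
        if token[0] in "$&":
            return int(token[1:], 16)
        return int(token)  # raises ValueError for a stray operator or bracket

    def expression():
        value = operand()
        while tokens and tokens[0] in ("+", "-", "*", "/"):
            op = tokens.pop(0)
            right = operand()
            if op == "+":
                value += right
            elif op == "-":
                value -= right
            elif op == "*":
                value *= right
            else:
                value //= right
        return value

    result = expression()
    if tokens:
        raise ValueError("unexpected token %r" % tokens[0])
    return result
```

So `2+3*4` comes out as 20 in this scheme, while `2+[3*4]` gives the 14 one would get with conventional precedence.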

The disassembler will follow next…

EdS · July 6, 2021, 4:42pm

That’s great! I think it might be tricky in general to get very close to Acorn idiom, but it would have the great advantage that your site becomes a crib sheet for how to code assembly within BBC Basic. Here’s your example, mildly and somewhat clumsily reworked, in the Owlet IDE.

NoLand · July 6, 2021, 6:10pm

Ha, I see a loose end: reassigning symbols (apart from “P%”, this throws an error, as it is now).
This is never going to end…

Regarding “a crib sheet for how to code assembly within BBC Basic”, there’s already Owlet, as you have demonstrated quite well. So it may be more a “kludge sheet”…

Edit: Oops! Reassigning symbols will require 3 passes, while we do just 2. Hm…

NoLand · July 8, 2021, 3:48pm

{phew}! – we achieved somewhat feature complete BBC Micro support:

Most intricacies are respected and supported by the assembler
Support for REM, etc
Support for BASIC indirection operators outside assembler blocks ("?", “!”, “$” in patterns like “?<expression> = <expression>” or “<identifier>?n = <expression>”)
The disassembler likewise generates such code, including BASIC statements with indirection for data insertion ("?P% = &xx" or “P%?n = &xx”).
Full round trip between apps guaranteed (i.e., the assembler accepts output generated by the disassembler and the emulator runs code generated by the assembler).
However, still case-insensitive outside strings and we do not check for conflicts with BASIC keywords.

Update:
The assembler supports “LEN($P%)”, which is, while a very special use of a BASIC function, kind of important for BBC BASIC indirections. ~~(However, as it is currently, you can not use $P% on the right-hand side of an assignment and there are no BASIC string concatenations implemented. So no support for things like $P% + $P$+" TEST".)~~ Solved! You may use $P% on the right-hand side, as well. Things like ‘$P%="WORLD":$P%="HELLO "+$P%:P%=P%+LEN($P%)’ should work as expected. (Including edge cases, where code has been inserted in the meantime.)
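
To make the semantics of that one-liner concrete, here is a minimal Python model of string indirection. It assumes, as in BBC BASIC, that `$address` stores the characters followed by a carriage return (&0D) and reads back up to that terminator; the helper names and the tiny memory array are made up for this sketch.

```python
CR = 0x0D  # string terminator used by "$" indirection


def put_string(memory, address, text):
    """Model of $address = text: store the characters plus a trailing CR."""
    data = text.encode("ascii") + bytes([CR])
    memory[address:address + len(data)] = data


def get_string(memory, address):
    """Model of reading $address: everything up to (not including) the CR."""
    end = memory.index(CR, address)
    return memory[address:end].decode("ascii")


memory = bytearray([0xFF] * 64)
p = 0x10                                                 # stands in for P%
put_string(memory, p, "WORLD")                           # $P%="WORLD"
put_string(memory, p, "HELLO " + get_string(memory, p))  # $P%="HELLO "+$P%
p += len(get_string(memory, p))                          # P%=P%+LEN($P%)
```

After these three steps the memory at 0x10 holds “HELLO WORLD” and `p` points at the terminating CR, which the next piece of code or data then overwrites.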

E.g.,

The rather prosaic source code “1 2 3 4 5”

produces the following verbose disassembly (at start address 0x4000):

               P% = &4000
               [
4000   01 02         ORA (&02,X)
4002   03            ???
4003   04            ???
4004   05 00         ORA &00
               ]

and the following “assembler code” (only) output, which may be copied as-is into the assembler:

P% = &4000
[
          ORA (&02,X)
]
?P%  = &03 :REM 00000011
P%?1 = &04 :REM 00000100
P% = P%+2
[
          ORA &00
]

which in turn produces “01 02 03 04 05 00” (the extra “00” is owed to the operand of the last “ORA”)
with the following listing:

pass 1

LINE  LOC          LABEL     CARD

   1  4000                   P% = &4000
   2                         [
   3  4000         .TEST     ORA (&02,X)
   4                         ]
   5                         ?P% = &03
   5                         REM '00000011'
   6                         ?1 = &04
   6                         REM '00000100'
   7  4004                   P% = P%+2
   8                         [
   9  4004                   ORA &00
  10                         ]

symbols
 TEST       &4000

pass 2

LOC   CODE         LABEL     INSTRUCTION

4000                         P% = &4000
4000  01 02        .TEST     ORA (&02,X)
4002  03                     ?&4002 = &03
4003  04                     ?&4003 = &04
4004                         P% = &4004
4004  05 00                  ORA &00

done (code: 4000..4005).

P.S.: The assembler now also supports niceties like a heading comment that will be used as a page header together with the “.PAGE” (BBC mode: “PAGE”) pseudo-instruction.

NoLand · July 8, 2021, 11:03pm

Some of the neater tricks with BBC BASIC embedded assembler syntax:
If we take the above source code and replace the second indirection statement with something fancier (mind the expression for the address, this could be a simple number or a variable name, as well), we may assemble out of line:

...
?P%  = &03
?(&1000*4+16) = 2+2  \<--- put 0x04 into 0x4010
P% = P%+1
...

which provides the respective assembler listing for these source lines:

LOC   CODE         LABEL     INSTRUCTION

...
4002  03                     ?&4002 = &03
4010  04                     ?&4010 = &04
4003                         P% = &4003
...

and generates the following code, empty spaces filled BBC Micro-style with 0xFF (“..” filled by other code):

4000: .. .. 03 .. .. FF FF FF
4008: FF FF FF FF FF FF FF FF
4010: 04

Which is literally quite extraordinary.

Especially, if we consider that these lines are actually a break-out to BBC BASIC with all its capabilities (on the real thing, which is here not reproduced to the full extent).

(Well, I’ve learned a few things thanks to this project.)
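
A simple way to picture the out-of-line write and the 0xFF fill is a sparse memory image, sketched here in Python. This is an assumption about one possible representation, not how the assembler stores its output, and only the two byte indirections from the example are modeled (the surrounding instructions are omitted, so their bytes show up as fill).

```python
def poke(image, address, value):
    """Model of ?address = value: store a single byte."""
    image[address] = value & 0xFF


def render(image, fill=0xFF):
    """Return (start, bytes) covering lowest to highest written address,
    with all unwritten gaps filled by `fill`."""
    start, end = min(image), max(image)
    return start, bytes(image.get(a, fill) for a in range(start, end + 1))


image = {}                             # sparse map: address -> byte
p = 0x4002                             # stands in for P% at this point
poke(image, p, 0x03)                   # ?P% = &03
poke(image, 0x1000 * 4 + 16, 2 + 2)    # ?(&1000*4+16) = 2+2
p += 1                                 # P% = P%+1
start, data = render(image)
```

The rendered block starts at 0x4002 and ends with the single byte at 0x4010, with everything in between set to 0xFF.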

For more on this, see BBC Micro User Guide: Assembly Language

NoLand · July 9, 2021, 8:36am

Here’s a final question – sort of a philosophical one:
By now, we have a real Frankenstein assembler, supporting a mix of quite a variety of syntaxes. As it is now, I made BBC mode a bit stricter than it has been before and reserved most of its features to it. In actuality, there is no real need for this. Some of the syntax errors the assembler throws are thrown just so that it feels right.

The sole conflicting feature is statement labels starting with a dot and pragmas without a dot in BBC mode, while in standard/MOS mode, it’s the other way round. (Even this is not a given source of errors, since many assemblers use dot-labels for local context.)

So: Should we allow the entire mix, or keep the various flavors of syntax somewhat separated?
When does it start to become confusing?
(E.g., if we allow “[…]” blocks in standard mode, will a user, who doesn’t care about BBC syntax, be utterly confused, if this doesn’t throw an error? Or, what about those indirections? How could something like ‘$100 = "FOO"’ be a valid statement? On the other hand, how would BBC-mode users feel about the “*” symbol for the program counter or “$…” hex notation, esp., if we consider that “$” is an indirection operator for strings in BBC BASIC. Or what about BASIC variable assignments directly in the assembler blocks?)

Heck, we could even incorporate some more Intel-like features, like numeric notations like 1234H and 1010B (not a good idea, because of ambiguity with labels like “AAH”). – When does loose start to become ugly, in terms of syntax?
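
The ambiguity is easy to demonstrate with a few lines of Python. The patterns below are simplified assumptions for illustration (they are not the assembler's tokenizer); the stricter variant follows the usual Intel-style convention that a hex literal must start with a decimal digit.

```python
import re

INTEL_HEX = re.compile(r"^[0-9A-F]+H$", re.IGNORECASE)                # loose: AAH counts
INTEL_HEX_STRICT = re.compile(r"^[0-9][0-9A-F]*H$", re.IGNORECASE)    # needs leading digit
INTEL_BIN = re.compile(r"^[01]+B$", re.IGNORECASE)
IDENTIFIER = re.compile(r"^[A-Z_][A-Z0-9_]*$", re.IGNORECASE)


def is_ambiguous(token):
    """True if the token could be read both as a label and as a loose hex literal."""
    return bool(INTEL_HEX.match(token) and IDENTIFIER.match(token))


checks = {t: is_ambiguous(t) for t in ("AAH", "1234H", "0AAH", "1010B")}
```

Only `AAH` is flagged here; requiring the leading digit (`0AAH`) would resolve it, at the price of yet another rule for users to remember.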

Comments are welcome!