Over the last few days, I updated the “Virtual 6502” suite of 6502-related tools. They have been around for a while (they may even be the oldest set of 6502 tools on the web still in existence), so they had earned a bit of TLC.
The assembler in particular has been extended to accept a variety of syntax flavors and may therefore still be of some use, especially when it comes to legacy code. (E.g., it should be fully compatible with the original MOS cross-assembler.) However, it’s still a simple assembler, without any support for macros or complex value expressions. But when it comes to a quick session with some trivial code of mixed or obscure origin (like bits of old and bits of modern code), there may still be a niche.
All the tools now support file uploads and input via drag&drop, which may be especially useful with the disassembler (which now accepts binary files as a source).
Do two tools already make a suite? Well, there is also the emulator…
Excellent! I see I mailed you about this back in 2009 - it almost feels like the first time I reached out to someone on the internet. And look where we are now.
An Update to the Documentation of the 6502 Instruction Set:
Something I had wanted to do for quite some time: I finally added – along with some other explanations – documentation of the illegal opcodes.
Illegal opcodes can now optionally be included in the table view (listing the opcodes by instruction code) and there is a new section listing them in detail.
Moreover, there are two (update: three) more tables in the section on the decoding scheme, which illustrate how and where these illegal opcodes fit in and provide deeper insights into the internal operations.
(The mnemonics are generally those used by the ACME and DASM assemblers, but differ in a few instances, where these are ambiguous, i.e. ANC and SBC, for which we use here the alternative labels “ANC2” for $2B and “USBC” for $EB, respectively.)
And yet another round of updates: in particular, the emulator enjoyed a major rewrite and the assembler has been substantially improved (it now supports complex value expressions). Since there is actually quite a number of features to discuss, here’s a blog post on the subject:
And yet another update: The disassembler now supports symbol tables (with support for complex expressions, similar to the assembler, but a bit more permissive regarding formatting). Symbol tables may be bare lists (as emitted by the assembler), label-style definitions, or assignments.
This way, you may now have perfect round trips between assembler and disassembler. There are also multiple output formats available, including verbose listings and bare assembler code.
Of course, besides editing them manually, you may load symbol tables / header files from disk or by drag&drop.
The three apps should now really be on par regarding their feature sets and supported formats.
The blog post has been updated accordingly, as well.
And yet another (small) one:
The emulator now has an option to load the ROM of the KIM-1 (main ROM with TTY, keyboard monitor and display only, not the optional cassette extension, which doesn’t make much sense in this context).
And what fun would a KIM-1 be without the original ROR-bug, which haunted the pre-June 1976 production series? Well, we now optionally emulate this as well.
(Officially, the instruction was “not yet implemented”, while in actuality there was a bug in the PLA decoding lines, causing ROR to behave like ASL, but always shifting in zero and preserving the carry flag as it had been before.)
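To make the difference concrete, here is a minimal Python sketch of the two behaviours as described above. The function names are made up for illustration, only the result byte and the carry flag are modelled, and this is not the emulator's actual code:

```python
def ror_correct(value, carry):
    """Regular ROR: rotate right through carry. Returns (result, new_carry)."""
    result = ((carry & 1) << 7) | ((value & 0xFF) >> 1)
    return result, value & 1


def ror_buggy(value, carry):
    """Early-silicon behaviour as described in the text: shift left like ASL,
    shift in zero, and leave the carry flag as it was. Returns (result, carry)."""
    result = (value << 1) & 0xFF
    return result, carry & 1
```

With a value of $81 and carry set, the regular ROR yields $C0 with carry set, while the buggy variant yields $02 and leaves the carry untouched.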
As another update the disassembler now supports a few new features regarding symbol tables:
Addresses immediately following a declared symbol will (optionally) be translated to a labeled “+1” address. This may be useful with any code setting up pointers, etc., where only the address of the low byte is declared as a symbol (as is often the case with zero-page system addresses).
E.g.
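As a rough sketch of the lookup rule in Python (the function name and the PTR symbol are made up for illustration; this is not the disassembler's actual code):

```python
def label_for(address, symbols, plus_one=True):
    """Return a label for address, or a "$" hex literal if none applies.

    symbols maps addresses to names, e.g. {0xFB: "PTR"} (hypothetical).
    """
    if address in symbols:
        return symbols[address]
    # Optional rule: the byte right after a declared symbol becomes NAME+1
    if plus_one and (address - 1) in symbols:
        return symbols[address - 1] + "+1"
    width = 2 if address < 0x100 else 4
    return "${:0{}X}".format(address, width)
```

With `symbols = {0xFB: "PTR"}`, address $FB resolves to `PTR`, $FC to `PTR+1`, and $FD stays a plain `$FD`.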
Symbols may be optionally declared specifically for write access only by a “w” suffix (case-insensitive). Think of the ATARI VCS/2600 and TIA read and write registers sharing the same address.
(The assembler ignores such a suffix in order to provide compatibility with such symbol tables.)
E.g.,
CXP0FB = $02 ;read register
WSYNC = $02 w ;write access only
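A minimal Python sketch of how such a table could be split into read and write symbols. It handles only the assignment form with “$” hex values, all names are hypothetical, and the fallback to a general symbol for writes without a dedicated “w” entry is an assumption of this sketch:

```python
import re

# NAME = $hex [w] [;comment]  (only this one form is handled in the sketch)
_LINE = re.compile(
    r"^\s*(\w+)\s*=\s*\$([0-9A-Fa-f]+)\s*(w)?\s*(?:;.*)?$", re.IGNORECASE
)


def parse_symbols(text):
    """Return (read_table, write_table), each mapping address -> name."""
    read_table, write_table = {}, {}
    for line in text.splitlines():
        match = _LINE.match(line)
        if not match:
            continue  # skip blank or unsupported lines
        name, hex_digits, write_only = match.groups()
        table = write_table if write_only else read_table
        table[int(hex_digits, 16)] = name
    return read_table, write_table


def lookup(address, is_write, read_table, write_table):
    """Prefer a write-only symbol for stores, else use the general one."""
    if is_write and address in write_table:
        return write_table[address]
    return read_table.get(address)
```

Fed with the two lines above, a store to $02 would be labeled `WSYNC` and a load from $02 would be labeled `CXP0FB`.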
Predefined symbol tables for the C64, the PET 2001 and the Atari VCS may be loaded directly from the UI.
If you have symbol tables ready for other popular systems (BBC Micro, Apple ][, etc) and think this may be useful and should be made available, as well, let me know! (For contact details, see mass:werk <contact> .)
Finally, the disassembler now supports a “.DATA” pseudo-instruction in its symbol tables to declare an address or a range of addresses as data to be excluded from the disassembly (“.BYTE” pseudo-instructions will be emitted instead).
E.g.,
.DATA $2040 ;exclude a single address
.DATA $2040 ... $240F ;exclude the range $2040 ... $240F (inclusive)
.DATA $2040, $240F ;as above
.DATA D1 ... D1+4 ;expressions are allowed
This should be useful for drilling down on some code: just keep adding “.DATA” statements to your symbol table as you identify a data section and keep on disassembling.
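For illustration, a small Python sketch that collects the addresses marked by such statements. It understands only literal “$” hex addresses (expressions like `D1 ... D1+4` are skipped), and the names are made up; it is not how the disassembler is actually implemented:

```python
import re

_ADDR = r"\$([0-9A-Fa-f]+)"
_DATA = re.compile(
    r"^\s*\.DATA\s+" + _ADDR + r"(?:\s*(?:\.\.\.|,)\s*" + _ADDR + r")?\s*(?:;.*)?$",
    re.IGNORECASE,
)


def data_addresses(symbol_table_text):
    """Collect every address marked as data (ranges are inclusive)."""
    marked = set()
    for line in symbol_table_text.splitlines():
        match = _DATA.match(line)
        if not match:
            continue  # not a .DATA line, or it uses an expression
        start = int(match.group(1), 16)
        end = int(match.group(2), 16) if match.group(2) else start
        marked.update(range(start, end + 1))
    return marked
```

A disassembly loop could then check each address against the returned set and emit a “.BYTE” line instead of decoding an instruction there.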
By this, I think, the disassembler is somewhat complete and final. (Well,… )
Never say never, etc.: @EdS has contributed a source for a symbol table for the BBC Micro/Acorn OS ABI.
Incidentally, the apps now support ‘&…’ hex notation, for the comfort of our poor friends who were tragically misled by the BBC (while MOS made it quite clear that it is ‘$…’).
Acorn’s Atom BASIC, produced without BBC involvement, uses # and & to convert between decimal and hex. Like BBC BASIC, the Atom’s interpreter included an assembler, but was a very idiosyncratic language. The sense of the Atom’s & operator is quite the opposite from what one might expect, though, since it converted decimal to hex.
The outline specification for BBC BASIC requested syntax compatibility with MS BASIC 5.0. MS BASIC (at least BASIC-80 5.21) used & to denote octal constants and &h for hex. Earlier MS variants for the 6502 mostly didn’t, though AppleSoft supported $ notation.
Some of Atom BASIC’s weirdnesses made it through to BBC BASIC, but mostly compatibility prevailed.
It’s really a bit confusing – at least to me.
(I guess, Sophie is to blame – and the BBC BASIC specifications.)
Previously, I had “&…” for octal notation, because I faintly remembered this being used by some software roughly from this period. (Some BASIC? A 6502 assembler?) However, MOS assemblers used “@…” for octal and I really couldn’t find a reference for “&” used for this by any assembler for the 6502 family.
I hope, it’s unambiguous.
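The resulting rule for numeric prefixes, as I understand it, can be sketched in a few lines of Python (a hypothetical helper, covering only the prefixes mentioned in this thread):

```python
def parse_number(token):
    """Parse "$1F" or "&1F" as hex, "@17" as octal, plain digits as decimal."""
    prefixes = {"$": 16, "&": 16, "@": 8}
    if token and token[0] in prefixes:
        return int(token[1:], prefixes[token[0]])
    return int(token, 10)
```

So `$4000` and `&4000` give the same value, and `@17` is 15 in decimal.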
P.S.: As I never managed to obtain a Beeb and thus never became intimate with its ways, if there are any idiosyncrasies of note that a compatible assembler should be aware of, any advice is welcome!
So, as a first step, the assembler gained a special “BBC compatibility mode” (as well as a dedicated checkbox to activate this).
This switches some of the syntax and provides the following features:
BBC-style listing with “&…” hex notation
Generally “P%” may be used for the program counter, anywhere “*” may occur.
Symbols may (optionally) end with “%” (BASIC style).
Statement labels start with a dot.
Bare pragmas without a leading dot.
Assembly blocks using “[ … ]”.
Multiple statements per line using “:” as separator.
Comments with backslash ("\…").
Enhanced features, like grouping expressions with “[ … ]”, will still be available.
Mind that expressions are still evaluated without precedence (I actually do not know how the Beeb does it), on the other hand, you may still use grouping with “[…]” inside expressions. Generally, mixed syntax is allowed to a certain extent. (So the syntax is somewhat loose.)
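To illustrate what evaluation without precedence means, here is a small Python sketch of a strictly left-to-right evaluator with “[…]” grouping. It is a toy with hypothetical names: it knows only numbers, the operators + - * /, and the “$”/“&” hex prefixes, and the integer division is my assumption:

```python
import re

_TOKEN = re.compile(r"[$&][0-9A-Fa-f]+|\d+|[-+*/\[\]]")


def _value(token):
    # "$" and "&" both introduce hex digits; anything else is decimal
    return int(token[1:], 16) if token[0] in "$&" else int(token)


def evaluate(expression):
    """Evaluate + - * / strictly left to right; "[...]" groups a subexpression."""
    tokens = _TOKEN.findall(expression.replace(" ", ""))
    position = 0

    def operand():
        nonlocal position
        token = tokens[position]
        position += 1
        if token == "[":
            result = sequence()
            position += 1  # skip the closing "]"
            return result
        return _value(token)

    def sequence():
        nonlocal position
        result = operand()
        while position < len(tokens) and tokens[position] != "]":
            operator = tokens[position]
            position += 1
            right = operand()
            if operator == "+":
                result += right
            elif operator == "-":
                result -= right
            elif operator == "*":
                result *= right
            else:
                result //= right  # integer division, an assumption here
        return result

    return sequence()
```

Here `evaluate("2+3*4")` gives 20 rather than 14, while `evaluate("2+[3*4]")` gives 14.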
E.g., this should compile as intended:
\FILL A PAGE WITH BYTES,
\PRESERVE PROGRAM
P% = &800
[
.START
LDX #OFFSET
.LOOP TXA
STA START,X
INX:BNE LOOP
BRK
TEXT 'FOO BAR
\INSERT BYTES HERE
OFFSET = P%-START
]
(On a real BBC Micro, you would probably have no comments outside […], but this is – mumble – a dedicated assembler.)
That’s great! I think it might be tricky in general to get very close to Acorn idiom, but it would have the great advantage that your site becomes a crib sheet for how to code assembly within BBC Basic. Here’s your example, mildly and somewhat clumsily reworked, in the Owlet IDE.
Ha, I see a loose end: reassigning symbols (apart from “P%”, this throws an error, as it is now).
This is never going to end…
Regarding “a crib sheet for how to code assembly within BBC Basic”, there’s already Owlet, as you have demonstrated quite well. So it may be more a “kludge sheet”…
Edit: Oops! Reassigning symbols will require 3 passes, while we do just 2. Hm…
{phew}! – we achieved somewhat feature complete BBC Micro support:
Most intricacies are respected and supported by the assembler
Support for REM, etc
Support for BASIC indirection operators outside assembler blocks ("?", “!”, “$” in patterns like “?<expression> = <expression>” or “<identifier>?n = <expression>”)
The disassembler likewise generates such code, including BASIC statements with indirection for data insertion ("?P% = &xx" or “P%?n = &xx”).
Full round trip between apps guaranteed (i.e., the assembler accepts output generated by the disassembler and the emulator runs code generated by the assembler).
However, still case-insensitive outside strings and we do not check for conflicts with BASIC keywords.
Update:
The assembler supports “LEN($P%)”, which is, while a very special use of a BASIC function, kind of important for BBC BASIC indirections. (However, as it is currently, you cannot use $P% on the right-hand side of an assignment and there are no BASIC string concatenations implemented. So no support for things like $P% + $P$+" TEST".) Update: Solved! You may now use $P% on the right-hand side, as well. Things like ‘$P%="WORLD":$P%="HELLO "+$P%:P%=P%+LEN($P%)’ should work as expected. (Including edge cases, where code has been inserted in the meantime.)
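As a rough model of what that statement sequence does to memory, here is a Python sketch. BBC BASIC terminates “$” strings in memory with a carriage return (&0D); everything else (the names, the flat 64K bytearray, the start address) is made up for illustration and says nothing about how the assembler implements it:

```python
CR = 0x0D  # terminator BBC BASIC writes after a "$" string

memory = bytearray(0x10000)


def poke_string(address, text):
    """Model of: $address = text"""
    data = text.encode("ascii") + bytes([CR])
    memory[address:address + len(data)] = data


def peek_string(address):
    """Model of reading $address: bytes up to, but not including, the CR."""
    end = memory.index(CR, address)
    return memory[address:end].decode("ascii")


p = 0x4000                                  # P% = &4000
poke_string(p, "WORLD")                     # $P% = "WORLD"
poke_string(p, "HELLO " + peek_string(p))   # $P% = "HELLO " + $P%
p += len(peek_string(p))                    # P% = P% + LEN($P%)
```

Afterwards, “HELLO WORLD” sits at &4000 and P% points at the terminating CR, so the next assembled byte overwrites the terminator.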
E.g.,
The rather prosaic source code “1 2 3 4 5”
produces the following verbose disassembly (at start address 0x4000):
which in turn produces “01 02 03 04 05 00” (the extra “00” is owed to the operand of the last “ORA”)
with the following listing:
pass 1
LINE LOC LABEL CARD
1 4000 P% = &4000
2 [
3 4000 .TEST ORA (&02,X)
4 ]
5 ?P% = &03
5 REM '00000011'
6 ?1 = &04
6 REM '00000100'
7 4004 P% = P%+2
8 [
9 4004 ORA &00
10 ]
symbols
TEST &4000
pass 2
LOC CODE LABEL INSTRUCTION
4000 P% = &4000
4000 01 02 .TEST ORA (&02,X)
4002 03 ?&4002 = &03
4003 04 ?&4003 = &04
4004 P% = &4004
4004 05 00 ORA &00
done (code: 4000..4005).
P.S.: The assembler now also supports niceties like a heading comment that will be used as a page header together with the “.PAGE” (BBC mode: “PAGE”) pseudo-instruction.
Some of the neater tricks with BBC BASIC embedded assembler syntax:
If we take the above source code and replace the second indirection statement with something fancier (mind the expression for the address; this could just as well be a simple number or a variable name), we may assemble out of line:
...
?P% = &03
?(&1000*4+16) = 2+2 \<--- put 0x04 into 0x4010
P% = P%+1
...
which provides the respective assembler listing for these source lines:
Especially, if we consider that these lines are actually a break-out to BBC BASIC with all its capabilities (on the real thing, which is here not reproduced to the full extent).
(Well, I’ve learned a few things thanks to this project.)
Here’s a final question – sort of a philosophical one:
By now, we have a real Frankenstein assembler, supporting a mix of quite a variety of syntaxes. As it is now, I made BBC mode a bit stricter than it had been before and reserved most of its features to it. In actuality, there is no real need for this. Some of the syntax errors the assembler throws are just thrown so that it feels right.
The sole conflicting feature is statement labels starting with a dot and pragmas without a dot in BBC mode, while in standard/MOS mode, it’s the other way round. (Even this is not a given source of errors, since many assemblers use dot-labels for local context.)
So: Should we allow the entire mix, or keep the various flavors of syntax somewhat separated?
When does it start to become confusing?
(E.g., if we allow “[…]” blocks in standard mode, will a user, who doesn’t care about BBC syntax, be utterly confused, if this doesn’t throw an error? Or, what about those indirections? How could something like ‘$100 = "FOO"’ be a valid statement? On the other hand, how would BBC-mode users feel about the “*” symbol for the program counter or “$…” hex notation, esp., if we consider that “$” is an indirection operator for strings in BBC BASIC. Or what about BASIC variable assignments directly in the assembler blocks?)
Heck, we could even incorporate some more Intel-like features, like numeric notations like 1234H and 1010B (not a good idea, because of ambiguity with labels like “AAH”). – When does loose start to become ugly, in terms of syntax?
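The ambiguity can be shown with a tiny Python sketch (hypothetical patterns, just to make the point; the identifier rule is a simplification):

```python
import re

_INTEL_HEX = re.compile(r"^[0-9A-F]+H$", re.IGNORECASE)
_IDENTIFIER = re.compile(r"^[A-Z_][A-Z0-9_]*$", re.IGNORECASE)


def readings(token):
    """List every way a token could be read if Intel-style hex were accepted."""
    found = []
    if _INTEL_HEX.match(token):
        found.append("hex")
    if _IDENTIFIER.match(token):
        found.append("label")
    return found
```

`readings("1234H")` finds only the hex reading, but `readings("AAH")` finds both. Intel-style assemblers get around this by requiring a leading digit (`0AAH`), which is one more rule to explain.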