Virtual 6502 Suite Update

I’m tempted to say that Beeb mode is a world of its own, and while you will delight the Acorn people by providing what you have (which is splendid and marvellous), you might well only perplex everyone else. I say this, sitting in the Acorn camp. So, I think I’d suggest keeping the worlds apart.

My thinking is quite along the same lines. It’s just a bit strange to have an assembler that actually supports all these features, but (deliberately) throws errors (and warnings) depending on the setting of a checkbox…
Well, best think of it in terms of spell checking.

Some of this is quite arbitrary. E.g., even in BBC mode it’s a bit unclear whether you have to be inside an assembler block. (You may just want to try a few instructions and set the program[me] counter. Are we actually starting in BASIC mode? Who says that?) So you may use actual instructions (mnemonics) outside of a block. On the other hand, setting the program counter inside a block is allowed, but throws a warning (again, to facilitate a quick session), while assigning to any other symbol throws an error (which is much the same operation, really). Also, you may use modern 0x1234 notation (why not, if it is allowed in any other mode and doesn’t conflict with the core syntax?), but not $1234, which is, again, quite arbitrary. Mind that this is not about strict emulation; it is just meant to allow a user to handle traditional Beeb code, along with other formats.

I am simultaneously delighted, perplexed and deeply impressed by the descriptions of your creation. It’s a pity that I don’t have enough spare time to explore it in depth, at least not right now.


I’m also running out of steam a bit. :slight_smile:
The problem is that the code keeps growing, while the base is some rather dated JavaScript from 2005. In the case of the assembler and disassembler, the roots even go back to a BASIC implementation (on the C64) from the 1980s. Layers of retro computing, while edge and side cases keep growing. It’s a bit like juggling, and I’m afraid that once I pause, all the balls will fall down and roll under the sofa, never to be found again…

*) The legacy of the BASIC implementation shows mostly in the assembler, where the input is first parsed into chunks of supposed symbols/tokens, before anything is actually parsed at all.
(While this approach is quite economical for a simple implementation, it comes with some limitations, like no white space inside an expression, since an expression must be contained in a single chunk. So I constantly have to navigate around these things, since there is no general parser mechanism, and features like evaluating expressions are retrofitted on top of the existing solution, which was built around mnemonics, alphabetic symbols and simple numbers only.)
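
To make the limitation concrete, here is a minimal Python sketch of chunk-first splitting. It is not the actual virtual 6502 source; the function name `split_chunks` and the `;` comment convention are assumptions made for illustration only.

```python
# Hypothetical illustration of "chunk first, parse later".
def split_chunks(line):
    """Drop a ';' comment, then split the rest at white space."""
    code = line.split(";", 1)[0]
    return code.split()

# Written without spaces, the operand expression stays in one chunk.
compact = split_chunks("LDA #VALUE+1 ;load")

# With spaces, the same expression is torn into several chunks,
# so a later stage no longer sees it as a single operand.
spaced = split_chunks("LDA #VALUE + 1 ;load")

print(compact)
print(spaced)
```

Once the line has been split this way, any later expression evaluator only ever sees one chunk at a time, which is why white space inside an expression cannot simply be allowed after the fact.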

So, final addition (well… :slight_smile: ):

We parse string concatenation in BASIC mode for the BBC Micro. You may even use $P% on the right-hand side of an assignment. E.g.,

$P% = "WO" + "RLD!"
$P% = "HELLO " + $P%
P% = P% + LEN($P%)

should work as expected. (However, $P% is the only thing accepting strings; no other string variables are implemented.)

For the benefit of the uninitiated:
An assignment to “$P%” (or to any other expression representing a memory location that immediately follows the “$”) inserts the string on the right-hand side into memory, in the case of $P% at the location of the program counter “P%”. We then also have to update the program counter, in order to avoid overwriting this insertion. When assigning to $P%, this may be accomplished by the idiom ‘P% = P% + LEN($P%)’. (There are, BTW, “nice” edge cases, e.g., when we first insert, say, 10 characters, advance the program counter by just 2 and then assemble 4 bytes the usual way. Now $P% is only 4 characters long, as the first 6 characters have been consumed. So having $P% on the right-hand side is sort of tricky: it actually reads from memory in a certain range, and if we overwrite some of that range, the string includes the overwritten bytes. BBC BASIC keeps surprising me!)
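
The edge case is easier to follow with a small model. The Python sketch below assumes the BBC BASIC convention that a string stored via “$” is terminated by a carriage return (&0D) and that reading “$addr” scans up to that terminator. Whether the virtual 6502 assembler stores a terminator in exactly this way is an assumption, and the helper names `put_string` and `get_string` are made up for the example.

```python
CR = 0x0D  # assumed terminator for $-indirected strings, as in BBC BASIC

memory = bytearray(64)  # tiny, zero-filled stand-in for RAM

def put_string(addr, text):
    """Model `$addr = text`: store the characters followed by a CR."""
    data = text.encode("ascii") + bytes([CR])
    memory[addr:addr + len(data)] = data

def get_string(addr):
    """Model `$addr` on the right-hand side: read up to the next CR."""
    end = memory.index(CR, addr)
    return memory[addr:end].decode("latin-1")

start = 0x10
pc = start
put_string(pc, "0123456789")              # $P% = "0123456789"
pc += 2                                   # P% advanced by 2 only
memory[pc:pc + 4] = b"\xA9\x00\xEA\xEA"   # four assembled bytes land on the text
pc += 4

remaining = get_string(pc)       # what $P% now yields
clobbered = get_string(start)    # the old string, now with code bytes inside

print(remaining, len(remaining), len(clobbered))
```

In this model, the string read at the new program counter is just the four surviving characters, while reading from the original address still yields ten characters, four of which are now the assembled bytes.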

I suggest that this is not materially different from -Wunused or similar “did you really mean to do this?” style of warnings in (in this case) C compilers. If the assembly is supposed to be for one assembler or another, and it uses dots or stars or $ or whatever in the wrong fashion, it probably represents an unintentional error.

You are, of course, correct.
It’s probably a matter of which direction you’re coming from. I started with, “well, let’s add some compatibility for the Beeb so that this may pass as well; what would it take?”, but this then went somewhat off course and took quite a lot. In hindsight, it’s pretty clear that this is really a different mode, though that wasn’t clear at the beginning.


At this point, I really need the well-fenced modes. This owes much to the architecture of the assembler, where special-purpose parsers are grafted onto an originally simple parsing concept.

Take, for example, this expression in BBC mode (I’m not sure whether the original supports the character literal expression “'A”):

A=(LEN("foo"+"bar")+4)*2+'A

The assembler output illustrates pretty well what’s going on:

pass 1

LINE  LOC          LABEL     CARD

   1                         A = [6+4]*2+'A

symbols
 A            &55

pass 2

LOC   CODE         LABEL     INSTRUCTION

                             A = &55

done.
no code generated.
  • "foo"+"bar" is solved in a special BBC BASIC string parser returning "foobar" (this also resolves things like “$P%”)
  • LEN("foobar") is solved in a special BBC BASIC expression parser, resolving the LEN()-expression and returning an assembler expression
  • With the preprocessing done, the assembler expression is listed in pass 1. Mind the square brackets and that “'A” is still a numeric value
  • Pass 2 finally shows the numeric result of this expression
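
The staging in this list can be mimicked in a few lines of Python. This is only a sketch of the idea, not the assembler’s actual code: the function names are invented, it handles plain string literals only (no “$P%”), and treating the bracket change as a simple substitution is an assumption based on the listing.

```python
import re

def concat_strings(src):
    """Stage 1: fold "a"+"b" into "ab" until no such pair remains."""
    pattern = re.compile(r'"([^"]*)"\+"([^"]*)"')
    while pattern.search(src):
        src = pattern.sub(
            lambda m: '"' + m.group(1) + m.group(2) + '"', src, count=1)
    return src

def resolve_len(src):
    """Stage 2: replace LEN("...") by the length of its string literal."""
    return re.sub(r'LEN\("([^"]*)"\)', lambda m: str(len(m.group(1))), src)

def to_assembler_brackets(src):
    """Stage 3: hand over in assembler notation, which groups with [ ]."""
    return src.replace("(", "[").replace(")", "]")

def preprocess(src):
    return to_assembler_brackets(resolve_len(concat_strings(src)))

listing = preprocess('(LEN("foo"+"bar")+4)*2+\'A')
print(listing)
```

The output is the text that the purely numeric expression solver would then receive, with “'A” left untouched for that stage to resolve.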

Each of these expression parsers has failure modes and returns normalized results, as well as partial results for error display.

E.g.,

A=LEN("foo"+"bar"+2)+4

   1                         A = LEN("foo"+"bar"+2
**** syntax error ** type mismatch

An error is thrown by the BBC BASIC string parser. At this point, we’re still able to reconstruct the input.

A=LEN("foo"+"bar")+4A

   1                         A = 6+4A
**** parse error ** unexpected token, figure or operator expected

An error is thrown by the assembler expression parser (doing the actual arithmetic solving). Mind how LEN(…) is already replaced by its numeric equivalent.

Just a quick note: in BBC Basic, the expression ASC("A") has the value 65, so that’s probably the replacement for 'A.

Well, there is now ASC(), as well… you never know when it may be needed … :slight_smile:

You may now do:

A=(LEN("foo"+"bar")+4)*2+ASC("A")

P.S.: However, it’s still evaluated strictly from left to right, without operator precedence.
(There’s only the assembler style expression solver for math.)
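
For anyone wondering what strict left-to-right evaluation means in practice, here is a small Python sketch of such a solver. It is a stand-in written for this post, not the assembler’s code; it assumes square brackets for grouping, “&” for hexadecimal and “'c” for character values, as seen in the listings above.

```python
import re

TOKEN = re.compile(r"&[0-9A-Fa-f]+|\d+|'.|[-+*/\[\]]")

def tokenize(expr):
    """Split an expression into tokens; white space is not accepted."""
    tokens, pos = [], 0
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if not m:
            raise ValueError(f"unexpected token at position {pos}")
        tokens.append(m.group(0))
        pos = m.end()
    return tokens

def evaluate(expr):
    """Apply operators in the order they appear; only [ ] changes the order."""
    tokens = tokenize(expr)

    def operand():
        tok = tokens.pop(0)
        if tok == "[":
            value = sequence()
            if not tokens or tokens.pop(0) != "]":
                raise ValueError("missing ]")
            return value
        if tok.startswith("&"):
            return int(tok[1:], 16)
        if tok.startswith("'"):
            return ord(tok[1])
        return int(tok)  # raises ValueError if an operator appears here

    def sequence():
        value = operand()
        while tokens and tokens[0] != "]":
            op, rhs = tokens.pop(0), operand()
            if op == "+":
                value += rhs
            elif op == "-":
                value -= rhs
            elif op == "*":
                value *= rhs
            elif op == "/":
                value //= rhs
            else:
                raise ValueError(f"operator expected, got {op}")
        return value

    result = sequence()
    if tokens:
        raise ValueError("unexpected ]")
    return result

print(hex(evaluate("[6+4]*2+'A")))  # the bracketed example from the listing
print(evaluate("2+3*4"))            # left to right: (2+3)*4
```

The second call shows the difference from BASIC: with conventional precedence, 2+3*4 would be 14, whereas a left-to-right solver computes (2+3)*4.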

Meanwhile, our Frankenstein assembler does bizarre things like this in BBC mode:
(Some BASIC constructs like line numbers and loops are recognized and ignored, and we’re doing a bit more of the string related stuff.)

100 REM THIS IS BASIC
110 P% = &4000 : REM SET PROGRAMME COUNTER
120 TARGET = &4100 : REM A VARIABLE
130 FOR C=0 TO 2 STEP 2 : REM PASS LOOP (ignored)
140 [OPT C \now in asm, option defining a reporting level (ignored)
150     LDX #&20 \copy 32 bytes
160 .LOOP  LDA SOURCE,X
170     STA TARGET,X
180     DEX
190     BNE LOOP
200 .SOURCE
210 ] NEXT : REM BACK TO BASIC
220 $P%=":-) "+STRING$(28,CHR$(ASC("*")))
230 P%=P%+LEN($P%) : REM UPDATE PC

Note:
Line 220 produces the string “:-) ” followed by 28 asterisks (for a total of 32 characters) using BASIC string functions and inserts it into memory at the current location. (STRING$() repeats the string value provided as the second argument for the number of times given by the first argument. CHR$(ASC("*")) is there only for boasting and really the same as the simple string “*”.)
Line 230 then updates the program counter accordingly.
This is much the same as the pseudo-instruction “.TEXT ":-) *********…” in standard assembly, but accomplished programmatically in BBC BASIC.

The result of this is the following object code:

4000: A2 20 BD 0B 40 9D 00 41
4008: CA D0 F7 3A 2D 29 20 2A
4010: 2A 2A 2A 2A 2A 2A 2A 2A
4018: 2A 2A 2A 2A 2A 2A 2A 2A
4020: 2A 2A 2A 2A 2A 2A 2A 2A
4028: 2A 2A 2A
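
As a cross-check, the dump can be reproduced by hand-assembling the listing in Python. The opcodes are the standard 6502 ones (A2 = LDX immediate, BD = LDA absolute,X, 9D = STA absolute,X, CA = DEX, D0 = BNE); the names `ORG` and `hexdump` are just helpers for this sketch.

```python
ORG = 0x4000
TARGET = 0x4100
LOOP = ORG + 2        # LDX #&20 occupies two bytes
SOURCE = ORG + 11     # 2 + 3 + 3 + 1 + 2 bytes of code precede the label

# A relative branch counts from the address after the BNE instruction,
# which here happens to be SOURCE; the offset is stored as two's complement.
offset = (LOOP - SOURCE) & 0xFF

code = bytes([
    0xA2, 0x20,                        # LDX #&20
    0xBD, SOURCE & 0xFF, SOURCE >> 8,  # LDA SOURCE,X
    0x9D, TARGET & 0xFF, TARGET >> 8,  # STA TARGET,X
    0xCA,                              # DEX
    0xD0, offset,                      # BNE LOOP
])
text = (":-) " + "*" * 28).encode("ascii")  # line 220
image = code + text

def hexdump(data, origin, width=8):
    """Format bytes as 'ADDR: XX XX ...' rows, like the dump above."""
    rows = []
    for i in range(0, len(data), width):
        row = " ".join(f"{b:02X}" for b in data[i:i + width])
        rows.append(f"{origin + i:04X}: {row}")
    return rows

print("\n".join(hexdump(image, ORG)))
```

The branch offset works out as &4002 minus &400B, i.e. -9, which is stored as F7.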

Currently working on a modernized design, which should be a bit friendlier and more in line with the rest of my site.


When you get to the point of trying to emulate the BBC Micro’s sound chip, we’ll know this has gone too far! But so far, so good. Very impressive.

Be reassured, this is where I stop.
That is, there could be a Z80 attached via the Tube interface, or an ARM, or even an entire PC… {rushes off} :wink:

String functions and BBC BASIC indirection operators are somewhat of a pain to implement, but without them you can’t even write a “Hello World” program, as I understand it. (The rest is comparatively cheap, like ignoring any leading line numbers.)

String support is really at the bare minimum: you really need LEN() for advancing the program counter, and ASC() and CHR$() are a minimal complement while we’re at it. STRING$() is maybe already a bit baroque, but at this point we already have everything in place to implement it, and it may be frequently used for building screens, etc. However, there is no LEFT$(), no RIGHT$(), nor any of the other stuff, and you can’t have any string variables but the $P% construct.


Now, here comes the funny part:
I’ve been reading the wrong manuals and just learned about the BBC Micro constructs EQUB, EQUW, EQUS (which are equivalent to the standard directives “.BYTE”, “.WORD”, “.TEXT”) and EQUD (for a 32-bit sequence / 4 bytes), as well as ALIGN. There’s also ASC"<char>" for a character value (like '<char> in standard notation).
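
For reference, this is roughly what these directives emit, sketched in Python. It assumes the usual 6502 little-endian byte order for EQUW and EQUD and that EQUS stores the plain characters without a terminator; the lower-case function names are only labels for this example.

```python
import struct

def equb(value):
    """EQUB: a single byte."""
    return struct.pack("<B", value & 0xFF)

def equw(value):
    """EQUW: a 16-bit word, low byte first."""
    return struct.pack("<H", value & 0xFFFF)

def equd(value):
    """EQUD: a 32-bit double word, low byte first."""
    return struct.pack("<I", value & 0xFFFFFFFF)

def equs(text):
    """EQUS: the characters of the string (assumed: no terminator added)."""
    return text.encode("ascii")

data = equs("Hi") + equb(13) + equw(0x1234) + equd(0x11223344)
print(data.hex(" ").upper())
```

Because these emit bytes directly at the assembly location, the program counter advances by itself, so no `P% = P% + LEN($P%)` bookkeeping is needed.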

In other words, there is no need to break out to BASIC at all.
It could have been so easy, both on the assembler and disassembler side of things!

(However, these directives seem to be obscure and there are probably some who are following the wrong manuals, as well.)

Well, there is probably a moral to this… :wink:

Oh dear, it does seem that you’re right, which means I’ve missed a trick. I see the original 1981 BBC Basic lacked these keywords, but the 1982 or later versions are much more commonly seen.

The original User Guide isn’t much help for the assembler, but almost everyone who programs the Beeb has a copy of the Advanced User Guide, which is.

Thanks for the link!

Something that is missing even from the “Advanced User Guide”: ALIGN aligns the location counter to a multiple of 4 (matching EQUD lengths). It’s apparently meant to be used after inserting text of arbitrary length using EQUS.

E.g.,

EQUS "Hello World!"
ALIGN \now at a multiple of four, so that we may sensibly use
EQUD &11223344 \ some data
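
The rounding itself is a one-liner. A Python sketch follows, assuming that ALIGN rounds the location counter up to the next multiple of four and leaves an already aligned counter alone; `align4` is a made-up name and the addresses are arbitrary.

```python
def align4(pc):
    """Round pc up to the next multiple of 4 (unchanged if already aligned)."""
    return (pc + 3) & ~3

start = 0x2000

# "Hello World!" is 12 characters, so from an aligned start nothing moves.
after_hello_world = start + len("Hello World!")
print(hex(after_hello_world), hex(align4(after_hello_world)))

# A 6-character string leaves the counter at &2006; ALIGN skips to &2008.
after_hello = start + len("Hello!")
print(hex(after_hello), hex(align4(after_hello)))
```

So with the example string, ALIGN only has a visible effect if the text does not already end on a four-byte boundary.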

Hmm, are you sure? Is this a yet-later addition? If it doesn’t run on 1982 BBC Basic, in JSBeeb or Owlet, I’m not sure if I’m on board!

Well, it may have been an Archimedes/RiscOS addition. It’s listed in the RiscOS BBC BASIC manual, which also provides the 6502 syntax (there are other directives used on ARM for this) and it doesn’t discriminate in any way between the platforms.

http://www.riscos.com/support/developers/bbcbasic/appendices/armassembler.html

The ALIGN directive ensures that the next P% (and O%) that is used lies on a word boundary. It is used after, for example, an EQUS to ensure that the next instruction is word-aligned.

(Mind that P% is specific to the 6502, where ARM uses O% – as I understand it.)
However, “word boundary” is used in terms of 4 bytes (32 bit)…

P.S.: As far as the “virtual 6502” apps are concerned, ALIGN is not used by the disassembler, but the assembler does “understand” it, which shouldn’t cause any harm or irritations.

Ah yes, as BBC Basic continued (and continues) to evolve, it’s good to use the right manual for the appropriate version - in my case, the 1982 version!


Another “phew!”:

For those not already bored with this thread, the entire suite has been updated to a new appearance.
Also a few refinements and updated descriptions.

With this, it’s not only a Frankenstein assembler but a Jekyll and Hyde assembler as well (a strange case, indeed), as I added light and dark modes. :wink:
