Versions of BASIC that allowed hex/oct/binary to be typed in as constants?

There’s no such thing as a true pronunciation! I think Usenet group names are one example where British and American Englishes take different routes. So it is with char, I think.

For me, A$ is pronounced (or subvocalised) as “A dollar.” It’s one little difficulty I have with APL, I need to subvocalise, and I don’t know all the names. And one or two I might have seen and rejected!

That’s a major problem for me, too, and, while I’m quite fascinated by the concept, I never really tried APL. There’s an argument for semi-verbose languages. (This may have contributed to the popularity of BASIC, C, Perl, and some others, which may have hit a sweet spot in this regard.)

I went over this topic a few months ago, when I also decided to implement hex, octal, and binary bases for my Continuum 93 retro computer. I implemented that for the assembly language interpreter, so it’s not BASIC, but maybe it helps. After a bit of research I decided to add the following notations:

0xFF		; hexadecimal
0b11111111	; binary
0o377		; octal

Each of these has a specific purpose. Hexadecimal is broadly used, so I won’t go into that. Binary I decided to implement since I do have interrupts that take bit flags (such as which video page to force a draw on), and it’s convenient to represent such a number by its bits for readability. Octal I decided to use since my video architecture uses 256-color palettes (8 bit), where RGB values are again conveniently represented in the octal base.

The notation I chose is somewhat inspired by higher-level programming languages and is consistent if you’d like to introduce other bases.

One thing I recommend, if it makes sense for your project, is to introduce a visual separator, such as _, that you’d simply discard when compiling or interpreting. This way, certain numbers become much easier to follow:

LD CDEF, 1_000_000_000
LD CDEF, 0x0A_0B_0C_0D
LD CDEF, 0b11110010_00110111_00000001_01100111
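Discarding the separator before conversion is cheap in any language with string handling; here is a minimal Python sketch of the idea, using the prefixes and the _ separator from the post (the function name parse_const is just illustrative):

```python
def parse_const(token):
    # Sketch: strip '_' separators, then dispatch on an optional base prefix
    t = token.replace("_", "").lower()
    if t.startswith("0x"):
        return int(t[2:], 16)
    if t.startswith("0b"):
        return int(t[2:], 2)
    if t.startswith("0o"):
        return int(t[2:], 8)
    return int(t, 10)

print(parse_const("1_000_000_000"))   # 1000000000
print(parse_const("0x0A_0B_0C_0D"))   # 168496141
```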

Hope this helps, good luck!

I think this is a good recommendation.
At least some versions of BASIC already ignore white space in numbers. (I always thought that this was influenced by FORTRAN, where all white space is ignored.)
E.g., Commodore BASIC:

PRINT 1 000 +  1 4
 1014

READY.

Try it here. (PET)
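A scanner that behaves this way simply skips blanks while collecting digits. A little Python sketch of the idea (not the actual Commodore code; scan_number is a hypothetical helper):

```python
def scan_number(src, i):
    # Sketch: collect decimal digits, silently skipping blanks,
    # the way Commodore BASIC's number scanner behaves
    value, seen = 0, False
    while i < len(src):
        ch = src[i]
        if ch == " ":
            i += 1
        elif ch.isdigit():
            value = value * 10 + int(ch)
            seen = True
            i += 1
        else:
            break
    return (value if seen else None), i

# "1 000 +  1 4" scans as 1000 + 14 = 1014
a, i = scan_number("1 000 +  1 4", 0)
b, _ = scan_number("1 000 +  1 4", i + 1)
print(a + b)  # 1014
```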


Ha. I actually thought of suggesting spaces, but I thought it might mess with tokenization, especially in BASIC. Nice to see it actually works in some flavor of it.
I implemented the underscore since C# does the same.

Previously, we had a conversation about white space in variable names – and, in this context, I think this idea isn’t totally against the principles, as we find it already implemented for numbers.

(I think the problem here isn’t really with parsing and/or tokenization, but with potentially weird and unintuitive runtime errors, when typos in keywords lead to complex compound identifiers rather than throwing an error. The same is potentially true for digits and white space, but numbers tend to occur in quite limited contexts.)

Edit:
There is actually a potential problem with 0x…, 0b…, etc., notation: if we allow for white space, this should also work:

0 d 1 000 → 1000

But:
0 … number
d … identifier
1 000 … number

→ error.
Meaning, this requires look-ahead to work.
Alternatively, “0x”, “0d”, “0o”, “0b” have to be parsed as atoms (much like “<=” for LTE).
Which in turn suggests a potential argument for using an atom like “&” or “@” in the notation, as well.
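A sketch of what such an atom looks like in a lexer, assuming one character of look-ahead is available (lex_radix_prefix is a hypothetical helper):

```python
def lex_radix_prefix(src, i):
    # Sketch: recognize "0x"/"0b"/"0o"/"0d" as ONE atom (like "<=")
    # by peeking at the next character instead of emitting "0" first.
    # With white space in between ("0 x"), no atom is recognized.
    if src[i] == "0" and i + 1 < len(src) and src[i + 1] in "bdox":
        return src[i:i + 2], i + 2
    return None, i

print(lex_radix_prefix("0xFF", 0))    # ('0x', 2)
print(lex_radix_prefix("0 x FF", 0))  # (None, 0)
```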

I think it’s doable with little effort though. For instance, let’s take CIRCLE x, y, r

As soon as you recognize the “CIRCLE”, you know that it has 3 comma-separated numbers coming up. You just split whatever comes after CIRCLE by commas and interpret each of them with something like this:

function GetNumber(string argument) {
  short result;
  argument = argument.trim();  // remove leading and trailing spaces
  if (isNumeric(argument)) {  // plain decimal; if this fails, there is a non-numeric character in there and we go on to interpret the base
    return convertStringToShort(argument);
  } else {
    string cleaned = argument.replace(' ', '').replace('_', '');  // discard any spaces and '_' separator characters
    string prefix = cleaned[..2];  // the first 2 characters, i.e. the base prefix
    string unknownBaseNumber = cleaned[2..];  // the digits after the prefix
    if (prefix.toLowerCase() == "0x") {
      result = convertStringFromHexToShort(unknownBaseNumber);
    } else if (prefix.toLowerCase() == "0b") {
      result = convertStringFromBinaryToShort(unknownBaseNumber);
    } else if (prefix.toLowerCase() == "0o") {
      result = convertStringFromOctToShort(unknownBaseNumber);
    } else {
      // handle "bad number argument" here
    }

    return result;
  }
}

This is a draft, but I believe it’s a functional method of getting the correct number. What’s left to do is the actual implementation of convertStringFromBaseToShort(string), which would also take care of errors in situations like:

0xAF0T1
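A checked converter can reject such stray digits (and 16-bit overflow) as it accumulates. A hypothetical Python sketch, not the actual Continuum 93 code:

```python
DIGITS = "0123456789ABCDEF"

def convert_checked(s, base):
    # Sketch: accumulate digit by digit, rejecting anything outside
    # the base (this catches the 'T' in "0xAF0T1") and 16-bit overflow.
    value = 0
    for ch in s.upper():
        d = DIGITS[:base].find(ch)
        if d < 0:
            raise ValueError(f"invalid digit {ch!r} for base {base}")
        value = value * base + d
        if value > 0xFFFF:
            raise ValueError("number does not fit in a short")
    return value

print(convert_checked("AF01", 16))  # 44801
```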

I was thinking more about 8-bit processors. It can surely be done, but it complicates things. Notably, we get a state like isNumeric only after we have parsed the chunk in question. And the complication is really about what this chunk is. E.g., we encounter “0”, then “B” – but is this “B” part of the prefix, or the beginning of another chunk, or is it standing on its own (“B per se”)? This requires some context – and this context has to be collected. Notably, in typical interpretation schemes, we’ll have to run this each time we encounter a numeric context, and running at typical 8-bit speeds, this seems unnecessarily expensive.

Edit: On the other hand, we tend to overthink runtime speed concerns, because we generally use machine language on these systems mostly in time-critical contexts, where we want to “squeeze” the most out of the hardware, probably beyond what could plausibly be done. Back in the day, the attitude seems to have been more relaxed. E.g., the keyboard scan routine on the PET checks each of the possible 80 intersections of the keyboard matrix against a table and even evaluates a side condition for each of them – and this 60 times a second (and it’s still much the same on the C64). We’d probably be horrified by the mere thought of doing this and would look for an optimized solution.

Well, agreed. That’s why the example I presented shows some context, basically: we expect 3 arguments separated by commas, which should translate to numbers (eventually). So, the moment we meet the “CIRCLE” instruction, what follows is no longer a mystery. We know we ‘should’ have 3 numbers separated by commas. Of course, the user is free to write anything, but that’s why we validate each step of the way.
And this should hold for any kind of instruction, so that it’s fully predictable.

Yes, ideally the look-ahead should be next to zero ‘if’ you are writing a real-time interpreter. You want it as fast as possible, especially if you are running on a slow CPU. But then, maybe the whole architecture of what you’d use as BASIC there would need some reconsideration, or maybe a step back to assembly is needed. Or a mix of those.

Can’t say much about the keyboard scan example. Sometimes, though, I feel you can’t avoid redundancy, like in your example. Unless maybe you reconsider the hardware and defer that work to a secondary (cheap) processing unit that would instead produce an interrupt whenever some key sequence is pressed.

Microsoft BASIC is what most people think of as BASIC.
The original and many other BASIC versions are compile-and-go.

Binary is also very useful when defining character shapes and sprites and similar tasks.


Sadly, it seems that most people think of Visual Basic.

(The same Windowsification has happened with the still living Pascal descendants.)

Recently I ported my TinyBasic to another 6502 platform that had some on-board “GPIO”, and I mused about adding binary to the interpreter, but felt that, as it was pushing 4KB of binary, it was already a somewhat “fat” TinyBasic… I was also looking at the BASIC DATA/READ statements, but again, Tiny… So in the spirit of letting the computer do the work, I came up with:

  120 $D = "........" : GOSUB 900
  130 $D = "1......." : GOSUB 900 
  145 $D = "111....." : GOSUB 900
---
  900 Y = 0
  905 FOR I = 0 TO 7
  910   X = ?(D+I)
  915   Y = Y / 2
  920   IF X = 49 Y = Y + 128
  940 NEXT I
  950 REM PR Y
  960 ?(L+Q) = Y
  970 Q = Q + 1
  980 RETURN

Just as part of an experiment to take binary data and stuff it into memory…
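For reference, the subroutine’s trick in Python form: 49 is ASCII “1”, and the repeated halving means the leftmost character of the pattern ends up in bit 0 (bits_to_byte is just an illustrative name):

```python
def bits_to_byte(pattern):
    # Mirror of the GOSUB 900 loop: halve, then add 128 for each '1'
    # (ASCII 49); after 8 passes the leftmost char has become bit 0.
    y = 0
    for ch in pattern:
        y //= 2
        if ord(ch) == 49:   # '1'
            y += 128
    return y

print(bits_to_byte("1......."))  # 1
print(bits_to_byte("111....."))  # 7
```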

(Yes, it’s part of a “Larson scanner”)

-Gordon

I thought I had already posted this, but:

I have added hex, oct and bin constants to RetroBASIC 1.8.6:

These are indicated by prefixes: &, 0x, or 0h for hex, 0o for octal, and 0b for binary.
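Accepting that set of prefixes is a small amount of lexing. A sketch in Python of one way to do it – this is not RetroBASIC’s actual code, and the names PATTERN and parse are purely illustrative:

```python
import re

# One alternation per base: & / 0x / 0h for hex, 0o for octal, 0b for binary
PATTERN = re.compile(r"(?:&|0[xh])([0-9A-Fa-f]+)|0o([0-7]+)|0b([01]+)")

def parse(tok):
    m = PATTERN.fullmatch(tok)
    if not m:
        raise ValueError(tok)
    hexa, octa, bina = m.groups()
    if hexa:
        return int(hexa, 16)
    if octa:
        return int(octa, 8)
    return int(bina, 2)

print(parse("&FF"))     # 255
print(parse("0o377"))   # 255
print(parse("0b1010"))  # 10
```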


Have you considered the Dartmouth MAT syntax?

I have, that will be part of the 2.x series. I want to get all the simple stuff done first in the 1.x series. The MAT stuff is going to need some refactoring of the way I pass around values, but this is likely overdue anyway.


Honestly, I prefer the way Smalltalk does different bases.
(base)r(number)

e.g.
8r777 is 777 in octal (511 dec)
16r777 is 777 in hexadecimal (1911 dec)
and so on.
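Since the base is given explicitly, parsing this notation is trivial; a quick Python sketch (parse_radix is just an illustrative name):

```python
def parse_radix(tok):
    # Smalltalk-style (base)r(digits), e.g. "16r777"
    base, digits = tok.split("r", 1)
    return int(digits, int(base))

print(parse_radix("8r777"))   # 511
print(parse_radix("16r777"))  # 1911
```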

For my own assemblers I have been using a variation of the Smalltalk idea. Instead of having the base in decimal I use the largest valid digit in the base followed by the $ character followed by the actual number:

  • binary: 1$11001010
  • octal: 7$1760
  • decimal: 9$151
  • hex: F$5C9E
  • base 20: J$7H1C0

I am still not used to this, however. It seems a bit ugly to me even if it is practical. So I am not sure I will keep using it.
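One nice property of the scheme is that the base can be recovered from the marker itself: it’s the marker’s digit value plus one. A small Python sketch of that reading (parse_marked is an illustrative name):

```python
def parse_marked(tok):
    # "<largest digit>$<number>": the marker's digit value + 1 is the base,
    # e.g. F -> base 16, 7 -> base 8, J -> base 20
    marker, digits = tok.split("$", 1)
    base = int(marker, 36) + 1
    return int(digits, base)

print(parse_marked("1$11001010"))  # 202
print(parse_marked("F$5C9E"))      # 23710
```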

I can see this. I guess the highest-digit radix designator as a prefix isn’t too friendly for scanning the program text. In my opinion, any non-numeric designator should go in front, or there should be a common character in front of any number, whatever the radix. (The latter being something that the 0x… notation got right.)

In modern FORTH, when you’re currently working in a different or unknown BASE, you can use $ for hex constants, # or & for decimal constants, and % for binary constants. It also allows 0x to be used for hex, provided the current BASE is 33 or less. For any other BASE, you just alter BASE (and you can temporarily save and restore the old BASE to/from the stack).

…so to temporarily swap to BASE 17, enter a number (say BAG2), print it out in BASE 3 (treating it as a positive integer), and then restore the original BASE, you’d do (assuming your FORTH allows lower case):

base @ #17 base ! bag2 3 base ! u. base !

(result printed is 2220110210 and the stack is unaltered)
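The arithmetic checks out; just to verify the FORTH result above in Python terms (to_base is a throwaway helper):

```python
n = int("BAG2", 17)   # the base-17 literal from the example

def to_base(n, b):
    # repeated division, most significant digit first (bases up to 10)
    digits = ""
    while n:
        digits = "0123456789"[n % b] + digits
        n //= b
    return digits or "0"

print(n)              # 57207
print(to_base(n, 3))  # 2220110210
```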

There are also the convenience words HEX and DECIMAL to quickly switch to base 16 or base 10. Again, if you want to restore the original base afterwards, you’d begin with base @ (or similar) and do base ! (or similar) afterwards.