Question: What does a directory data stream in Commodore BASIC 4.0 look like?

NoLand · March 22, 2023, 11:33pm

I’m trying to implement the DIRECTORY (and CATALOG) command for my PET 2001 emulator, but I can’t find any documentation on what the data stream from the disk drive to the PET looks like (i.e., what data the PET is expecting).

What I have found out so far (based on a few experiments):

Only every second byte of the data comprising the (disk) name is displayed.
The stream is apparently zero-terminated.

E.g., if we load a short program instead of the directory data, which looks like:

100 REM ** WELCOME TO THE PET 2001 **
110 ...

As a data stream, this looks as follows (the first two bytes are the start address 0x0401, in low, high order):

01 04
   25 04 64 00 8F 20 2A   %.... *
2A 20 57 45 4C 43 4F 4D  * WELCOM
45 20 54 4F 20 54 48 45  E TO THE
20 50 45 54 20 32 30 30   PET 200
31 20 2A 2A 00 5B 04 65  1 **.[..
...

We get:

DIRECTORY
 8335 *WLOET H E 01*

READY.

In other words, the first 6 bytes (up to and including the first zero-byte) are apparently skipped. Decimal 8335 is 0x208F, i.e., the token for REM (0x8F) and the following SPACE (0x20), read as a 16-bit word in low, high order. The first 0x2A (*) seems to be lost, followed by every second character of the ASCII stream after the REM keyword. Moreover, at the zero-byte terminating the BASIC line, the directory interpretation routine quits as well (compare the ‘READY.’, no errors).
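
The observation can be replayed in a few lines of Python. This is only a sketch of the symptom as described here (six bytes swallowed, one word read in low, high order, then every second byte dropped), not of what the ROM routine actually does; the function name `replay_misread` is made up for illustration.

```python
# Bytes of the short program as listed in the dump (load address 0x0401 first).
STREAM = bytes.fromhex(
    "01 04 "
    "25 04 64 00 8F 20 2A "
    "2A 20 57 45 4C 43 4F 4D "
    "45 20 54 4F 20 54 48 45 "
    "20 50 45 54 20 32 30 30 "
    "31 20 2A 2A 00 5B 04 65 "
)

def replay_misread(stream):
    """Replay the observed faulty reading (hypothetical helper, not ROM code)."""
    pos = 6                                        # the first six bytes are swallowed
    number = stream[pos] | (stream[pos + 1] << 8)  # 16-bit word, low byte first
    pos += 3                                       # skip the word and the first '*'
    chars = []
    while pos < len(stream) and stream[pos] != 0:  # stop at the end-of-line zero
        chars.append(chr(stream[pos]))
        pos += 2                                   # every second byte is lost
    return number, "".join(chars)

print(replay_misread(STREAM))
```

With the bytes above this yields 8335 and `*WLOET H E 01*`, the same as the output of the DIRECTORY command.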

This is unlike anything I have seen so far from Commodore.
Any ideas? Are there any documents or references?

KymG · March 23, 2023, 10:38pm

Is it coincidence that 0x208F == 8335?

From my experience with other MS-BASIC derivatives the first word is usually length rather than memory load address, as this is usually fixed for BASIC code.

NoLand · March 24, 2023, 1:00am

That’s the Commodore PRG format: the first two bytes are always the load address (in low, high order).
(The PET actually has no default address for loading files, it’s always like LOAD "*",8,1 on the later Commodore 8-bits. The default load address and the secondary device address for overriding this were introduced on later machines for compatibility. Also, the VIC-20’s BASIC start changed with memory expansions installed, which really made this change necessary.)

For better understanding, how directories work without BASIC 4.0: we have to first load the directory as a special file (“$”, including optional drive numbers and/or filter patterns for filenames), which returns a program generated by the drive, which we then have to LIST. BASIC 4.0 introduced the DIRECTORY command, which generates a similar view without actually loading a program (which unavoidably destroys any program currently in memory).

This is what a directory listing looks like:

[Image: pet-demo-dir]

We see a first line with the drive number (0 or 1) with the disk name (fixed length, padded by blanks), followed by the disk ID (2 chars, here “RW”) and the DOS version (2 chars, here “2A”).
The individual entries/files are listed by the number of blocks (a block is 254 bytes) as the line number, the filename and the file type (optionally with an additional marker for open files, etc).
The final line provides the number of remaining free blocks as the line number.

This describes also the minimal amount of data required to generate a similar listing.
I wouldn’t be surprised if the individual file entries were just the raw directory entries from disk, since they share a similar format across the various drives and DOS versions. The header data describing the disk, its name and layout, however, varies with drives/DOS versions, and there’s plenty of surplus data not required for the listing.
The simplest way of achieving this would be loading the “$” program generated by the drive into some buffer (e.g., the BASIC input line buffer) and printing it from there on-the-fly, which would be also completely compatible. But this is apparently not how it works.

Here’s what the output of the DIRECTORY command looks like (same as above, here using the lower-case character set):

[Image: 8250_demo_dir_191]

Edit: I just found out that the (BASIC) lines should be padded by blanks to fill a line of 32 bytes, including the end-of-line zero-byte mark. But this isn’t it either.

NoLand · March 24, 2023, 6:55pm

Just for reference, this is what the above directory listing looks like as a program.
(I cross-checked this with VICE, where the program is generated by the code in the ROM of the emulated disk drive. There’s a single difference, namely, this listing reports 0 free blocks, just as in the image above, since the disk images are read-only. Otherwise, the two are byte-for-byte identical.)

Listing:

 0 "DEMO            " RW 2A
 5    "UNIVERSAL DOS"    PRG
 29   "ROCKWELL DEMO"    PRG
 7    "PET CLOCK"        PRG
 27   "PRINTER DEMO"     PRG
 19   "WORLD CLOCK"      PRG
 21   "PET GRAPHICS"     PRG
 16   "REFLECTIONS"      PRG
 24   "PET DEMO"         PRG
 0 BLOCKS FREE.

Hex-dump (program starts at 0x0401):

0400: .. 1F 04 00 00 12 22 44   ....."D
0408: 45 4D 4F 20 20 20 20 20  EMO     
0410: 20 20 20 20 20 20 20 22         "
0418: 20 52 57 20 32 41 00 3F   RW 2A.?
0420: 04 05 00 20 20 20 22 55  ...   "U
0428: 4E 49 56 45 52 53 41 4C  NIVERSAL
0430: 20 44 4F 53 22 20 20 20   DOS"   
0438: 20 50 52 47 20 20 00 5F   PRG  ._
0440: 04 1D 00 20 20 22 52 4F  ...  "RO
0448: 43 4B 57 45 4C 4C 20 44  CKWELL D
0450: 45 4D 4F 22 20 20 20 20  EMO"    
0458: 50 52 47 20 20 20 00 7F  PRG   ..
0460: 04 07 00 20 20 20 22 50  ...   "P
0468: 45 54 20 43 4C 4F 43 4B  ET CLOCK
0470: 22 20 20 20 20 20 20 20  "       
0478: 20 50 52 47 20 20 00 9F   PRG  ..

0480: 04 1B 00 20 20 22 50 52  ...  "PR
0488: 49 4E 54 45 52 20 44 45  INTER DE
0490: 4D 4F 22 20 20 20 20 20  MO"     
0498: 50 52 47 20 20 20 00 BF  PRG   ..
04A0: 04 13 00 20 20 22 57 4F  ...  "WO
04A8: 52 4C 44 20 43 4C 4F 43  RLD CLOC
04B0: 4B 22 20 20 20 20 20 20  K"      
04B8: 50 52 47 20 20 20 00 DF  PRG   ..
04C0: 04 15 00 20 20 22 50 45  ...  "PE
04C8: 54 20 47 52 41 50 48 49  T GRAPHI
04D0: 43 53 22 20 20 20 20 20  CS"     
04D8: 50 52 47 20 20 20 00 FF  PRG   ..
04E0: 04 10 00 20 20 22 52 45  ...  "RE
04E8: 46 4C 45 43 54 49 4F 4E  FLECTION
04F0: 53 22 20 20 20 20 20 20  S"      
04F8: 50 52 47 20 20 20 00 1F  PRG   ..

0500: 05 18 00 20 20 22 50 45  ...  "PE
0508: 54 20 44 45 4D 4F 22 20  T DEMO" 
0510: 20 20 20 20 20 20 20 20          
0518: 50 52 47 20 20 20 00 3D  PRG   .=
0520: 05 00 00 42 4C 4F 43 4B  ...BLOCK
0528: 53 20 46 52 45 45 2E 20  S FREE. 
0530: 20 20 20 20 20 20 20 20          
0538: 20 20 20 20 00 00 00         ...
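
For reference, the structure of this dump can be walked the way LIST would do it, by following the line links. The Python sketch below is an illustration only: `list_directory_image` is a made-up name, and `SAMPLE` consists of the first two lines of the dump above, closed with an empty line link so that it forms a complete program.

```python
# Load address, title line and first entry from the dump, plus an end-of-program mark.
SAMPLE = bytes.fromhex(
    "01 04 "
    "1F 04 00 00 12 22 44 45 4D 4F" + " 20" * 12 + " 22 20 52 57 20 32 41 00 "
    "3F 04 05 00 20 20 20 22 55 4E 49 56 45 52 53 41 4C 20 44 4F 53 22 "
    "20 20 20 20 50 52 47 20 20 00 "
    "00 00"
)

def list_directory_image(prg):
    """Return (line number, text) pairs by following the BASIC line links."""
    load_addr = prg[0] | (prg[1] << 8)      # first two bytes: load address, low/high
    image = prg[2:]
    lines = []
    addr = load_addr
    while True:
        off = addr - load_addr
        link = image[off] | (image[off + 1] << 8)
        if link == 0:                       # empty line link marks the end of program
            break
        number = image[off + 2] | (image[off + 3] << 8)
        end = image.index(0, off + 4)       # the text runs up to the zero byte
        lines.append((number, image[off + 4:end].decode("latin-1")))
        addr = link                         # continue at the next line
    return lines

for number, text in list_directory_image(SAMPLE):
    print(number, repr(text))
```

The first pair carries line number 0 and the title text (starting with the RVS ON byte 0x12), the second is 5 with the text `   "UNIVERSAL DOS"    PRG  `.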

A maybe interesting observation to be made concerns the very last line.
As mentioned above, each line is exactly 32 bytes long. (The notable exception is the first line with the disk title, which ends with the DOS version string. This is owed to the reverse video output and the fact that “RVS OFF” is a shifted character, which would be interpreted as a BASIC token. Therefore, the line must end here, without any padding.) The very last line, however, is 32 bytes including the end-of-program mark. Technically, the end-of-program mark is an empty line link, which is already part of the next line. That it is treated here as part of the last line provides a rare window into Commodore’s thinking about these things.
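
These line lengths can be reproduced with a small generator. The following Python sketch is derived purely from the dump above and is not the drive's actual code; the function name and parameters are hypothetical, and markers for open or locked files as well as block counts of four digits are not handled.

```python
def build_directory_program(name, disk_id, dos, entries, free, load_addr=0x0401):
    """Build a '$' program with the layout seen in the dump (hypothetical helper)."""
    out = bytearray([load_addr & 0xFF, load_addr >> 8])
    addr = load_addr

    def emit(number, text):
        nonlocal addr
        body = text.encode("latin-1") + b"\x00"
        addr += 4 + len(body)               # the link points to the next line
        out.extend([addr & 0xFF, addr >> 8, number & 0xFF, number >> 8])
        out.extend(body)

    # Title line: RVS ON, quoted 16-char name, disk ID, DOS version, no padding (30 bytes).
    emit(0, '\x12"' + name.ljust(16) + '" ' + disk_id + " " + dos)
    for blocks, fname, ftype in entries:
        indent = " " * (4 - len(str(blocks)))   # more digits, fewer leading blanks
        text = indent + '"' + fname + '"' + " " * (16 - len(fname)) + " " + ftype
        emit(blocks, text.ljust(27))            # 4 + 27 + 1 = 32 bytes per entry
    emit(free, "BLOCKS FREE.".ljust(25))        # 30 bytes ...
    out.extend(b"\x00\x00")                     # ... plus the end-of-program mark = 32
    return bytes(out)

ENTRIES = [(5, "UNIVERSAL DOS", "PRG"), (29, "ROCKWELL DEMO", "PRG"),
           (7, "PET CLOCK", "PRG"), (27, "PRINTER DEMO", "PRG"),
           (19, "WORLD CLOCK", "PRG"), (21, "PET GRAPHICS", "PRG"),
           (16, "REFLECTIONS", "PRG"), (24, "PET DEMO", "PRG")]
prg = build_directory_program("DEMO", "RW", "2A", ENTRIES, free=0)
print(len(prg), prg[2:4].hex())
```

For the listing above this gives 320 bytes (two bytes of load address plus the 318 bytes from 0x0401 to 0x053E), with 0x041F as the first line link.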

P.S.: Still not working with “DIRECTORY” in BASIC 4.0.

P.P.S.: This is the same, regardless of the computer connected. So loading a directory by LOAD "$",8,1 on a C64 will load the directory program into the video RAM. Which may cause some interesting effects.

mainframetom · March 25, 2023, 4:40pm

I would say, it’s just a bug. I read about many bugs in Commodore BASIC ROMs.
BASIC 3.5 and up were designed for later machines (C16, 116), and I would never use those BASICs on a PET.
Maybe it has something to do with the drive speed (UI-); at least the VIC needs a slower speed.
And I think you should just load the directory with ,8 and not ,8,1.
I would ask that on a commodore forum.
Does this also occur on vice on PET and other machines?

NoLand · March 25, 2023, 6:02pm

BASIC 4.0 actually came first for the PET2001N (CBM 30xx outside the US) as an upgrade ROM and is available on all later PETs. The Commodore DOS inside the drives is designed to work with the PET and later machines somewhat work around this. (The problem here is really just the start address of the directory list program generated by the DOS in the drive. On the PET, the load address is always absolute, whereas later Commodore 8-bits default to relative loading to the respective BASIC start address and switch to the PET behavior by secondary device address of “1”.)

So this shouldn’t be an issue with the PET. It’s really just a matter of how the PET communicates with the drive. I do see that a program is loaded with filename of “$” on a “DIRECTORY” command. However, while the file I generate on this in the emulator is identical to what a drive returns in VICE, this fails. So there must be more to this. (I did a binary comparison on this. The only difference being that I return a message with “0 BLOCKS FREE.”, but changing this to any other number doesn’t have any effect.)
Another symptom is that, when a first DIRECTORY command fails, any further try returns without any output. So there seems to be some “unfinished” state.
Still, there must be more to this, as I’d expect the output to work until the directory routine fails for whatever reasons. Notably, the output of the DIRECTORY command can’t be slowed down, etc, you can just start over, indicating that this is printed from a (line) buffer on-the-fly. So a failure along the line should not affect any previous output. But the output already fails on the first line!

Notably, the enhanced disk commands (DLOAD, DSAVE, DIRECTORY, etc.) were not available with the “Level II” ROM-set (also known as “ROM 3” or “New ROM”), while drives were already available (and were operated just the same as they would be on later machines, like the C64). So, I assume, the DIRECTORY/CATALOG command must have utilized what had already been implemented on the drives. Any new approach to communications would have required upgrading the ROMs in the drives as well, since the DOS resides in the drives (it runs on dedicated 6502/6504 processors; Commodore drives have two of them, one for operating the drive and another one for running the DOS and handling communications). Notably, such an update was not required. From this follows (at least for me) that this must somehow operate on the basis of the LOAD "$",8 command.

I’m really at a loss here.

(Regarding Commodore communities, the issue is that these are mostly centered on C64-related things. There is really no information on how the BASIC 4 disk commands work. This is also illustrated by there being just a single ROM listing for BASIC 4.1 that contains the disk commands, and it is poorly commented in these sections. VICE just emulates the chips inside the drives, so there is no need to really understand what’s going on, either. In contrast, my approach is really about abstracting the disk implementation, which should provide better interoperability in the long run. E.g., we can mount tape archives as disks.)

EdS · March 25, 2023, 6:27pm

It’s quite surprising that you’re breaking new ground… but good to get the story straight, if you can.

minus56bits · March 27, 2023, 10:37am

I have annotated (in German) ROM listings of CBM BASIC 4.0 and CBM DOS 2.5, both from SM Softwareverbund - Microcomputer GmbH, if that would help? These listings are paper only and due to our move all things retrocomputing are currently hidden away in some boxes so it would take a while to provide this information.

NoLand · March 27, 2023, 12:21pm

I think this won’t be worth the effort. I have a rough idea of what the ROM routine is doing and that this is somehow related to the drive status. Thank you anyway for the kind offer!

Having said that, given that there are only few listings out there, it may be interesting to eventually scan this and load it up to / send it to zimmers.net, regardless.

mainframetom · March 27, 2023, 3:19pm

I still don’t get it.
Is it just a bug on the disk file creation (by your emulator)?
Or is this with every disk file? Strange that also the remaining bytes are wrong.
Can you load a file, if you know the file name or is the file name wrong as well?
What about the drive speed? What drive is it, a 1541?
Have you tried switching to UI-?
Have you tried it on MAME and checked that code?

If it’s just a display bug, it’s maybe possible to patch that, convert it or use other code.
Why would someone use the enhanced disk commands on a PET?

NoLand · March 27, 2023, 5:45pm

To provide some background: The emulator is based on original code by Tom Skibo and the sections for IO and the IEEE emulation are still pretty much his code. This really only implements what’s necessary for LOAD and SAVE on device 8, but nothing else. I’ve since tacked additional functionality onto this, like interpretation of disk and tape images, virtual directories, file drag-and-drop, etc., but here I do hit a wall.

My first idea was that this would be much like LOAD “$”,8, because that’s what was already implemented in the drives and available out of the box. And I do see (in the current implementation) a LOAD state with filename “$”, which may be also, as we call it nowadays, a hallucination.

Symptoms:

I get a status code (system var. “ST”) of 2, indicating an input timeout (device is not responding in less than 64ms). Meaning, there’s some miscommunication.
The first 6 bytes are “eaten away”, meaning, they are consumed by some other mechanism.
The first line is misinterpreted (small wonder, given that the first bytes are missing).
The timeout happens on the first line (or immediately after this).

The bytes are apparently being consumed by something (maybe by reading the device status, which isn’t really implemented).
The timeout indicates that we’re missing a change of state, maybe a command sent over the bus. This hints at this not using a LOAD command, but rather OPEN.
Meaning, something along the lines (pun intended) of this BASIC program for listing a directory without overwriting the program in memory:

10 OPEN 1,8,0,"$"
20 GET#1,X$,X$
30 GET#1,A$,B$,C$,D$
35 IF ST THEN CLOSE 1:END
40 PRINT ASC(C$+CHR$(0))+256*(ASC(D$+CHR$(0)));
50 GET#1,X$
60 IF X$="" THEN PRINT:GOTO 30
70 PRINT X$;:GOTO 50

(Line 20 consumes the load address, line 30 reads a line header: 2 bytes line-link and 2 bytes for the line number. Line 40 prints the line number [actually the number of blocks] from the binary format, line 50 gets the next byte. If this is a zero-byte [end of line], we continue for a new input line at line 30, else we print this character and continue for the next character at line 50. This loops until the status variable “ST” is anything other than zero when reading the next line, probably a code of 64, end of file.)

Without anything mounted, the emulator will emulate an empty virtual directory:

 0 "PET MOUNT POINT "███████
 0 BLOCKS FREE.

Running the above program results in:

 4608 PTMUTPIT"
 16896 OK RE

Note how we’re missing every second character, as well!
This is similar but not exactly the same as what we get with the “DIRECTORY” command:

DIRECTORY
 8722 E ON ON     >

Still, I’d attribute the missing bytes in the disk name to a similar mechanism using the protocol for OPEN rather than the one for LOAD.

Conclusions: I have to re-implement the entire IEEE stack, with explicit support for OPEN and SAVE, etc., and device channels (something that’s missing, as well). Fixing things up on the cheap won’t help. Since I wanted to do this anyway, this is probably the way to go.

(I’ll post anything of note, if I’m encountering any along the way…)

[Edit]:
Why are we missing every second byte?
Probably there’s a bus command for GET#, requesting the next input (not just the IO handshake being held down.) If we have a state mismatch, the IEEE emulation would send the next byte, where it is supposed to receive one. Mind how this also matches the 6 missing bytes and the 6 reads at the start of the above BASIC routine. And there we are…

litwr · April 18, 2023, 7:15pm

Maybe base:reading_the_directory [Codebase 64 wiki] can help.

NoLand · April 18, 2023, 9:20pm

Yeah, this points to the very same thing, one byte being consumed as the device status byte (because of the lack of implementation), before the next byte is read as data.
Compare routine getbyte, which first reads the status byte, then the actual data byte (via the Kernal routine CHRIN). This is consistent with the output missing each second byte for a given directory line.
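
The effect of this mismatch can be shown with a toy model in Python. It assumes, as described above, that the emulated device simply hands out the next byte of the file on every read, while the host reads in pairs of status and data; the names `NaiveDevice`, `get_byte` and `read_text` are made up for this sketch and do not correspond to any ROM or Kernal routine.

```python
class NaiveDevice:
    """Toy device: returns the next file byte on every read, whatever was asked for."""
    def __init__(self, data):
        self.data = data
        self.pos = 0

    def read(self):
        byte = self.data[self.pos] if self.pos < len(self.data) else 0
        self.pos += 1
        return byte

def get_byte(dev):
    """Host side: one read meant for the status, then one read for the data."""
    dev.read()            # taken for a status byte, but really a data byte
    return dev.read()     # only every second byte arrives as data

def read_text(dev, count):
    return "".join(chr(get_byte(dev)) for _ in range(count))

name = b'"PET MOUNT POINT "'
print(read_text(NaiveDevice(name), 9))     # pairing starts at the opening quote

shifted = NaiveDevice(name)
shifted.read()                             # one extra byte consumed beforehand
print(read_text(shifted, 8))
```

Depending on where the pairing starts, the quoted disk name comes out as `PTMUTPIT"` or as `E ON ON `, which are the two patterns seen with the BASIC routine and with the DIRECTORY command, respectively.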

Sadly, this also means a rewrite from scratch, since there really should be proper device channels in order to support the device status… (Device channels are also the way to discern LOAD from OPEN for read.)
There is also a chance for a really dirty hack, where we more or less assume that it’s a directory/catalog command, if the filename is “$” and we’re running on ROM 4.0, with minimal checks otherwise. (Meaning, if these conditions are met, we could inject appropriate status bytes into the directory program delivered via the bus.) But this is a route, I don’t really want to go…

But we were able to establish a few things:

The DIRECTORY/CATALOG command builds on the “$” program generated by the drive (which had been implemented in drive ROMs already, when BASIC 4 was released)
It rather uses OPEN than LOAD to read this
The read sequence reads and skips the first two bytes (program start address),
Then, for each line, two bytes are read and skipped (link address to the next BASIC line)
Then, two bytes are read for the line number (as a 16-bit integer)
Then bytes are read as character data until this results in a zero-byte (indicating the end of the line), at which point we continue for the next line
Generally, a status byte is read for every data byte (the major difference to LOAD operations!)
The end of program is probably not determined by checking the line link (for being two zero-bytes), but rather by relying on the device status indicating an end of file status. (Compare the listing linked above and a few other implementations.)
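
The read sequence listed above can be sketched in Python as follows. This is a model of the steps as established in this thread, not a transcription of the ROM routine: the `Channel` class, its `read` method returning a (status, byte) pair, and the placement of the status value 64 on the last byte are assumptions made for illustration.

```python
EOF = 64   # ST value taken to mean "end of file" (assumption from the thread)

class Channel:
    """Hypothetical open channel: every read returns (status, data byte)."""
    def __init__(self, data):
        self.data = data
        self.pos = 0

    def read(self):
        if self.pos >= len(self.data):
            return EOF, 0
        byte = self.data[self.pos]
        self.pos += 1
        return (EOF if self.pos == len(self.data) else 0), byte

def directory_lines(ch):
    ch.read()
    ch.read()                               # skip the load address
    lines = []
    while True:
        ch.read()                           # skip the line link, low byte
        status, _ = ch.read()               # skip the line link, high byte
        if status:                          # end of file reported with the last link
            break
        _, lo = ch.read()
        _, hi = ch.read()                   # line number = number of blocks
        chars = []
        while True:
            status, byte = ch.read()
            if byte == 0 or status:         # zero byte ends the line
                break
            chars.append(chr(byte))
        lines.append((lo | (hi << 8), "".join(chars)))
        if status:                          # stream ended inside a line
            break
    return lines

# Title line and first entry of the dump, closed with an empty line link.
SAMPLE = bytes.fromhex(
    "01 04 1F 04 00 00 12 22 44 45 4D 4F" + " 20" * 12 + " 22 20 52 57 20 32 41 00 "
    "3F 04 05 00 20 20 20 22 55 4E 49 56 45 52 53 41 4C 20 44 4F 53 22 "
    "20 20 20 20 50 52 47 20 20 00 00 00"
)
print(directory_lines(Channel(SAMPLE)))
```

Note that the loop never inspects the line link itself; it ends when the status turns non-zero while the final (empty) link is being read.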

The latter also explains, why the emulator exits after the first line, as a data byte is read as the device status, indicating a non-zero value, which is interpreted as the end of file. We can actually explain, why the emulator behaves like it does.

litwr · April 19, 2023, 8:24am

I have done several tests with the VICE emulator and was quite surprised. Unlike the C64, there is no SETLFS and SETNAM in the PET’s KERNAL! They just parse the LOAD command line via the LOAD call! It is also interesting that the PET uses routines in ZP.
The PET KERNAL does not use calls to LOAD, OPEN, or CHKIN when DIRECTORY is called. It seems the KERNAL uses a special routine for DIRECTORY.
IMHO only Basic 3.5 was the first Commodore Basic that fixed system drawbacks of previous Basic versions. However Basic 2.0 also was good for io operations. It is an irony that the worse Basic version has higher version number, 4.0.

NoLand · April 19, 2023, 9:02pm

Yes, it isn’t that transparent what’s going on with LOAD/DIRECTORY. Moreover, this section is either missing from ROM listings or lacks any comments.

I’m actually not sure what you mean by BASIC 3.5, as there’s some confusion regarding naming ROMs on the PET 2001.

As far as I can see (keyboard versions aside), there have been 4 ROM configurations for the PET 2001:
(Disclaimer: I don’t claim to be an expert on this.)

a) the ROM, the PET 2001 originally shipped with (asterisks in the initial prompt, no monitor, problematic disk implementation), version 1.0

b) an updated, bug-fixed version of this ROM, sometimes called 1.1, sometimes called ROM 2. It seems, most PETs were upgraded to this one and if someone speaks of “Old ROM”, this is the one. (There seems to be no distinction made between (a) and (b) in contemporary PET literature.)

c) the ROM with pound marks in the initial prompt, monitor, etc. Sometimes called ROM 3, ROM 2 by others. In Commodore terms “Level II ROM Set”. I see this referred to as BASIC 2 and as BASIC 3. Known as “New ROM” in the day.
(Personally, I call this one ROM 2.0, because of Commodore’s “Level II”. Also, the C64 BASIC ROM seems to be based on this one and is called BASIC 2.0 by Commodore.)

d) the ROM in question, with “BASIC 4.0” in the initial prompt (also, the return of the asterisks) and the extended disk commands. Shares zero-page locations with (c), but comes with an additional ROM chip. The editor supports 40 and 80 cols, but only 40 cols are used. Known as “ROM 4”. (Available on the PET 2001N or as a separate upgrade.)

Notably, these are ROM-sets and individual ROMs / code bases come with their own versioning (and ambiguities – is it about capabilities or about bug-fixes and/or specific implementations?), which doesn’t help…
So, I tend to be confused with the nomenclature…

litwr · April 20, 2023, 7:06am

Actually I was not aware of a, b, and c options. I meant the C264 machines, they use Basic 3.5 which has disk commands from Basic 4.0 (DIRECTORY, DLOAD, …).
I have partially tracked the execution of LOAD"$",8 on the PET. It seems that it uses code which is completely different from code for DIRECTORY.

NoLand · April 20, 2023, 10:18am

Ah, this one. While I don’t have any in-depth knowledge about this, it always appeared to me as one of the best versions Commodore had on offer.