Large data sets with small address space

When I was growing up I remember our library switching over from a card catalogue system to a computer database. This was probably in the early 90s. Being a small-town library, it naturally didn’t have funding for cutting-edge computers. As best I can tell from looking at old pictures and comparing them with my memory, they were green-phosphor VT320s connected to something running proprietary library database software.

This thread on the 6502 forums talks about data taking much more space than code; a little experience with my own 6502 project bears that out. My firmware, with LCD interface, keyboard driver, and monitor, is only going to take a few hundred bytes of ROM. I can easily see fitting a full Xinu port into a 32KB ROM with room to spare.

But a single well-documented source file can easily dwarf the entire address space.

So how did historical systems address this problem?

  1. What kind of physical mass storage would my library computer have been using for its big database?

  2. How did software that had to run in an 8-bit address space manage large data, where a single file could be larger than the address space? (Say, if you’re Robert Jordan writing a 1,000-page novel on your 8-bit micro running CP/M.)

Note that very few 8-bit computers used 8-bit address spaces (limited to 256 bytes). I can only think of the Kenbak-1 as an example. The 6502, for example, used a 16-bit address space, allowing it to directly address 64KB of memory and I/O devices.

Mass storage devices (hard disks, tape, floppy disks, SD cards, USB drives, CD-ROMs, etc.) are not directly addressed by the processor but instead appear as I/O devices to be controlled by the operating system.

The normal way these “block devices” (in Unix terms) are used is to have one or more small buffers in memory and to swap blocks between these buffers and the disk. This can be done against the raw disk or through some kind of file system implemented on top of it.
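A minimal sketch of that buffering pattern, in C rather than anything period-accurate (the block size, the single-buffer policy, and the fake_disk/disk_read/disk_write names are all illustrative, not any particular system’s API):

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE  512          /* illustrative; real systems used 128, 256, 512... */
#define DISK_BLOCKS 256

/* Stand-in for a real drive: a flat array pretending to be the disk. */
static uint8_t fake_disk[DISK_BLOCKS][BLOCK_SIZE];

static void disk_read(uint32_t blk, uint8_t *dst)        { memcpy(dst, fake_disk[blk], BLOCK_SIZE); }
static void disk_write(uint32_t blk, const uint8_t *src) { memcpy(fake_disk[blk], src, BLOCK_SIZE); }

/* The single in-memory buffer, plus bookkeeping about what it currently holds. */
static struct {
    uint32_t block_no;
    int      valid, dirty;
    uint8_t  data[BLOCK_SIZE];
} buf;

/* Ensure the requested block is in the buffer, flushing the old one if it was modified. */
static uint8_t *get_block(uint32_t block_no)
{
    if (!buf.valid || buf.block_no != block_no) {
        if (buf.valid && buf.dirty)
            disk_write(buf.block_no, buf.data);
        disk_read(block_no, buf.data);
        buf.block_no = block_no;
        buf.valid = 1;
        buf.dirty = 0;
    }
    return buf.data;
}

int main(void)
{
    uint8_t *b = get_block(42);       /* pull block 42 into the buffer */
    b[0] = 'X';                       /* modify it in memory           */
    buf.dirty = 1;                    /* mark it for write-back        */
    get_block(7);                     /* switching blocks flushes 42   */
    printf("%c\n", fake_disk[42][0]); /* prints X                      */
    return 0;
}
```

The point is that only BLOCK_SIZE bytes of the disk are ever in memory at once; the disk itself can be as large as the block-number field allows.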

You need a way to address these disks or the files on them, and that can be in the form of several numbers (sector, head, cylinder, for example) or one big number (the offset given to a file’s seek command, for example). An 8-bit processor can deal with arbitrarily large numbers (using an add-with-carry instruction, for example), so how large the files can be for a given operating system is entirely up to its designers.
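To make the add-with-carry point concrete, here is a small C sketch (the offset24 and offset_add names are invented) of a 24-bit file offset kept as three separate bytes and advanced with 8-bit additions and an explicit carry, which is the same dance a chain of ADC instructions performs on a 6502 or Z80:

```c
#include <stdint.h>
#include <stdio.h>

/* A 24-bit offset stored as three separate bytes, low byte first,
 * the way an 8-bit CPU would keep it in memory or zero page. */
typedef struct { uint8_t lo, mid, hi; } offset24;

/* Add a 16-bit amount to the 24-bit offset using only 8-bit adds
 * and an explicit carry. */
static void offset_add(offset24 *o, uint16_t amount)
{
    unsigned sum, carry;

    sum   = o->lo + (amount & 0xFF);          /* add low bytes            */
    o->lo = (uint8_t)sum;
    carry = sum >> 8;

    sum    = o->mid + (amount >> 8) + carry;  /* add middle bytes + carry */
    o->mid = (uint8_t)sum;
    carry  = sum >> 8;

    o->hi  = (uint8_t)(o->hi + carry);        /* propagate final carry    */
}

int main(void)
{
    offset24 o = { 0xFF, 0xFF, 0x00 };            /* offset 0x00FFFF = 65535  */
    offset_add(&o, 1);                            /* crosses the 64KB "limit" */
    printf("%02X%02X%02X\n", o.hi, o.mid, o.lo);  /* prints 010000            */
    return 0;
}
```

Extending this to 32 or more bits is just more bytes and more carries, which is why file-size limits are an operating-system design choice rather than a CPU one.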

Note that if you connect a disk larger than 4GB to a 32-bit processor, you have the exact same problems you are worried about.


Handling large data sets on an 8-bit micro? Easy…

Well - tricky, but not impossible - you do it on disk with a filing system capable of random access.

Way back (40 years back) I wrote some software on the Apple II (6502, 8-bit, 64KB RAM, about 30KB usable after the monitor, BASIC interpreter, DOS, etc.) to manage a video tape rental library (I said it was way back!)

The limitation then was the size of the database due to the limit of the floppy disk - 130KB. I don’t recall the details, but it used random access files, indexed by a unique number each tape had. There was a customer database, a tape library database, and a database of what tapes were out on loan. (I suspect these days we’d make them tables inside a single database.)

The record sizes were fixed, so to access, say, customer number 24, you’d seek (position) to record number 24, having given the DOS a record length when opening the file. Quite simple as long as you didn’t write data bigger than the record length…

So if a customer record was considered to be up to 150 characters long, then you could store about 880 customer records on a single floppy. Most business Apple IIs had two drives, so customer names on one floppy, video tapes on another, and off you go…

More than enough for a little corner shop from what I recall…
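For anyone who wants to see the shape of that fixed-record scheme, here is a rough sketch in modern C on top of an ordinary filesystem (the 150-byte record length matches the example above; the struct fields and file name are invented, and Apple DOS did the equivalent via the record length given when opening the file rather than an explicit seek):

```c
#include <stdio.h>
#include <string.h>

#define RECORD_LEN 150   /* fixed record size, as in the example above */

/* A customer record padded out to exactly RECORD_LEN bytes on disk. */
struct customer {
    char name[100];
    char phone[20];
    char pad[RECORD_LEN - 120];
};

/* Seek straight to record n: no index needed, just multiply. */
static long record_offset(long n) { return n * RECORD_LEN; }

int main(void)
{
    struct customer c = {0};
    FILE *db = fopen("customers.dat", "r+b");
    if (!db) db = fopen("customers.dat", "w+b");
    if (!db) return 1;

    /* Write customer number 24 at its fixed slot. */
    strcpy(c.name, "A. Customer");
    fseek(db, record_offset(24), SEEK_SET);
    fwrite(&c, RECORD_LEN, 1, db);

    /* Read it back by seeking to the same spot. */
    memset(&c, 0, sizeof c);
    fseek(db, record_offset(24), SEEK_SET);
    if (fread(&c, RECORD_LEN, 1, db) == 1)
        printf("%s\n", c.name);

    fclose(db);
    return 0;
}
```

At 150 bytes per record, a 130KB floppy holds roughly 880 of these, which matches the figure above; the cost of the simplicity is that every record occupies its full slot whether it needs it or not.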

That principle is no different today - the random access file is now called a database and would be hidden inside some sort of SQL-based access system.

-Gordon


About writing a book on an 8-bit CP/M machine, the advice I gave my father was to use a new file for each chapter. That worked pretty well.

But in general, as noted upthread, the answer is not to have the data in memory and therefore be using memory addresses, but to have the data on storage and use some kind of extended address (hierarchical or flat) to identify each piece.

In the land of Acorn, there’s BAS128, a Basic interpreter which uses 16-bit addresses to point to program text and data in four 16K banks of RAM. Admittedly it’s still only 16-bit addressing, but it keeps the Basic memory arena separate from the machine addresses, which include the interpreter, operating system, and screen memory.
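I don’t know BAS128’s internals, but the address translation for that kind of scheme is simple enough to sketch (the names below are made up): the top two bits of the 16-bit Basic address select one of the four 16K banks, and the remaining fourteen bits are the offset within it.

```c
#include <stdint.h>
#include <stdio.h>

#define BANKS     4
#define BANK_SIZE (16 * 1024)

/* Four 16K banks standing in for the paged RAM. */
static uint8_t bank_ram[BANKS][BANK_SIZE];

/* Split a 16-bit "Basic address" into bank number and offset:
 * bits 15-14 pick the bank, bits 13-0 index within it. */
static uint8_t *translate(uint16_t basic_addr)
{
    unsigned bank   = basic_addr >> 14;
    unsigned offset = basic_addr & 0x3FFF;
    return &bank_ram[bank][offset];
}

int main(void)
{
    *translate(0x0000) = 1;   /* first byte of bank 0 */
    *translate(0x4000) = 2;   /* first byte of bank 1 */
    *translate(0xFFFF) = 3;   /* last byte of bank 3  */
    printf("%d %d %d\n",
           bank_ram[0][0], bank_ram[1][0], bank_ram[3][BANK_SIZE - 1]);
    return 0;
}
```

On the real hardware the interpreter would of course also have to page the selected bank into the CPU’s address space before dereferencing; the flat array above glosses over that.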

As many late-model Z80 machines had much more than 64k of RAM, they must also have had schemes to address their larger memories, but I don’t know what they were.

Acorn also had a 256k machine (only used in-house and never sold) which extended a couple of 6502 opcodes to allow 24-bit indirect addressing.


In the CP/M world the scheme for more than 64K of RAM was called “temporary files on RAM disc”. There was no portable API. Most of the home micros didn’t have any interface for it either; you just grabbed it by the bare metal and did what you needed.

A few tried - the BBC actually had something vaguely like an API for the sideways memory, and the disk interface side of the ZX +3 etc. sort of had some interface bits.

A lot of 8-bit software simply managed the working set in software, whether that was records, slices of game data (e.g. Infocom games), or whatever. Provided your working set fits and your algorithms are not dumb, it generally worked OK. Some of it is just about algorithm choices - merge sort and so on; other bits are about good design (my new C compiler can compile Fuzix in 48K of RAM plus disk space because it borrows every trick from other compilers of the era to save memory)
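As a rough sketch of that working-set idea (not Infocom’s actual code - the page size, slot count, and function names are all invented), here’s a tiny demand-paging layer in C: the data lives on disk in fixed-size pages, a handful of page slots live in RAM, and a miss evicts the least-recently-used slot.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 512
#define NUM_SLOTS 8          /* only 4KB of RAM devoted to cached data */

struct slot {
    int      in_use;
    uint16_t page_no;        /* which on-disk page this slot holds */
    uint32_t last_used;      /* for least-recently-used eviction   */
    uint8_t  data[PAGE_SIZE];
};

static struct slot slots[NUM_SLOTS];
static uint32_t clock_tick;

/* Would read one page from the data file; stubbed with fake content here. */
static void load_page(uint16_t page_no, uint8_t *dst)
{
    memset(dst, 0, PAGE_SIZE);
    dst[0] = (uint8_t)page_no;
}

/* Return a pointer to the byte at "virtual" offset addr in the data file,
 * faulting the containing page into RAM if it isn't already resident. */
static uint8_t *fetch(uint32_t addr)
{
    uint16_t page_no = (uint16_t)(addr / PAGE_SIZE);
    int i, victim = 0;

    for (i = 0; i < NUM_SLOTS; i++) {
        if (slots[i].in_use && slots[i].page_no == page_no) {
            slots[i].last_used = ++clock_tick;   /* hit: bump recency */
            return &slots[i].data[addr % PAGE_SIZE];
        }
        if (!slots[i].in_use ||
            slots[i].last_used < slots[victim].last_used)
            victim = i;                          /* remember eviction candidate */
    }

    /* Miss: reuse the free or least-recently-used slot. */
    load_page(page_no, slots[victim].data);
    slots[victim].in_use = 1;
    slots[victim].page_no = page_no;
    slots[victim].last_used = ++clock_tick;
    return &slots[victim].data[addr % PAGE_SIZE];
}

int main(void)
{
    /* Touch bytes scattered across a "file" far bigger than the 4KB of cache. */
    uint8_t a = *fetch(0);
    uint8_t b = *fetch(99840UL);    /* first byte of page 195 */
    return (a == 0 && b == 195) ? 0 : 1;
}
```

The Infocom interpreters did something along these lines to run games much larger than the host machine’s RAM; the same shape works just as well for database records or compiler intermediate data.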
