Thoughts on Apple's IIGS

However – inching back unobtrusively towards the original topic of the IIGS – Apple could well be caught with two platforms with widely varying goals for coding, optimization and deployment, unable to merge them and eventually having to drop one. (Mind that the specs are probably more out of sync than they had been with any previous processor transition. Apart, maybe, from the 68K / 65816 dichotomy – and we know how that one worked out. I still think it would be a risky business, one which requires careful deliberation.)

1 Like

I recall my first reaction when I heard of the '816 was that this was a splendid thing, a 16 bit extension to the beloved 6502. However, I’m less enamoured now: I found it a bit tricky to program, and while the '02 mode is of course a very faithful '02, the full-on '816 mode isn’t nearly as clean as I’d like. That said, evidently Apple made a go of it. (And Acorn also had a go, with their Communicator product. In that case, so far as I can make out, each task works within a 64k allocation: not by means of hardware mapping, or by using the bank boundaries, but presumably by software convention, using base+index addressing.)

1 Like

I have to confess I’ve no first-hand experience with the '816, but some with the 6502. The latter was quite a peculiar breed: just three registers, but some orthogonal behavior with the zero page as a bank of extension registers. I think it’s very hard to venture from there to a sensible extended design with a clean instruction set. The “hybridity” of the 6502 just works out nicely at this very stage, and there is no obvious way to go from there towards an extended 16-bit design. (At least none that I can think of at the moment.)

One thing which struck me about the '816 is that it’s a fairly strongly banked memory space - there are some addressing modes which cross banks, but it feels like the bank boundaries are very much there. And yet, it’s not possible to set up each bank like a virtual 6502 with its own zero page and stack: those two pages can be relocated but only within bank 0. So bank 0 is rather special. That feels a bit awkward, although I suppose there might be some argument that it’s an excellent decision.

Sounds much like a design for multiuser/multitasking (compare ARM). However, this seems like an optimization quite far from the ways of thinking of the average 6502 programmer. (Hashtag “Incompatible goals of industry deployment and coding traditions”.)

Edit: in hindsight this reads much like programmers were thinking along the lines of code segmentation and relocatable code, and WDC about multiuser systems.

I disagree for a simple reason. The “only” difference between the two is the actual CPU at the core of the machine, the system APIs are “identical” (no doubt there will be some minor distinctions).

Consider, while they did not do this, Apple could have made the GS and the Mac source code compatible. At the GS desktop level (vs the Apple ][ compatibility level), they’re very close. The GS and Mac toolbox routines (which is the primary OS interface, and, notably, drives the GUI) are, at a glance, identical. Both use a foundation of Quickdraw (the underlying graphics package). All of the major high level constructs are very similar.

Of course, they didn’t do that, so there is quite some difference. But they could have done it. It would also have been imperfect for other reasons (a 2+ MHz '816 vs an 8 MHz 68K being one definite issue).

macOS is rooted in NeXTSTEP. NeXTSTEP became OpenSTEP. OpenSTEP ran on 68K, x86, PA-RISC, and SPARC. Arguably, it also ran on PPC when Apple ported it to the then current PPC architecture. A lot of NeXTSTEP code was very portable.

Many legacy NeXTSTEP programs ported readily to Mac OS X.

So, supporting a new CPU I don’t think is particularly risky for Apple. It’s mostly a matter of making a compelling enough design to attract consumers and thus incentivize vendors to build and test their code on the new platform.

The IIGS was kind of a franken machine caught between generations of ideology, market, technology, and design.

As for the 65816, it’s a really cool chip. It’s notable today in being one that had its feet in both eras, with all the compromises involved in that. The 65816 could only have been created when it was created. No one would make one like it today, and no one would have made one like it when it came out if there were no requirement for 6502 compatibility. It’s not a Z80 to the 8080. It’s a truly market-driven piece of hardware trying to fill a gap.

Intel owes a lot to Zilog for filling the gap between the 8-bit world and the 16-bit world. The Z80 was enough better than the 8080 not to require some “new thing”, and it let Intel push straight ahead to the x86 family with little regard to looking back. If Zilog hadn’t invented the Z80, odds are Intel would have had to.

1 Like

One thing that doesn’t make sense to me is that the Apple IIgs was cancelled eleven months before the Apple //e. I wonder if something like the IIc Plus but with IIgs components would have had a market?

I’m finding exactly this now in my Ruby '816 project. Almost to the point of wondering if it’s really worth it; but what’s the alternative? I don’t know, because I want a retro-style system that’s usable and can support its own software development. I’m almost there, so I should just push on, but the number of times I’m thinking “if only it did it this way, or that, or …”

-Gordon

1 Like

Do you think you found it tricky to program because you considered it still a 6502 with big registers, vs treating it as a 65816? I don’t know; I haven’t done any real programming with it. But it’s certainly a different animal than the 6502.

What’s important to appreciate about the GS, is the contrast to the Macintosh. The Macintosh had the ever popular 68000 at its core, but the OS design imposed a lot of limitations on the design of software. For example, while the 68000 had a nice flat memory model, code segments loaded by the OS were limited to 32K in length. If your program was bigger than that, you had to break it up.

I don’t recall if the GS has the 32K barrier or not. But code is organized very similarly. The GS uses a relocating loader to load your code into memory. You don’t really care what page you’re loaded into. It’s the loader’s and memory manager’s job to find space big enough for your code, and then load it to work in that space. You don’t need to write relocatable code, since the loader does that for you (your code is not just a raw memory dump).
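The idea of a relocating loader can be sketched in a few lines: the code image carries a list of the places that hold addresses, and the loader patches each one for wherever the memory manager found room. This is only an illustration of the concept; the record format below is invented, not the GS’s actual OMF relocation dictionary.

```python
# Toy relocating loader: patch 3-byte (24-bit) little-endian address
# fields in a code image so it runs at whatever base the memory
# manager chose.  The relocation-record format (a plain list of
# offsets) is invented for illustration; the real GS uses OMF.

def relocate(image, reloc_offsets, load_base):
    """Return a copy of `image` with each 24-bit little-endian address
    at the offsets in `reloc_offsets` rebased to `load_base`."""
    out = bytearray(image)
    for off in reloc_offsets:
        addr = out[off] | (out[off + 1] << 8) | (out[off + 2] << 16)
        addr = (addr + load_base) & 0xFFFFFF
        out[off] = addr & 0xFF
        out[off + 1] = (addr >> 8) & 0xFF
        out[off + 2] = (addr >> 16) & 0xFF
    return bytes(out)

# Code assembled as if loaded at 0; one long address field at offset 1.
image = bytes([0xAF, 0x10, 0x00, 0x00])    # LDA $000010 (absolute long)
loaded = relocate(image, [1], 0x020000)    # loader picked bank $02
assert loaded == bytes([0xAF, 0x10, 0x00, 0x02])
```

The point is that the code on disk is address-free only in the sense that the patch list travels with it; the code itself need not be position-independent.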

The memory manager does give you the flexibility to use locked-down code at specific addresses and pages, and such. You just shouldn’t. Your code should be cognizant of the higher-order processing involved and let the OS do its job.

I don’t recall how the stack is managed. And most of your accesses will be through the “long” pointers, vs the local ones. Stack addressing is used a lot for the stack frames of the languages.

If you treat the '816 as an 8-bit processor with more memory, then I think, yea, it can be a bit awkward to use. If you treat it as a 16-bit processor with a 24-bit address bus, it flattens out. Of course, there are issues with page crossings and such. Again, that’s part of working well with the memory manager: its job is to not let you allocate boundary-crossing pages. It’s also a higher-level memory manager.

In contrast to a generic C runtime routine like malloc, where you’d readily allocate a bunch of small pieces of memory (say, for a dynamic list) and get pointers to that memory, the IIGS (and Mac, for that matter) memory manager WILL do that, but you “shouldn’t” do that. There’s quite a bit of overhead with memory management. It’s better to allocate larger blocks that are dereferenced through “Handles” (which are pointers to pointers). This allows the memory manager to move blocks around, compact memory, etc. Your code works, ideally, with larger blocks of memory and offsets rather than a bunch of direct addresses.
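The handle trick is easy to model. In this toy sketch (names and behavior invented for illustration, not the real IIGS/Mac API), a handle is a stable token that dereferences to the block’s current location, so the manager can move blocks during compaction without breaking client code:

```python
# Toy handle-based allocator.  Clients hold handles, not raw offsets;
# compact() may move blocks, but handles stay valid.

class ToyMemoryManager:
    def __init__(self, size):
        self.heap = bytearray(size)
        self.blocks = {}          # handle -> [offset, length]; None if freed
        self.next_handle = 0

    def new_handle(self, length):
        # Naive bump allocation past the highest existing block.
        live = [b for b in self.blocks.values() if b is not None]
        offset = max((o + l for o, l in live), default=0)
        handle = self.next_handle
        self.next_handle += 1
        self.blocks[handle] = [offset, length]
        return handle

    def deref(self, handle):
        return self.blocks[handle]        # the block's CURRENT [offset, length]

    def dispose(self, handle):
        self.blocks[handle] = None

    def compact(self):
        # Slide live blocks down over freed gaps.  Raw offsets change,
        # but handles remain valid: re-deref after any call that moves memory.
        dest = 0
        for h in sorted(h for h, b in self.blocks.items() if b is not None):
            off, length = self.blocks[h]
            self.heap[dest:dest + length] = self.heap[off:off + length]
            self.blocks[h] = [dest, length]
            dest += length

mm = ToyMemoryManager(64)
a = mm.new_handle(8)
b = mm.new_handle(8)
off, _ = mm.deref(b)
mm.heap[off:off + 8] = b"payload!"
mm.dispose(a)
mm.compact()                      # b's block moves; the handle still works
off, _ = mm.deref(b)
assert off == 0 and bytes(mm.heap[0:8]) == b"payload!"
```

Holding a raw offset across `compact()` would have left you pointing at stale memory; holding the handle does not. That is the entire reason for the double indirection.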

The memory manager is no panacea, you certainly have to jump through hoops to work with it. But it’s an artifact of the nature of the OS, and working with a machine that does not have an MMU.

And the key point, I think, is that the interface is pretty much identical to what the Macintosh with its flat memory model had. I honestly don’t remember if the Mac had a relocating loader, or if it simply required the code to be developed (compiled) as position-independent code. The '816 works a lot better with a relocating loader, I think. You CAN do PIC on the '816, but it’s a burden you don’t need to take on if you don’t have to.

I’m sure the design of the OS and supporting libraries could make a huge difference to how sensible and comfortable the '816 feels to program. And the introduction of stack-relative addressing surely helps with most compiled languages.

But I was working without an OS. The modal nature of the '816 was a frustration. There are five modes: one being the backward-compatible '02 mode, and the others allowing variation of memory and index operations between 8 and 16 bits. I found it uncomfortable to switch modes - across a call boundary you pretty much need to be in an agreed mode - and therefore there’s no natural handling of either 8- or 16-bit values. It’s clunkier than that.
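Why the agreed-mode requirement bites can be shown with a small model of the size flags. On the '816, REP clears bits of the status register P and SEP sets them; bit 5 (m) selects 8- or 16-bit accumulator/memory and bit 4 (x) selects index width, and even the byte length of an instruction like LDA #imm depends on the flag in force when it is fetched. A sketch, not an emulator:

```python
# Model of the '816 native-mode size flags.  REP #n clears status
# bits, SEP #n sets them; the width of an immediate load depends on
# the m flag at fetch time, so caller and callee must agree on mode.

M_FLAG = 0x20   # accumulator/memory select: 1 = 8-bit, 0 = 16-bit
X_FLAG = 0x10   # index register select:     1 = 8-bit, 0 = 16-bit

def rep(p, mask):        # REP #mask: reset (clear) the masked status bits
    return p & ~mask & 0xFF

def sep(p, mask):        # SEP #mask: set the masked status bits
    return (p | mask) & 0xFF

def lda_imm_size(p):     # total bytes of LDA #imm under the flags in p
    return 2 if (p & M_FLAG) else 3   # opcode + 1 or 2 operand bytes

p = 0x30                  # m=1, x=1: everything 8-bit
p = rep(p, M_FLAG)        # REP #$20 -> 16-bit accumulator
assert lda_imm_size(p) == 3
p = sep(p, M_FLAG)        # SEP #$20 -> back to 8-bit
assert lda_imm_size(p) == 2
```

Since instruction *length* itself changes with the mode, code entered in the wrong mode doesn’t just compute wrong values; it falls out of sync with the instruction stream entirely.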

But that makes sense with the model Apple was using at the time: your program got some code and data space, and anything fancy you called ‘ROM’ to do the work.

Yea, I can see that for sure. There’s a lot of cognitive load as well, keeping in your head which mode you’re in, much less managing any switching back and forth, saving and restoring modes on routine calls (it’s a stack push, but still, fun to debug if you don’t).

In what few larger '816 programs I’ve seen, they settle on a mode (for example, 8-bit index and 16-bit accumulator) and work around that. Less efficient, to be sure, but easier to code. I think it’s more difficult for long-time 6502 low-level cycle-counting folks to juggle the '816, vs just accepting inefficiency at runtime for efficiency in development.

It certainly did go against the grain to switch modes, do something, and switch back. It would be worth it for something expensive like a multiply or a string search.

If the architecture had used prefix bytes instead of modes, or as well as, that might have been interesting. Although they had already filled the opcode table.

On the other (implicit) part of the question, Thoughts on Apple’s III, I see there’s a comment over on 6502.org about that. (I see also that @drogon said above “the Apple /// was designed by marketing”)

For those interested, there is an Apple IIgs emulator available for most platforms (Mac, Windows and Linux). I have one running at 25 MHz (unlimited) on a Raspberry Pi 4 with 14 megabytes of memory, which I think is pretty awesome.

Cheers,
Andy

4 Likes

Here’s a little bit of history, where Blake gets an answer to a 30 year old question:

1 Like

Disclaimer: I’m half-way decent at 6502 programming, but pretty much all I know about the 65816 on the software level I just learned by reading Brett Tabke’s A 6502 Programmer’s Introduction to the 65816 and some brief pokes into Eyes and Lichty’s Programming the 65816.

It appears that it does, more or less, have a “64K barrier.” The address space is divided into 64K “banks” addressed by the top byte of a full 24-bit address.

  • The PC is still 16-bit; all instruction and operand fetch addresses are generated by prepending the 8-bit Program Bank Register (PBR) to it. This is not part of the program counter: if you execute, jump or branch forward past $FFFF using a 16-bit address or 8-bit offset you will end up reading from the low addresses of the same bank. Interrupt vectors and their code must also be in bank 0. You can, however, use absolute 24-bit addresses for JMP and JSR (but not conditional branches), which will update the PBR with the new bank number.
  • Data accesses using 16-bit addresses are similarly prefixed by the Data Bank Register (DBR), though here, when you index past the end of a bank, it actually will generate an address in the following bank. Still, your index registers (X and Y) are just 16 bits, so you have to deal “manually” with data structures larger than 64K.
  • For reasons entirely mysterious to me, the stack and direct page registers are just 16 bits, so the locations of both are restricted to the lowest 64K of address space. (This causes particularly interesting problems when considering the use of an MMU for memory protection.)
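The asymmetry between the first two points can be made concrete with a little address-formation model (a sketch of the documented behavior, not a cycle-accurate emulation): instruction fetch wraps within the program bank, while absolute-indexed data access carries into the next bank.

```python
# How 24-bit addresses are formed on the '816 (a model, not an
# emulator).  Instruction fetch wraps within the program bank;
# absolute-indexed data accesses carry into the next bank.

def fetch_addr(pbr, pc):
    # PC is 16-bit; incrementing it never touches PBR, so running
    # past $FFFF wraps to $0000 of the SAME bank.
    return (pbr << 16) | (pc & 0xFFFF)

def data_addr(dbr, base, index):
    # Absolute,X / absolute,Y: the full 24-bit sum is used, so
    # indexing past $FFFF lands in the NEXT bank.
    return ((dbr << 16) + (base & 0xFFFF) + index) & 0xFFFFFF

# Running off the end of bank $02 wraps to $02:0000, not $03:0000...
assert fetch_addr(0x02, (0xFFFF + 1) & 0xFFFF) == 0x020000
# ...but indexing data past the end of bank $02 reaches bank $03.
assert data_addr(0x02, 0xFFFF, 1) == 0x030000
```

That is the sense in which the bank boundaries are “very much there” for code while being slightly softer for data.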

Compared with using, say, a paged MMU that allows you to arbitrarily assign any 4K page in a 24-bit address space to any 4K page in a 16-bit address space, the 65816 offers more convenience when you have program code or total data approaching or larger than 64K, and a little more convenience when you’re really pushing the size of individual data structures towards (but not past) 64K.
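For clarity, the hypothetical MMU in that comparison is just a 16-entry page table; a minimal sketch of the translation (all names and sizes are the assumptions stated above, not any particular chip):

```python
# Sketch of the hypothetical paged MMU from the comparison: sixteen
# 4K logical pages in a 16-bit space, each mappable to any 4K
# physical page in a 24-bit space.

PAGE = 4096

def translate(page_table, logical):
    """Map a 16-bit logical address through a 16-entry page table
    of physical page numbers (0..4095) to a 24-bit physical address."""
    assert 0 <= logical <= 0xFFFF
    return page_table[logical // PAGE] * PAGE + logical % PAGE

table = [0] * 16
table[1] = 0x123              # logical $1000-$1FFF -> physical $123000
assert translate(table, 0x1ABC) == 0x123ABC
```

Anything larger than 64K then means rewriting page-table entries mid-computation, which is roughly the moral equivalent of juggling the DBR on the '816.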

But on the other hand, if you went with a 6809 and an MMU you gain a second stack pointer (also usable as an index register), stack and DP anywhere in memory and more addressing modes (constant offset from index register and postincrement/predecrement being the most important of these). That said, I’d say that the 65816 probably wins here for larger heap-oriented programs that don’t use much stack. I’m not so sure about programs that make extensive use of the stack, and likely not for programs (such as many interpreters) that benefit from having a second stack for data.

Ya, this is where 6809 shines; you can use the A and B accumulators as two separate 8-bit or a single 16-bit (“D”) register, intermixed at will.

So you basically took the GS as far as it could go, huh?

I am kinda bummed there were no graphics cards. I recall hearing about a card that let you plug it into VGA monitors, but I want to say it wasn’t on the market long due to how Apple effectively swept the GS under the rug.

As for a IIc form factor? Yes please.

Oh yea, the '816 absolutely has a “64K” barrier, for all the reasons you mentioned. What isn’t clear to me is whether the GS OS (which is essentially a copy of the Macintosh OS at the time) imposes the same 32K barrier that the Mac has. It very well may, simply because it’s easier to port it than change it, and 32K of code is “a lot”. Even modern Java limits individual methods to 64K total size. I’ve only hit that limit once with machine generated code, never organically by hand.

Just wanted to say, this turned out to be great, informative thread. I learned much about those processors!

When I ported QNX to my 68000 based Merlin II computer (in 1987) I was stuck with 32KB segments due to the signed 16 bit offsets. That made it more limited than the original PC version (though faster), so I changed it to target the unfinished 68020 based Merlin III instead. That had 32 bit offsets, so segments could be as large as RAM would allow.

The original Mac OS was not “32-bit clean”. Since only the bottom 24 bits of the address went to the chip’s pins, the system stored extra information in the top 8 bits (something that has just been reintroduced on ARM with the TBI, top byte ignore, option). When the Mac II came out this became a problem and the OS had to be rewritten. I don’t know if they took the opportunity to also allow larger segments, but it would have been a good idea.

1 Like