LIFE - 4 Gosper Glider Guns on PDP-7 Type 340 display

I love Life! In this case perhaps best enjoyed at double speed…

From the video description:

The audio you hear is generated by the Type 347 display controller.

DEC PDP-7 serial number 129 with a Type 340 graphics display running Conway’s Game of LIFE

Gliders collide in the center producing and destroying several interesting structures. The pattern is cyclical and repeats at generation 1170 after a flourish of a finale.

via @cs on mastodon

I assume the fading of the pixels is a property of the display (vs today, where it’s an aspect of the Life implementation).

I would have assumed the same, but I found a programming manual for it and it continuously executes a display list (held in the parent computer’s memory) and supports 8 intensity levels, so now I’m thinking the fading is programmatic (and quite handy for visualising Life.) (Edit: although I see the phosphor is described as P7 persistence for use in air traffic control, so maybe it is the phosphor fading…)

It really is quite the visual display device:

Life was quite a thing for me and my brother when a reasonably fast, full screen version was published for the TRS-80 Model I and III in 80 Microcomputing magazine in 1980. Dovetailed quite nicely with what we knew from my older brother’s Scientific American magazines from the early 70s. Considerable hours were spent filling the screen with random pixels and watching chaos resolve into order.

Unsurprisingly, work on Life goes on. Computing power has given rise to some massive constructions, including an actual self-replicating pattern: the Gemini ship. Early on, Conway proved that such patterns existed, but only through an outline of a mathematical existence proof.

That same computational oomph enables searches which have found some remarkably simple patterns that had surprisingly gone undiscovered. In 2013 there was the Loafer, a c/7 spaceship. In 2016, the c/10 spaceship Copperhead. And, though not so simple, the first small knightship, dubbed Sir Robin, in 2018. That’s a ship that moves in a direction other than a multiple of 45 degrees.

If you like low-volume news, keep an eye on https://www.conwaylife.com/ which will have an update once a year or so. I also like to drop into the patterns forum to see what’s up.

I think it’s phosphor fade, because you see the same level of persistence in the tens digit on the generation count. I can’t think of any reason why they would have gone through the bother of programmatically implementing pseudo-fade on the generation count.

How fast was the PDP-7 compared to a 1MHz 6502? This implementation looks a bit slower than my VIC20 version, but I did a lot of weird stuff to get it to go unreasonably fast on the VIC20.

Like, doing certain lookups and calculations in parallel, which is a weird challenge with only 8 bit wide arithmetic to work with, and with extremely limited RAM available for table lookups (despite the relatively luxurious 5K of RAM, almost 1K was consumed by the program and I wanted to devote most of the rest to dynamically allocatable graphics bitmap data).

I found a figure: “The memory has a cycle time of 1.75 μsec and provides a computation rate of 285,000 additions per sec.” That’s for 18 bit words, so we’d need a 6502 with a byte-wide memory somewhat faster than 1MHz. There’s another aspect of this setup: the 340 display is doing some intelligent work to update the display, which on an 8-bit micro would normally be done by the same micro that’s computing the Life algorithm. Of course the PDP-7 implementation might be relatively straightforward, not performance-optimised. This particular pattern is just 72x72 so doesn’t need much space. It’s managing 100 generations in 47 seconds, so about 2 generations per second.

EDIT: I was thinking PDP-8 when I wrote the below, but using PDP-7 memory width; it’s basically a hot mess. I’ll leave it because I think the conclusion remains relevant, but the math is all wrong.

A memory-to-accumulator add (which, I believe, is the only add supported by the PDP-8) is at least 3 cycles on the 6502, so to perform a 16-bit add (let’s forgive the extra two bits for now) wouldn’t you need at least a 2 MHz clock, in addition to the memory bandwidth to support that clock?

Both the PDP-8 and the 6502 have multi-cycle operations, although I think all of the basic PDP-8 instructions (that is, the seven instructions other than the microcoded OPR instruction) are effectively one memory cycle for the instruction (during which the next instruction is loaded) plus whatever memory access the instruction may perform. As in many such exercises, creating an apples-to-apples comparison is probably rather difficult.

It’s certainly difficult to compare machines! I only bothered to consider matching the memory bandwidth.

If we suppose that 285k additions per second is accumulating from an array in memory… and if we allow the 6502 to indulge only in 16-bit accumulation, it would take a program a bit like this:

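    LDX #0             ; X = byte offset into the array of 16-bit values at $4000
    loop:
        CLC            ; 2 cycles: no carry into the low byte
        LDA $F0        ; 3  low byte of the running total (zero page)
        ADC $4000,X    ; 4  add low byte of the array element
        STA $F0        ; 3
        LDA $F1        ; 3  high byte of the running total
        ADC $4001,X    ; 4  add high byte plus carry
        STA $F1        ; 3
        INX            ; 2  advance to the next 16-bit element
        INX            ; 2
        CPX #200       ; 2  100 elements = 200 bytes
        BNE loop       ; 3 when taken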

which is about 30 clock cycles per 16 bit addition. So a 1MHz 6502 would manage 33k additions per second. We’re going to need a faster clock!
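
To check the “about 30 clock cycles” figure, here’s a small Python tally of the loop body using the standard NMOS 6502 cycle counts. It assumes no page-crossing penalties (true here, since $4000+X stays inside page $40) and that BNE is taken, costing 3 cycles.

```python
# Standard NMOS 6502 cycle counts for one pass of the accumulation loop.
# Assumes no page-crossing penalties and a taken BNE (3 cycles).
LOOP_BODY = [
    ("CLC", 2),
    ("LDA $F0", 3),
    ("ADC $4000,X", 4),
    ("STA $F0", 3),
    ("LDA $F1", 3),
    ("ADC $4001,X", 4),
    ("STA $F1", 3),
    ("INX", 2),
    ("INX", 2),
    ("CPX #200", 2),
    ("BNE loop", 3),
]

CLOCK_HZ = 1_000_000  # a 1MHz 6502

cycles_per_add = sum(cycles for _, cycles in LOOP_BODY)
adds_per_second = CLOCK_HZ // cycles_per_add

print(f"{cycles_per_add} cycles per 16-bit add -> {adds_per_second:,} adds/s at 1MHz")
```

That comes to 31 cycles per 16-bit addition, or roughly 32k additions per second at 1MHz.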

Yes, this is the same tube technology used for the Type 30 Precision CRT commonly used with the PDP-1, which is responsible for the iconic trails in Spacewar. The P-7 phosphor features a dual phosphor coating: one kind of phosphor for a quick response and short decay (which would also trigger a light pen) and one for a long sustain/persistence. The quick-response one is bright blue, while the long-sustain phosphor is yellowish green, both fading into a greenish color in real life (whereas frame-based recording media would usually exhibit a noticeable color separation, which is not seen in this recording).
Fun fact: DEC offered a set of color filters to separate current state (the blue quick response) and timeseries (the long sustain) in screen photography.
(Thus, it may be said, it was a 4-dimensional display: x, y, z/intensity, and timeseries. Intensities, however, didn’t scale too well on this kind of display, since the long-sustain phosphor would have a normalizing effect, which is why Spacewar also used update frequencies to achieve varying screen intensities.)

BTW, there’s ongoing development on support of the Type 340 display for simh, see this discussion on GitHub (which also links various documents): https://github.com/simh/simh/pull/752

On the phosphor: The tube is actually specified as the same for the Type 30 CRT and the Type 340, a 16ADP7A at a 40 inch radius of curvature. (Phosphor colors are at 445nm and 555nm.)

And here I have gone wrong, I’m sure. The 285k per second additions, and the 1.75 μsec memory cycle time, mean that we’re talking exactly two memory accesses for one addition. Which would have to mean that one is the instruction, with the address, and the result must be in the accumulator.
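
The two-accesses-per-addition reading falls straight out of DEC’s two numbers; a quick Python check:

```python
MEMORY_CYCLE_S = 1.75e-6   # PDP-7 core memory cycle time, from the quoted figure
ADDS_PER_SECOND = 285_000  # DEC's quoted computation rate

memory_cycles_per_second = 1 / MEMORY_CYCLE_S
cycles_per_add = memory_cycles_per_second / ADDS_PER_SECOND

print(f"{memory_cycles_per_second:,.0f} memory cycles/s, "
      f"{cycles_per_add:.3f} memory cycles per addition")
```

It comes out at just over 2 memory cycles per addition, which fits one cycle for the instruction and one for the operand (that split is an interpretation, not something the quoted figure states).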

The difficulty with the 6502 is that accumulating into a 16 bit value pretty much means using two bytes preferably in zero page. Unless we invent a new and terrible style which accumulates into YX:

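    TXA            ; 2  low byte of the total lives in X
    CLC            ; 2
    ADC $F0        ; 3  add low byte of the operand (zero page)
    TAX            ; 2  transfers leave the carry untouched
    TYA            ; 2  high byte of the total lives in Y
    ADC $F1        ; 3  add high byte plus carry
    TAY            ; 2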

which is only 16 cycles if the desired operand is in zero page, 18 otherwise. I think we’ll need at least a 4MHz 6502 to keep up. Or we can suppose that the additions we need are of very small numbers, and we can just use the 6502’s byte-sized operations:

    CLC            ; 2
    ADC $F0        ; 3 (4 with a full absolute address)

and that’s just 5 cycles for an addition if the operand is in zero page, or 6 if it needs a full address. Our 1MHz 6502 can do 200k additions per second, by this reading.
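
For comparison, a Python sketch of the clock speed each 6502 variant would need to match 285,000 additions per second, using the standard NMOS 6502 cycle counts (TXA/TAX/TYA/TAY/CLC 2 cycles each, ADC 3 cycles in zero page or 4 absolute). It ignores loop overhead and getting the next operand address, so treat these as lower bounds.

```python
TARGET_ADDS_PER_SEC = 285_000  # the PDP-7 rate quoted earlier

# Standard NMOS 6502 cycle counts per instruction for each variant
VARIANTS = {
    "16-bit into YX, zero-page operand": [2, 2, 3, 2, 2, 3, 2],  # TXA CLC ADC TAX TYA ADC TAY
    "16-bit into YX, absolute operand": [2, 2, 4, 2, 2, 4, 2],
    "8-bit, zero-page operand": [2, 3],  # CLC ADC
    "8-bit, absolute operand": [2, 4],
}


def required_mhz(cycles_per_add, adds_per_sec=TARGET_ADDS_PER_SEC):
    """Clock speed needed to sustain adds_per_sec at the given cost per addition."""
    return cycles_per_add * adds_per_sec / 1e6


for name, cycles in VARIANTS.items():
    n = sum(cycles)
    print(f"{name}: {n} cycles, {1_000_000 // n:,} adds/s at 1MHz, "
          f"{required_mhz(n):.2f} MHz to match the PDP-7")
```

The 8-bit zero-page case is 5 cycles, which is where 200k additions per second at 1MHz comes from; the YX trick needs something over 4.5MHz.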

In this particular case, there’s never a need for 16 bit math. The simplest algorithms for Conway’s Game of Life only compute one cell at a time, where you add up the neighboring cells, giving a value from 0 to 9 (if we include the central cell). As such, 8-bit math is perfectly adequate.
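
A minimal Python sketch of that one-cell-at-a-time approach, assuming a fixed grid whose edges count as dead cells. It sums the full 3x3 block including the centre (so the total is 0 to 9), then uses the matching form of the rule: a total of 3 means the cell is alive next generation, and a total of 4 keeps it alive only if it already was.

```python
def step(grid):
    """One generation, computed one cell at a time; cells off the edge count as dead."""
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0  # 3x3 block sum including the centre: 0..9, fits easily in a byte
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += grid[rr][cc]
            # With the centre included: alive on 3, and stays alive on 4
            if total == 3 or (total == 4 and grid[r][c]):
                new[r][c] = 1
    return new
```

For example, a horizontal blinker in a 5x5 grid becomes a vertical one after a single `step`.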

You can improve performance dramatically by calculating things for multiple cells at a time. For example, with 32 bit math and lots of memory, you can use 4 bits per cell, storing 8 cells in each long. This lets you essentially do math on 8 cells at a time. This was useful for my fast Javascript implementation here: http://isaackuo.cloudapp.net/ijk/apps/lifecommand/SpeedAcorn.html (note that performance is limited by Javascript’s extremely poor rendering speed with squares bigger than 3x3 pixels, so reduce the window size to see this fly)
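
Here’s a rough Python illustration of the 4-bits-per-cell idea, assuming 8 cells packed into one 32-bit word (cell i in bits 4i to 4i+3) and cells beyond the ends of the word treated as dead. The shifts and adds compute all 8 neighbourhood sums at once; a 3x3 total of at most 9 fits in a nibble, so no carry spills into the next cell. This is not the linked implementation, and for simplicity it applies the rule one nibble at a time afterwards.

```python
MASK32 = 0xFFFFFFFF  # keep results to one 32-bit word
CELLS_PER_WORD = 8   # 4 bits per cell


def pack(cells):
    """Pack 8 cells (0/1) into one word, cell i in bits 4i..4i+3."""
    word = 0
    for i, alive in enumerate(cells):
        word |= (alive & 1) << (4 * i)
    return word


def unpack(word):
    return [(word >> (4 * i)) & 0xF for i in range(CELLS_PER_WORD)]


def horizontal_sum(word):
    """Per nibble: left neighbour + self + right neighbour (0..3), all 8 cells at once."""
    return (word + ((word << 4) & MASK32) + (word >> 4)) & MASK32


def step_row(above, row, below):
    """Next generation of `row`, given the packed rows above and below it."""
    # 3x3 totals including the centre: at most 9 per nibble, so no carries between cells
    totals = horizontal_sum(above) + horizontal_sum(row) + horizontal_sum(below)
    new = 0
    for i in range(CELLS_PER_WORD):
        total = (totals >> (4 * i)) & 0xF
        alive = (row >> (4 * i)) & 1
        if total == 3 or (total == 4 and alive):
            new |= 1 << (4 * i)
    return new
```

The per-nibble rule check at the end could also be done with masks or a lookup table; the point here is only that the additions are shared across 8 cells.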

On the VIC-20, I didn’t have that kind of memory, nor register width. So I had to do a lot of … weird stuff.

The bottom line, though, is that fast GoL algorithms on older computers are tricky to devise and hairy to debug. You can’t just look up fast GoL algorithms on the Internet, because they are designed for an extremely different regime. We’re talking massive tables and lots of frame skipping, because the goal is to search for various interesting things rather than to produce an interesting visual display. For a visual display, today’s computers are so fast that a dumb algorithm is adequate.

Javascript is an exception since it’s so stupendously inefficient, and I was developing for a slow target device with a REALLY inefficient Javascript implementation (iPod Touch 4th gen).

So, unless you really have to do something clever and time consuming to develop, just … don’t.

Anyway, 285000 additions per second is pretty impressive. The naive algorithm would dumbly do 16 adds per cell (8 adds to the array lookup; 8 adds of the values). Well, 285000/(72×72×2) ≈ 27 adds per cell, at 2 generations per second on a 72x72 grid. So, probably the simplest algorithm is being used. Much easier to debug.
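
The per-cell budget worked out in Python, assuming the 72x72 grid and the 285,000 additions per second quoted earlier:

```python
GRID_CELLS = 72 * 72
ADDS_PER_SECOND = 285_000
NAIVE_ADDS_PER_CELL = 16  # 8 adds for neighbour addresses + 8 adds of neighbour values


def adds_budget_per_cell(gens_per_second):
    """Additions available per cell per generation at the given generation rate."""
    return ADDS_PER_SECOND / (GRID_CELLS * gens_per_second)


rounded = adds_budget_per_cell(2)          # "about 2 generations per second"
measured = adds_budget_per_cell(100 / 47)  # 100 generations in 47 seconds
print(f"{rounded:.1f} adds/cell at 2 gen/s, {measured:.1f} at 100/47 gen/s")
```

Either way the naive 16 adds fit, with roughly 10 adds of slack per cell for loads, stores and bookkeeping.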

Oh, one other thing:

“There’s another aspect of this setup: the 340 display is doing some intelligent work to update the display, which on an 8-bit micro would normally be done by the same micro that’s computing the Life algorithm.”

In the case of the VIC-20, the CPU runs at full speed. There’s a video chip that handles the display without assistance from the CPU, nor does it interfere with the CPU. (In contrast, the C64 would steal cycles from the CPU on bad-lines and when displaying sprites.)

However, the actual routines for looking up graphics data and drawing/erasing pixels in my VIC-20 implementation were significantly non-trivial, and that did put some drag on performance. I don’t have an expansion RAM cart, so I targeted the unexpanded VIC-20. Thus, I didn’t have enough RAM for a full bitmap, much less RAM for an optimized cell map. Since I wanted full screen functionality, I implemented a dynamic allocation system which would dynamically assign new characters as well as dynamically free characters if they went blank. This gave the illusion of full screen functionality … as long as enough of the screen was blank. These routines were naturally less efficient than fixed bitmap routines.
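
A rough Python sketch of the kind of dynamic character allocation described above. It’s an illustration, not the actual VIC-20 routine: the 8x8 glyph size, the reserved BLANK code 0, and the names `CharAllocator` and `set_pixel` are all assumptions. Screen cells point at BLANK until a pixel is set, at which point a glyph is taken from a free list; when a glyph goes fully blank it’s returned.

```python
BLANK = 0  # hypothetical: character code reserved for an all-blank 8x8 cell


class CharAllocator:
    """Screen of character cells; each non-blank cell owns one 8x8 glyph from a small pool."""

    def __init__(self, cols, rows, pool_size):
        self.cols = cols
        self.screen = [BLANK] * (cols * rows)  # character code per screen cell
        self.glyphs = [[0] * 8 for _ in range(pool_size + 1)]  # 8 bytes each; index 0 = BLANK
        self.free = list(range(pool_size, 0, -1))  # free glyph codes, used as a stack

    def set_pixel(self, x, y, on):
        """Set or clear one pixel; returns False if no glyph is left to allocate."""
        cell = (y // 8) * self.cols + (x // 8)
        code = self.screen[cell]
        if code == BLANK:
            if not on:
                return True  # already clear, nothing to do
            if not self.free:
                return False  # pool exhausted
            code = self.free.pop()
            self.glyphs[code] = [0] * 8
            self.screen[cell] = code
        row = self.glyphs[code]
        bit = 0x80 >> (x % 8)
        if on:
            row[y % 8] |= bit
        else:
            row[y % 8] &= ~bit & 0xFF
            if not any(row):  # glyph went blank: give it back
                self.screen[cell] = BLANK
                self.free.append(code)
        return True
```

`set_pixel` returns False when the pool runs dry, which is exactly where the illusion of a full-screen bitmap breaks down.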

@IsaacKuo quoted this, which reminded me I wanted to follow up on it — this is actually something that I stress regularly to my students about old minicomputers and their terminal designs. A PDP-11 with 256 kB of RAM often served several to a dozen or more users at interactive terminals, while early 8080 and 6502-based computers struggled to handle a single user and 80 column text with more RAM-per-user. However, they didn’t have the benefit of offloading 2 kB or more of video RAM plus all of the logic to handle the screen to an external device! The VT100 runs on an 8080, and the Televideo 910 is a 6502; both had a price in the ballpark of a low end home computer, and 2-4 kB of RAM plus another few kB of ROM. Dedicating the power of a home computer to each user to handle display updates was a Really Big Deal!

Not wanting to disagree with any points made, but when I commented on the 340 doing some of the work, I was thinking of the mechanics of updating a raster screen, such as most 8 bit micros will need to do: possibly computing an address, and most likely having to read-modify-write (RMW) to update individual pixels. Which is to say, every step along the journey from the core representation of the Life grid to the updated memory that the video comes from.
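
For a plain linear bitmap, the per-pixel work looks roughly like this Python sketch, assuming 1 bit per pixel with the leftmost pixel in the most significant bit (`set_pixel` is a hypothetical name): compute the byte address, read it, modify one bit, write it back.

```python
def set_pixel(bitmap, bytes_per_row, x, y, on):
    """Read-modify-write one pixel in a 1-bit-per-pixel linear bitmap."""
    addr = y * bytes_per_row + (x >> 3)  # compute the byte address
    mask = 0x80 >> (x & 7)               # leftmost pixel in the top bit (assumed)
    value = bitmap[addr]                 # read
    if on:
        value |= mask                    # modify: set the bit
    else:
        value &= ~mask & 0xFF            # modify: clear the bit
    bitmap[addr] = value                 # write back
```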

Of course, some machines make this easier, and some make it harder. And some Life grid representations are closer to the final raster, others further from it. It’s sometimes possible to use the screen memory as the Life grid directly.

I had a cursory look at the Type 340 manual [1]: it seems the 340 accesses the host computer’s memory by DMA, freeing the CPU from issuing individual x/y display commands. The computer has to maintain a display list for this purpose and start the process. There are two useful modes in the basic configuration: one for arbitrary x/y commands (where there’s always a mode command followed by two coordinates), and one for a pseudo-bitmap mode (Increment Mode), listing display commands for any number of locations, separated by 4 locations each and starting from a given x/y location. The latter seems to match the screen as shown in the video, but would require quite a lot of memory (a 20% frame buffer describing the grid). Either way, the program would have to maintain a display list, not dissimilar to addressing video memory in a microcomputer-style graphics mode.

[1] http://bitsavers.org/pdf/dec/graphics/H-340_Type_340_Precision_Incremental_CRT_System_Nov64.pdf

Edit, regarding “Increment Mode”: this one packs 4 adjacent grid locations into a single instruction word. Intensification can only be switched on or off for an entire instruction word (4 locations at once). Each of the 4 locations is described by 4 bits, declaring whether the cathode ray is to be moved horizontally or vertically, and in which direction (move-x, right-left, move-y, up-down). If this is what we’re seeing, I suppose the program would use only the first location (not applying any advance on the grid for the 3 other locations, as in “[intensify] [move right] [no-move] [no-move] [no-move]”), using the intensify bit as data for its own algorithm.
(Otherwise, there would have to be some JIT compiler assembling the display list on the fly for every frame, probably using x/y point mode, thereby wasting quite a lot of memory and CPU time.)
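
To make the guess concrete, here’s a Python sketch of building one row of such a display list. The bit layout is an assumption for illustration only (escape and intensify flags in the top two bits of the 18-bit word, then four 4-bit increments, each ordered move-x, direction, move-y, direction), so check the real positions against the manual [1] before relying on it.

```python
WORD_BITS = 18
ESCAPE = 1 << 17     # ASSUMED position: leave increment mode after this word
INTENSIFY = 1 << 16  # ASSUMED position: beam on for all 4 increments of this word

# ASSUMED 4-bit increment codes, ordered (move-x, left, move-y, down)
MOVE_RIGHT = 0b1000
NO_MOVE = 0b0000


def increment_word(intensify, increments, escape=False):
    """Pack four 4-bit increments plus the intensify/escape flags into one word."""
    assert len(increments) == 4
    word = 0
    for i, inc in enumerate(increments):
        word |= (inc & 0xF) << (12 - 4 * i)  # first increment in the highest nibble
    if intensify:
        word |= INTENSIFY
    if escape:
        word |= ESCAPE
    return word


def row_to_display_list(cells):
    """One word per cell: step right once, then stand still for 3 increments."""
    words = [increment_word(bool(c), [MOVE_RIGHT, NO_MOVE, NO_MOVE, NO_MOVE]) for c in cells]
    if words:
        words[-1] |= ESCAPE  # end of the row
    return words


def cell_alive(word):
    """The Life program reads its own state back out of the intensify bit."""
    return bool(word & INTENSIFY)
```

The appeal of this layout is that the display list doubles as the Life grid: flipping one bit per word both changes the cell state and what the 340 draws.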

Edit II: I guess this gives us an idea why Conway’s Game of Life wasn’t invented before 1970: given the display technology available, it requires quite a lot of memory (which wasn’t especially cheap).

You know, I never felt like the 6502 machines struggled to keep up. They generally had CPUs running at full speed or near full speed, with video access interleaved with CPU access. The result may actually have been an excess of speed, or at least that was the trend with the Commodore 8 bits. The C64 was actually slower than the VIC-20 due to the video chip stealing some cycles. The C128, in 1MHz mode, was even slower than the C64 due to the more complex interrupt routines (in 2MHz 80 column mode, the CPU was definitely running faster, but access to the video memory was slow… see below).

On the 8080 side of things, it was different. A lot of them used a system where video access shut out or even utilized the CPU, severely restricting effective CPU speed.

And then there were things like the IBM PC or C128 in 80 column mode, where video memory was separate and access to it was a bottleneck - not to mention, the extra buses and separate memory space made the hardware more expensive.

Mind that a 6502 as a secondary processor (tube processor) on the BBC Micro runs at 4 MHz, hooked up to a 4Mbit/s data bus. There’s quite a lot of reserve left in the usual 6502 architecture.

(PS: I understand that you were mainly addressing bottlenecks caused by the usual implementation of memory and video memory, as commonly seen in other architectures.)

I think what we’re seeing here is a similar set of tradeoffs being made by different designers. The CPU speed, the memory speed, and the graphics resolution are all interlinked if video and CPU accesses are to be interleaved.

In the case of Acorn’s Beeb, they used rather expensive and rare DRAM that they could drive at 4MHz, to support an 80 column mode with 16MHz pixel clock and also a 2MHz 6502. This was not cheap! Other 6502 machines generally had lower video resolutions or slower CPUs or would give video access priority. And often had smaller price tags.
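
The bandwidth arithmetic behind that, as a quick Python check. It assumes the 80-column mode uses 1 bit per pixel (one byte fetched per 8 pixels) and that the 2MHz 6502 needs a memory slot on every cycle:

```python
PIXEL_CLOCK_HZ = 16_000_000  # 80-column mode pixel clock
BITS_PER_PIXEL = 1           # assumed: one bit per pixel in that mode
CPU_CLOCK_HZ = 2_000_000     # 2MHz 6502, assumed to use one memory slot per cycle

video_accesses = PIXEL_CLOCK_HZ * BITS_PER_PIXEL // 8  # one byte fetch per 8 pixels
cpu_accesses = CPU_CLOCK_HZ
memory_rate = video_accesses + cpu_accesses

print(f"video {video_accesses:,}/s + CPU {cpu_accesses:,}/s = {memory_rate:,} accesses/s")
```

That’s 2 million video fetches plus 2 million CPU accesses per second, hence memory that can be driven at 4MHz.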

In the case of Acorn’s Second Processors, the second machine is linked as a peripheral over a 2MHz bus, and the initial offering was a 3MHz 6502. Later offerings included a 4MHz 6502 and a 6MHz Z80. It’s then possible (and usual) to start to decompose a problem into compute work and display work. Both the famous Tube Elite and the recent Life programs try to get the most out of the 2MHz display processor while simulating a world as fast as possible on the second processor.
