How is the eZ80 3x faster than the Z80?

The eZ80 operates at 3 times the speed of the Z80 at the same clock frequency. How is this possible? Is there a clock multiplier on die, or are the transistors so much faster that all the instructions complete in far fewer cycles?

eZ80 datasheet: http://www.zilog.com/docs/um0077.pdf

I imagine it’s a case of using cycles, rather than t-states. In other words, more like 6502, ARM, modern RISC, less like original z80, original 8-series.

But also, almost certainly more pipelining.

I would not expect it to be about having a clock multiplier.

From the datasheet:

• Single-cycle fetch.
• Pipelined fetch, decode, and execute.

They separate fetch and decode:
image

It’s a classic three stage pipeline:
image

There’s a much more detailed timing diagram on the next page - it’s only page 5 of the datasheet, so don’t worry that it’s a 400 page document!
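As a rough illustration of why a three stage pipeline helps, here is a sketch that counts cycles for an idealised machine in which every instruction needs exactly one fetch, one decode and one execute cycle. The function names and the one-cycle-per-stage assumption are mine, not the datasheet's; real eZ80 instructions vary in length and the pipeline can stall.

```python
def cycles_unpipelined(n_instructions, stages=3):
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * stages


def cycles_pipelined(n_instructions, stages=3):
    """Stages overlap: once the pipeline is full, one instruction completes per cycle."""
    if n_instructions == 0:
        return 0
    return stages + (n_instructions - 1)


if __name__ == "__main__":
    for n in (1, 10, 1000):
        slow = cycles_unpipelined(n)
        fast = cycles_pipelined(n)
        print(f"{n} instructions: {slow} vs {fast} cycles, ratio {slow / fast:.2f}")
```

For long runs of instructions the ratio approaches the number of stages, which is the best case for this idealised model and not a measured eZ80 figure.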


I do agree that the pipelining in Figure 2 does look like it’s making the ez80 go 3x faster, but don’t we also have to prove that the original z80 doesn’t do that to be sure that’s where the speed difference comes from?

No, I think it’s just necessary to understand the nature of memory accesses. The 6-flavour of microprocessors (6800, 6502 and relatives) perform one memory access per input clock cycle. The 8-flavour of microprocessors (8080, Z80 and relatives) perform one memory access per approx four input clock cycles.

The performance of a microprocessor system very broadly tracks its memory bandwidth, and the speed and price of RAM chips in a given year (or month) dictates the memory access rate and, broadly, the system cost.

A 4MHz Z80 makes memory accesses at a 1MHz rate, more or less, same as a 1MHz 6502. The eZ80 does things the 6-flavoured way rather than the original 8-flavoured way. An eZ80 running at 4MHz would make memory accesses at a 4MHz rate, more or less.
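The arithmetic behind that comparison can be written out as a small sketch. The clocks-per-access figures are the rough ones quoted in this post, not exact datasheet numbers, and the function and table names are just illustrative.

```python
def memory_access_rate_hz(clock_hz, clocks_per_access):
    """Memory accesses per second for a CPU needing a fixed number of clocks per access."""
    return clock_hz / clocks_per_access


# Rough per-access clock counts taken from the discussion above (approximations).
CLOCKS_PER_ACCESS = {"Z80": 4, "6502": 1, "eZ80": 1}

if __name__ == "__main__":
    for cpu, clock_hz in (("Z80", 4_000_000), ("6502", 1_000_000), ("eZ80", 4_000_000)):
        rate = memory_access_rate_hz(clock_hz, CLOCKS_PER_ACCESS[cpu])
        print(f"{cpu} at {clock_hz / 1e6:g} MHz: about {rate / 1e6:g} M accesses/s")
```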

(The reason Z80 systems have more performance, compared to 6502, than this memory access rate might suggest is twofold: more registers meaning fewer external accesses to solve a given problem; more complex instructions meaning fewer instruction fetches to get a given amount of work done.)

I don’t see “pipeline” mentioned anywhere in the Z80 user manual, so combined with what you mentioned above that’s enough to convince me the pipeline is where the 3x speedup came from.

It does seem like the ez80 is going to be important for the future of this architecture. If we need it to run at z80 speed for some reason, I guess a clock divider would be all you need. And if you don’t need any particular speed, why not use the faster ez80 with more capabilities?

Yes, I found the eZ80 rather nice - there’s one in the Agon Light 2, and possibly in other modern retrocomputers.

I particularly like the way the backward compatible Z80 mode is handled - it’s much nicer and more flexible than the nearest idea in 6502 land, which is the 65816.

In fact I drafted some notes in April last year… let me copy/paste from there… this might be mildly incoherent. I notice that I make a very different statement there about the reasons for the relative performance improvement:

…very recently I bought a new modern retrocomputer, the Agon Light, which uses an eZ80 and adds 512k of RAM (and also has an ESP32 for graphics, sound, keyboard.) By default the Agon boots to BBC Basic for the Z80, and uses something rather like Acorn’s bytestream VDU protocol for graphics. The BBC Basic is I think closely derived from Richard Russell’s Z80 BBC Basic from the 1980s, and as usual it includes an assembler.

So, I have a modern fast Z80 and I’ve been reading up on it. It seems to me that Zilog have done a very good job of making a compatible 16/24 bit extension to an 8/16 bit original. … It’s quite a different approach from the '816.

Before looking into how it’s done, let’s see what we get:

  • an 8 bit wide memory system, with 24 bits of address.
  • a very compatible z80 mode, running in a 64k space.
  • several z80 applications can co-exist, isolated from one another, each in their own 64k section of the 24 bit space.
  • an extended mode (ADL mode), where all the 16 bit registers are 24 bits, and all instructions act accordingly, in a flat 24 bit address space.
  • a mixed mode which allows interrupts to run in the extended mode and return to a z80 application or to an extended mode application. It also allows extended routines to call Z80 routines, and vice versa.
  • a means for individual instructions to operate in a chosen mode, overriding the current mode

It seems to me to be very attractive.

Let’s look at some of the technicalities of the ez80 implementation:

  • instructions are, as before, variable length in bytes, and most take one clock cycle per memory access, running in a pipeline to hide some of the cost.

  • there’s a single byte-sized register to supply the top 8 bits of the 24 bit address when z80 mode supplies a 16 bit address. It’s called MBASE, and the z80 mode can’t see or modify it (this is safety and simplicity, not security!)

  • all the instruction encodings are the same, the action depending on the mode and/or the use of a single-shot prefix

  • just a handful of extra opcodes are needed, and they’ve taken four effective NOPs from the z80 encoding (the register-to-itself loads LD B,B, LD C,C, LD D,D and LD E,E) for the purpose.

  • there are two mode bits, one to set extended addressing, and one to select mixed-mode operation

  • the high and low bytes of 16 bit registers are individually accessible as is usual for z80

  • in extended mode, it’s the middle and low bytes which are individually accessible: the high byte is involved whenever the whole 24 bit register is accessed.

  • the stack pointer is handled specially: there’s a 16 bit stack pointer for z80 mode, and a separate 24 bit stack pointer for extended mode. The mixed mode deals with interrupts by using the 24 bit stack and pushing a mode byte so it can return to the caller appropriately. The ISRs themselves run in extended mode.
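The MBASE point in the list above can be sketched in a few lines. This is only a model of the address arithmetic as described there, with a made-up function name; it says nothing about bus timing or how the hardware does it.

```python
def z80_mode_address(mbase, addr16):
    """Form a 24-bit address from an 8-bit MBASE and a 16-bit Z80-mode address."""
    if not 0 <= mbase <= 0xFF:
        raise ValueError("MBASE is an 8-bit register")
    if not 0 <= addr16 <= 0xFFFF:
        raise ValueError("Z80-mode addresses are 16 bits")
    return (mbase << 16) | addr16


if __name__ == "__main__":
    # Two Z80-mode applications using the same 16-bit address land in
    # different 64k sections when they run with different MBASE values.
    for mbase in (0x04, 0x05):
        print(f"MBASE={mbase:02X}: 0x1234 -> {z80_mode_address(mbase, 0x1234):06X}")
```

That is all the isolation between co-existing z80 applications amounts to in this model: each one sees its own 64k window, selected by a register it cannot change.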

Notice that there’s no wide memory or wide instruction decoding.

Note that the z80 has advantages in having many 16 bit registers including a 16 bit stack pointer: extending them to 24 bits while the 8 bit accumulator and register-halves remain at 8 bits seems quite natural. The trick of having a separate extended mode stack is simple and effective in allowing for mixed-mode interrupts. Somehow the mixed mode calls and returns work out too: I haven’t quite fully understood the call and return process as yet.

Postscript to my observations, made a little later:

I see a zilog doc specifically about the modes and mixed-mode operation which might help me understand more about that. Only 18 pages too.
http://www.zilog.com/docs/appnotes/an0339.pdf
(already, I see they say that the ‘four times faster’ that they bandy about is not about the simpler clocking or the single cycle instructions, it’s about the improved efficiency when running in ADL mode. In other words, it’s approximate and code-dependent.)

Perhaps a good starting point for me would be to ask why we need the mixed-mode bit at all. We don’t need it for ordinary coding purposes, where code stays in one mode or the other. But, here it is: if we call an other-mode subroutine, we need it to stay in other-mode throughout, until the return, where we need it to revert to the calling mode. Clearly enough, this is a stackable kind of thing, because the call stack could be deep and keep switching modes. And we can’t stack this mode-signifier on the z80-mode stack because there’s not much room there and code won’t be expecting it. So, for cross-calls and for interrupts, we need to stack the current mode and later restore it.
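To make the "stackable kind of thing" concrete, here is a toy model of saving and restoring the mode across nested cross-mode calls. It is only an illustration of the idea in the paragraph above: the class and method names are invented, and it does not reproduce the actual eZ80 stack layout or the exact rules for what gets pushed where.

```python
class ModeTracker:
    """Toy model: remembers the caller's mode across nested cross-mode calls."""

    def __init__(self, adl=False):
        self.adl = adl        # current mode: True = ADL, False = z80 mode
        self._saved = []      # stands in for the mode bytes pushed on the stack

    def call(self, callee_adl):
        """Enter a subroutine that runs in the given mode, saving the caller's mode."""
        self._saved.append(self.adl)
        self.adl = callee_adl

    def ret(self):
        """Return from a subroutine, restoring whatever mode the caller was in."""
        self.adl = self._saved.pop()


if __name__ == "__main__":
    cpu = ModeTracker(adl=False)   # start in a z80-mode application
    cpu.call(True)                 # call an ADL-mode routine
    cpu.call(False)                # which calls back into z80-mode code
    cpu.ret()
    print("after first return, ADL =", cpu.adl)
    cpu.ret()
    print("after second return, ADL =", cpu.adl)
```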

Having said all that, I still haven’t yet seen the need for a mixed-mode bit.

I realise there’s probably a bit of a gap in my understanding so I’m going to go back to the PDF.

The ez80 as a whole has an ADL mode bit and a MADL mode bit. But the four new prefixes temporarily set modes SIS, SIL, LIS, LIL, and I haven’t fully understood how they affect what happens. I think SIS and LIL represent the z80 mode and the ADL mode, but the other two are a bit mixed… not quite sure that that’s the same as the MADL bit. It ought to be…

I suspect the split stack isn’t an issue: you could almost think of it as a user mode stack for the z80 application and a supervisor mode stack for the ISRs. What you can’t do is run your ISRs in Z80 mode, I think, unless you have MADL=0 and are running pretty much as a Z80. Which you could do - the extra RAM would be accessible using prefixed instructions, but you probably wouldn’t run code outside the 64k you were in. But the more sophisticated thing to do is have your supervisor/MOS/ISRs in ADL mode, and your applications in either z80 mode for backward compatibility or in ADL mode for a big flat memory model.

ADL mode has a built in performance penalty because of the larger operands, but maybe gains a bit from the wider operations. As the accumulator remains 8 bit, you don’t get the wide machine you might hope for. And your previously 16 bit registers are now 24 bit, so you can’t easily do 16 bit or 32 bit datatypes. 48 bit long ints, anyone?
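For what the 48 bit quip would look like in practice, here is a sketch of adding two 48 bit values held as pairs of 24 bit halves, with the carry from the low halves fed into the high halves. It is a Python model of the arithmetic only, with invented names, and assumes nothing about which eZ80 instructions you would actually use.

```python
MASK24 = 0xFFFFFF


def add48(a_lo, a_hi, b_lo, b_hi):
    """Add two 48-bit values held as (low, high) 24-bit halves.

    Returns (low, high, carry_out), each half masked to 24 bits.
    """
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 24              # carry out of the low 24 bits
    hi_sum = a_hi + b_hi + carry
    return lo_sum & MASK24, hi_sum & MASK24, hi_sum >> 24


if __name__ == "__main__":
    lo, hi, carry = add48(0xFFFFFF, 0x000000, 0x000001, 0x000000)
    print(f"{hi:06X}{lo:06X} carry={carry}")
```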

Here’s a diagram from the datasheet:

An edge triggered flip-flop is very costly in terms of transistors, so most early processors used a pair of transparent latches operating on different clock phases instead. These two clock phases had to be non overlapping so the data could only move half a step at a time (think of the pair of gates in a lock that connects two bodies of water that are at different levels).
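The lock analogy can be modelled in a few lines: two transparent latches in series, each open on its own clock phase, with the rule that both are never open together. This is a behavioural toy with invented names, not a description of any particular chip's circuitry.

```python
class TwoPhaseRegister:
    """Two transparent latches in series, driven by non-overlapping clock phases."""

    def __init__(self):
        self.master = 0   # first latch, transparent while phi1 is high
        self.slave = 0    # second latch, transparent while phi2 is high

    def step(self, d, phi1, phi2):
        """Apply input d with the given phase levels and return the output."""
        if phi1 and phi2:
            raise ValueError("clock phases must not overlap")
        if phi1:
            self.master = d              # data moves the first half step
        if phi2:
            self.slave = self.master     # data moves the second half step
        return self.slave


if __name__ == "__main__":
    reg = TwoPhaseRegister()
    print(reg.step(1, phi1=True, phi2=False))   # input captured, output unchanged
    print(reg.step(0, phi1=False, phi2=True))   # captured value reaches the output
```

If both phases were high at once, the input would race straight through both latches, which is why the model refuses that case.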

For the 6800, 6502 or 6809E an external circuit had to supply the two clock phases. That is why they were in the 1MHz range and memory operations took one clock cycle.

Intel and Zilog processors and the 68000 generated these phases internally by dividing a higher frequency external clock.

There were other clocking schemes, like the four phases for the TMS9900 or the CDP1802 that divided its incoming clock by 8. The various early National processors had odd clock schemes too, if I remember correctly.

As clock speeds went up, ensuring that the two phases remained non overlapping everywhere on the chip became more and more complicated. Eventually it just made sense to pay more transistors to have edge triggered flip-flops operating from a single clock phase. This is very likely to be the case for the eZ80.
