Instructions that complete early

In MIPS, a floating point multiply completes early (2 cycles) if any of:

  • A source exception is detected
  • The product is zero or infinity
  • At least one operand is a power of 2

(See page 232 of the VR4300 user manual). Usually with 65x stuff we think of “cycle penalties” instead of finishing early, but MIPS does seem to think of it this way. Does anyone know of any other architectures where “easier” calculations finish early?
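
To make that list concrete, here is a minimal Python sketch of the three early-out conditions. It is purely illustrative and not derived from the VR4300 manual beyond the bullet points above: the “source exception” case is approximated as a NaN operand, and overflow of finite operands to infinity is ignored.

```python
import math

def vr4300_fmul_takes_short_path(a: float, b: float) -> bool:
    """Rough model of the early-completion conditions listed above.

    Illustrative only: it does not model the real FPU's exception logic
    or pipeline, just the three listed cases.
    """
    def is_power_of_two(x: float) -> bool:
        # A finite, nonzero float is a power of two when frexp's normalized
        # mantissa is exactly 0.5 (i.e. the value is 1.0 * 2^e).
        if x == 0.0 or math.isinf(x) or math.isnan(x):
            return False
        mantissa, _exp = math.frexp(abs(x))
        return mantissa == 0.5

    # 1. A source exception -- approximated here as a NaN operand.
    if math.isnan(a) or math.isnan(b):
        return True
    # 2. The product is zero or infinity (checked via the operands; finite
    #    overflow to infinity is not modelled).
    if a == 0.0 or b == 0.0 or math.isinf(a) or math.isinf(b):
        return True
    # 3. At least one operand is a power of two, so the mantissa multiply is trivial.
    return is_power_of_two(a) or is_power_of_two(b)

if __name__ == "__main__":
    for x, y in [(3.0, 5.0), (3.0, 8.0), (0.0, 7.5), (float("inf"), 2.0)]:
        print(x, y, "short path" if vr4300_fmul_takes_short_path(x, y) else "full latency")
```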

It reminds me a bit of the multiplication/division algorithms found on computers that relied on shift steps. These algorithms typically have varying execution speeds (depending on how much normalization is required) and error conditions that can be detected early.

E.g., compare the hardware division of the DEC PDP-1, which processed two 18-bit fractional values and aborted early if the result would overflow (e.g., on division by zero). (Since both arguments and the result are fractional values with the binary point just before the most significant bit, a successful division must satisfy the condition |divisor| > |dividend|, which can be checked early on, just after the normalization of the arguments.)
This instruction was essentially a hardware implementation of a division routine by BBN, which was built from discrete shift-step instructions. You can find an annotated version of the source code here.
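
For the curious, here is a rough Python sketch of that kind of fractional division with the early overflow check. It is a generic restoring shift-and-subtract loop, not the actual PDP-1 hardware sequence or the BBN routine; the 17-bit fraction width is simply the PDP-1’s 18-bit word minus the sign bit.

```python
FRAC_BITS = 17          # 18-bit word: one sign bit plus 17 fraction bits

def fractional_divide(dividend: float, divisor: float):
    """Restoring division of signed fractions in [-1, 1) with an early abort.

    Returns the quotient, or None if it would not fit in [-1, 1) -- the
    overflow case that can be detected right away instead of grinding
    through all the shift-and-subtract steps.
    """
    if divisor == 0.0 or abs(dividend) >= abs(divisor):
        return None                       # quotient would be >= 1: overflow

    sign = -1 if (dividend < 0) != (divisor < 0) else 1
    rem, dvs = abs(dividend), abs(divisor)
    quotient = 0
    for _ in range(FRAC_BITS):
        rem *= 2                          # shift the remainder left one bit
        quotient <<= 1
        if rem >= dvs:                    # trial subtraction succeeds
            rem -= dvs
            quotient |= 1                 # record a 1 bit of the quotient
    return sign * quotient / (1 << FRAC_BITS)

if __name__ == "__main__":
    print(fractional_divide(0.25, 0.75))  # ~0.3333
    print(fractional_divide(0.75, 0.25))  # None: overflow caught up front
```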

The ARM2 introduced multiplication instructions, and MUL can complete early once it runs out of work to do: multiplying by a smaller number takes less time.

Rich Talbot-Watkins says on stardot:

Basically MUL can take up to 17 cycles to execute, depending on how big the second operand is. It keeps shifting out the bottom two bits each cycle until it’s zero. So, in your case, when that operand is 160, I would expect it to take five cycles to execute: 1 to fetch the opcode, and 4 to perform the multiplication.
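
A small Python sketch of that timing rule, treating the second operand as a 32-bit unsigned value: one cycle for the fetch, plus one cycle per pair of bits shifted out until the operand reaches zero, capped at 17. The real ARM2 uses a Booth-recoded multiplier, so treat this as a ballpark model of the quoted description rather than a cycle-accurate one.

```python
def arm2_mul_cycles(multiplier: int) -> int:
    """Estimate ARM2 MUL timing from the description quoted above."""
    value = multiplier & 0xFFFFFFFF
    cycles = 1                    # one cycle to fetch the opcode
    while value != 0:
        value >>= 2               # retire two multiplier bits per cycle
        cycles += 1
    return min(cycles, 17)        # never more than 17 cycles in total

if __name__ == "__main__":
    for m in (1, 160, 0xFFFFFFFF):
        print(f"MUL by {m:#x}: ~{arm2_mul_cycles(m)} cycles")
```

For 160 this prints 5 cycles, matching the estimate in the quote.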

Later x86 instruction timings vary extensively according to the values you stick into them. That’s true of pretty much all vaguely modern processors, and it’s one of the fun ways you get timing-analysis attacks.

Once you have caches, pipelines and the like, instruction timings are no longer predictable anyway, so you might as well use the opportunity.

The 286, I believe, has entirely fixed clock counts per instruction, but by the 386 not only are obvious things like BSF dependent on the input, so is stuff like IMUL.
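
As a software-level illustration of why value-dependent latency invites timing analysis, here is a tiny Python probe that times the same expression with “easy” and “hard” operands. It measures CPython’s arbitrary-precision multiply rather than the raw IMUL latency (doing that properly needs RDTSC and a lot more care), so treat it purely as a demonstration of the measurement idea.

```python
import timeit

def ns_per_op(expr: str, setup: str, number: int = 1_000_000) -> float:
    """Average wall-clock time per evaluation of expr, in nanoseconds."""
    return timeit.timeit(expr, setup=setup, number=number) / number * 1e9

# Same operation, different operand values: the gap in timing is the
# kind of signal a timing-analysis attack exploits.
small = ns_per_op("a * b", "a, b = 3, 5")
large = ns_per_op("a * b", "a, b = 3**2000, 5**2000")
print(f"small operands: {small:.1f} ns/op")
print(f"large operands: {large:.1f} ns/op")
```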

Lots of computers had, and have, instruction execution times that vary according to the operands. The ENIAC, for example, could divide a 10-digit decimal number by a p-digit decimal number in (p+4) cycles, so the execution time varied from 5 to 14 cycles. (Source: “The ENIAC: History, Operation and Reconstruction in VLSI”, Jan Van der Spiegel et al.)
