Quarter Square Multiplication - analog computing and 6502 MOS

Quarter Square Multiplication:
a*b = ((a+b)² - (a-b)²) / 4
Multiplication algorithm - Wikipedia
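A minimal C sketch of the identity, using the standard trick of storing floor(n²/4) directly in the table: since (a+b) and (a-b) always have the same parity, the two floors cancel exactly and no final divide by 4 is needed. The names `qs` and `qsmul` are just illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Quarter-square multiply: a*b = ((a+b)^2 - (a-b)^2) / 4.
 * For 8-bit operands, a+b ranges over 0..510, so the table
 * needs 511 entries. We store floor(n*n/4), so the result is
 * simply qs[a+b] - qs[a-b] (taking a >= b). */
static uint16_t qs[511];

static void init_qs(void)
{
    for (uint32_t n = 0; n < 511; n++)
        qs[n] = (uint16_t)((n * n) / 4);
}

static uint16_t qsmul(uint8_t a, uint8_t b)
{
    if (a < b) { uint8_t t = a; a = b; b = t; }  /* ensure a >= b */
    return (uint16_t)(qs[a + b] - qs[a - b]);
}
```

Two table lookups and a subtraction replace the multiply, which is why the technique maps so well onto CPUs with no hardware multiplier.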

From the book:
Instrument Engineers’ Handbook, Volume Two: Process Control and Optimization
edited by Bela G. Liptak
Instrument Engineers’ Handbook, Volume Two: Process Control and Optimization - Google Books
Quarter Square Multiplication in Analog Computing:

The quarter-square multiplier technique has also benefited 8-bit systems that have no hardware multiplier. Charles Putney implemented it for the 6502:
Fastest 6502 Multiplication Yet
Apple Assembly Line - V6N6 - Mar 1986 (txbobsc.com)
Charles Putney:
Charles Putney - Professional Profile - CodeProject


Nice connection back to analogue computing! I notice the book you linked is a 4th edition from 2005, but I can confirm (via a Google Books search) that the quarter-square multiplier also appears in the index of the first edition, from 1969.

A 1k lookup table is fairly large for a 6502 system, but then the resultant multiplication is gratifyingly rapid!


By coincidence I implemented that quarter-square code a few months ago, and also used another old coder’s trick for generating the table of squares. I was experimenting to see whether I could speed up some of the arithmetic that the 6809 version of gcc generated. (I couldn’t, but got quite close for some operations.) Here’s a C version of quarter square plus some other junk: mul8.c ; a more polished version of the 8-bit routines for the Vectrex: vectrex-muldiv.c ; and a version that ran on the Vectrex and accurately timed execution to cycle precision using the VIA timer: cycle-counting.c . I’m not sure how useful any of that is, but you might find it fun to try some of them if you still code for any 8-bit platforms…
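Whether this is the exact trick used in mul8.c is a guess, but the classic way to build a table of squares without multiplying is to exploit the fact that consecutive squares differ by consecutive odd numbers, since (n+1)² = n² + 2n + 1:

```c
#include <assert.h>
#include <stdint.h>

/* Build sq[0..count-1] where sq[n] = n*n, using only additions:
 * each square is the previous square plus the next odd number. */
static void build_squares(uint32_t *sq, int count)
{
    uint32_t square = 0;  /* 0^2 */
    uint32_t odd = 1;     /* odd increment: 1, 3, 5, ... */
    for (int n = 0; n < count; n++) {
        sq[n] = square;
        square += odd;    /* n^2 -> (n+1)^2 */
        odd += 2;
    }
}
```

On a 6502 or 6809 this amounts to a short add-and-loop, so the table can be generated at startup instead of being burned into ROM.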


This post inspired me to try to speed up my 32-bit multiplication in my retro Ruby 816 system (a 65C816 CPU with 512KB of RAM that runs BCPL).

It gave a marginal speed increase over my existing 32-bit shift-and-add code, but nothing significant. My system also has an ATmega “co-processor” that I was using to offload arithmetic operations to; the quarter-square version more or less matched that for speed, but added over 1KB to the '816 side.

Now, it’s entirely possible that the way I used it to implement a 32-bit multiply is inefficient, but it’s hard to tell.

In the process of testing it all, I made some posts over on the 6502.org forum, and three others proposed faster solutions that didn’t use the quarter-square method. One of these was actually faster, and the code was much smaller too. The thread I started was:


However, what I’ve not done is benchmark the underlying 8x8 multiply against any other 8x8 multiply, so it may still be faster there; but it looks like, when extending the operation to 32 bits, the gains may not always be as good as you might imagine…
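One reason the gains shrink: a 32x32 multiply built from an 8x8 primitive needs many partial products plus all the shifting and carry handling around them. A C sketch (where `mul8x8` stands in for whichever fast 8-bit routine is used underneath; the names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a fast table-based 8x8 -> 16 multiply. */
static uint16_t mul8x8(uint8_t a, uint8_t b) { return (uint16_t)(a * b); }

/* 32x32 -> 32 multiply from 8x8 partial products. Of the 16
 * byte-pair products, only the 10 with i+j < 4 contribute to
 * the low 32 bits, but each needs a shift and a 32-bit add. */
static uint32_t mul32(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j + i < 4; j++) {  /* skip products above bit 31 */
            uint8_t ai = (uint8_t)(a >> (8 * i));
            uint8_t bj = (uint8_t)(b >> (8 * j));
            result += (uint32_t)mul8x8(ai, bj) << (8 * (i + j));
        }
    }
    return result;
}
```

Even with a one-cycle 8x8 primitive, those ten shifted 32-bit accumulations dominate on an 8-bit or 16-bit CPU, which is consistent with the modest overall speedup reported above.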



Just to follow up my own post here, I decided to have a go at the “ultimate” 8x8 multiply on the 65816 CPU. … I created a table of all the products of 0…255 times 0…255. This results in a table 128KB in size holding the 16-bit products: 64KB of low bytes and 64KB of high bytes. The '816 makes it relatively easy to index into these tables, which would be much harder on the 65C02.
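A C sketch of that full-table layout, splitting the 16-bit products into a low-byte table and a high-byte table indexed by the combined 16-bit operand pair (function and table names are illustrative):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Two 64KB tables indexed by (a << 8) | b: one holds the low
 * byte of a*b, the other the high byte -- 128KB in total. */
static uint8_t *lo_tab, *hi_tab;

static void build_tables(void)
{
    lo_tab = malloc(65536);
    hi_tab = malloc(65536);
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++) {
            unsigned p = a * b;
            lo_tab[(a << 8) | b] = (uint8_t)p;
            hi_tab[(a << 8) | b] = (uint8_t)(p >> 8);
        }
}

/* The multiply itself is just two indexed loads. */
static uint16_t tabmul(uint8_t a, uint8_t b)
{
    unsigned idx = ((unsigned)a << 8) | b;
    return (uint16_t)((hi_tab[idx] << 8) | lo_tab[idx]);
}
```

Splitting low and high bytes into separate 64KB banks suits the '816 nicely: each fits exactly in one bank, and the operand pair forms the 16-bit index directly.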

My resulting code for a 32-bit multiply was about 2.5 times faster than the existing multiplies I tested. The downside, however, is that 128KB of RAM is needed.

Today’s systems can do this in a single cycle, but back in the era of the '816 CPU, say the mid-80s, that was still a significant amount of RAM to dedicate just to a multiply. At that time we were also starting to see hardware assist appear, in the form of “blitter” chips and so on.

It was an interesting diversion though.