Is anyone aware of any microprocessors that implement the “double dabble” algorithm in hardware for converting binary unsigned integers to decimal?
I prefer to keep all my integers in binary and then only convert to decimal when I’m displaying them to the user. I don’t use the Z80’s DAA opcode to repair binary coded decimal sums and differences.
I tried once to do that in the logic of an FPGA computer design, but I could never get a working version.
The 2903 bit slice may have that encoded in with the shift logic.
I think the reason we don’t see this very often in hardware is that to implement DAA, you just have to be able to do an addition on both nybbles and not much else. But even for an 8-bit int, double dabble requires several more conditional adds. That’s more transistors that have to be paid for.
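For the curious, here is roughly what those conditional adds look like when done in software: a minimal C sketch for an 8-bit input producing three packed BCD digits (the function name and 12-bit layout are just my choices for illustration).

    #include <stdint.h>

    /* Minimal double-dabble sketch: convert an 8-bit binary value into
     * three packed BCD digits (hundreds, tens, ones) in a 12-bit field. */
    uint16_t dd8_to_bcd(uint8_t bin)
    {
        uint16_t bcd = 0;

        for (int i = 0; i < 8; i++) {
            /* Before each shift, any BCD nybble >= 5 gets +3 so that
             * doubling it carries into the next decimal digit instead
             * of overflowing the nybble. */
            if ((bcd & 0x00F) >= 0x005) bcd += 0x003;
            if ((bcd & 0x0F0) >= 0x050) bcd += 0x030;
            if ((bcd & 0xF00) >= 0x500) bcd += 0x300;

            /* Shift the next binary bit (MSB first) into the BCD register. */
            bcd = (uint16_t)((bcd << 1) | ((bin >> 7) & 1));
            bin <<= 1;
        }
        return bcd;   /* e.g. 255 -> 0x255 */
    }

In a fully unrolled hardware version, each of those +3 corrections becomes its own small add-3 cell per digit per shift stage, which is exactly where the extra transistors go.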
My credit is good, add those transistors.
I always considered it just a textbook hardware example, like adding two BCD digits. A software version could still be faster on a machine with no hardware divide for number conversion.
Could there be a better way to convert a binary number into printable decimal?
I presume this is for the Z80 - back in the early 80s (the last time I did 8080/z80 code!) I did it via repeated subtraction - had a table of 10,000, 1,000, 100, 10 and used them in a fairly brute-force way.
Another way - the way it's done in my current 6502 project - is by repeatedly dividing by 10 and pushing each remainder onto the stack; then, to print, popping the stack and outputting the digits…
Being able to do it using less brute-force methods would be nice though.
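For anyone following along at home, here are rough C renderings of the two methods above (the powers-of-ten table and the divide-by-10 stack trick). The 16-bit range, the trailing 1 in the table, and the names are my own choices, not the original 8080/Z80 or 6502 code:

    #include <stdio.h>
    #include <stdint.h>

    /* Method 1: repeated subtraction against a table of powers of ten
     * (a final 1 is included so the last digit falls out of the same loop). */
    void print_dec_subtract(uint16_t v)
    {
        static const uint16_t pow10[] = { 10000, 1000, 100, 10, 1 };

        for (int i = 0; i < 5; i++) {
            char digit = '0';
            while (v >= pow10[i]) {      /* count how many times it fits */
                v -= pow10[i];
                digit++;
            }
            putchar(digit);
        }
    }

    /* Method 2: divide by 10, push each remainder, then pop to print
     * the digits most significant first. */
    void print_dec_div10(uint16_t v)
    {
        char stack[5];
        int sp = 0;

        do {
            stack[sp++] = (char)('0' + v % 10);
            v /= 10;
        } while (v != 0);

        while (sp > 0)
            putchar(stack[--sp]);
    }

As written, the subtraction version prints leading zeros while the stack version suppresses them; both are easy to adjust either way.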
Repeated subtraction works fine but the performance can vary widely. Another way would be to test each binary bit individually and increment the corresponding decimal digits. Say, if bit 7 is set, write 1, 2, 8 into the output buffer. Then if bit 6 is set, increment the tens by 6 and the ones by 4…
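Here is that idea sketched in C, with the carry step the trailing "…" leaves implicit. The digit-array layout and the per-bit table are assumptions of mine, just to make the method concrete:

    #include <stdint.h>

    /* Per-bit method: for each set bit, add that bit's decimal value
     * digit by digit into an array of decimal digits, then propagate
     * the carries at the end. */
    void bits_to_digits(uint8_t bin, uint8_t digits[3])   /* digits[0] = ones */
    {
        /* Decimal digits {ones, tens, hundreds} of each power of two. */
        static const uint8_t bitval[8][3] = {
            {1,0,0}, {2,0,0}, {4,0,0}, {8,0,0},
            {6,1,0}, {2,3,0}, {4,6,0}, {8,2,1}
        };

        digits[0] = digits[1] = digits[2] = 0;

        for (int b = 0; b < 8; b++)
            if (bin & (1u << b))
                for (int d = 0; d < 3; d++)
                    digits[d] += bitval[b][d];

        /* "Increment tens by 6 and ones by 4" only works if a final pass
         * carries anything >= 10 into the next digit. */
        for (int d = 0; d < 2; d++) {
            digits[d + 1] += digits[d] / 10;
            digits[d] %= 10;
        }
    }

That final pass is what keeps every digit in the 0..9 range.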
I have been studying Double Dabble for a long time.
All of that testing of and adding to individual BCD digits is merely a way to perform a “decimal adjustment” when the hardware does not directly support it.
Hi there. Yes, for that reason I do think we can use the z80’s DAA instruction to do a modified double dabble. Whereas usually in DD we would add 3 if the nybble is 5 or greater, DAA can add 6 if the nybble is 10 or greater. We would just be waiting for the nybble to overflow its decimal bound instead of jumping in early to prevent overflow.
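To make that concrete without writing actual Z80 code, here is a C simulation of what a per-byte "shift, then decimal adjust" loop would do. The fix-up below only models DAA closely enough for the valid packed-BCD values this loop ever feeds it, and the names and 8-bit input size are my own choices:

    #include <stdint.h>

    /* Double one packed-BCD byte plus a carry-in, then apply a DAA-style
     * fix-up: add 6 to a nybble once it has gone past 9 (or half-carried),
     * instead of adding 3 before the shift as classic double dabble does. */
    static uint8_t double_and_adjust(uint8_t a, int cin, int *cout)
    {
        unsigned sum = (unsigned)a + a + (unsigned)cin;            /* ADC A,A */
        int half  = ((a & 0x0F) + (a & 0x0F) + cin) > 0x0F;        /* H flag  */
        int carry = sum > 0xFF;                                    /* C flag  */
        sum &= 0xFF;

        if (half || (sum & 0x0F) > 0x09)  sum += 0x06;             /* low fix  */
        if (carry || sum > 0x9F)        { sum += 0x60; carry = 1; }/* high fix */

        *cout = carry;
        return (uint8_t)(sum & 0xFF);
    }

    /* Convert an 8-bit value to packed BCD: out[0] = hundreds, out[1] = tens|ones. */
    void bin8_to_bcd(uint8_t bin, uint8_t out[2])
    {
        out[0] = out[1] = 0;

        for (int i = 0; i < 8; i++) {
            int c = (bin >> 7) & 1;      /* shift the next binary bit out */
            bin <<= 1;
            out[1] = double_and_adjust(out[1], c, &c);
            out[0] = double_and_adjust(out[0], c, &c);
        }
    }

Adding 3 before the shift (classic double dabble) and adding 6 after it (the DAA way) are the same decimal correction, applied on opposite sides of the doubling.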
What is your CPU? I cannot see why DD would not work.
This discussion made me go back and look at the code I uploaded again; I do not understand why I wrote that the 6809 cannot improve on the 6800 version.
A related question is historical: When did DAA largely disappear from CPU instruction sets? I'd hazard a guess that this was influenced by reductions in the price of memory (and Moore's Law). If you have to squeeze your embedded application into a 1K ROM, the value of the DAA instruction shines. Once you had the luxury of including a binary-to-decimal routine in software, DAA shone less brightly. Then RISC came along…
I see the 68k and x86 both have decimal adjust - I think the logic to do that is a bit horrible on wider operands, needing another carry chain to get the most significant bits right. x86 doesn’t bother - only byte sized DAA.
I see also that Ken Shirriff wrote about the implementation of DAA in x86:
And the HN discussion does go into thoughts on RISC…
Hardware floating point chips also had a big impact. Why have your spreadsheet grind away with slow BCD math when hardware floating point is much faster? Any money lost to rounding errors helps support hungry programmers.
A quick look at the 8087 suggests it has 18-digit BCD support, so the DAA went there.
Several CPUs do floating-point emulation, but I don't know if they did BCD math too?
As a side note, part of the reason I am building an 18-bit CPU is to have 8-digit binary floating point.
Interesting! I see all operands are always converted to internal 80 bit floats before any work is done. Those 18 digit BCD values fit inside a 64 bit integer and therefore also in an 80 bit float. I wonder how many cycles it takes for the forward and reverse conversion.
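A quick way to convince yourself of that, assuming long double is the x87 80-bit format (true with gcc/clang on x86, but not with every compiler): 10^18 - 1 is below 2^63, and the 80-bit format carries a 64-bit significand, so the round trip below is exact.

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* Largest 18-digit value, as a 64-bit integer (fits: < 2^63 - 1). */
        int64_t max18 = 999999999999999999LL;

        /* If long double is the x87 80-bit format, its 64-bit significand
         * represents any 64-bit integer exactly, so this round trip is
         * lossless. */
        long double x = (long double)max18;
        printf("round trip exact: %d\n", (int)((int64_t)x == max18));
        return 0;
    }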
BCD and fp are not replacements for each other. They overlap a bit, but BCD is more for financial math, where exact, finite decimal results matter; fp is more for scientific/engineering work.
So you want a business computer, Mr. Ben. May I show you … (gestures) … the IBM 1401! And here is, even better, the CADET: it doesn’t bother with math at all, it simply looks it up!
On a more serious note, I think an argument could be made that BCD tends to be slow if the architecture isn't optimized for it. I recall complaints about DEC BASIC being slow because it did its math in BCD. (This even propagated to the Soviet Elektronika PDP-on-a-chip computers.)
I agree. But wouldn’t you (ideally) need a variable length architecture to do BCD efficiently?