by Alan Jones (alan.jones@qcs.org)

Someone on comp.sys.cbm asked if the C64 could do HEAVY MATH, meaning solve computationally intensive numerical problems. The answer is, of course, YES! This is the first of a series of articles on numerical computing for the C64/128.
The C64 is not the best computer for numerical work. However, it does quite well within its limitations of speed and memory. It is fine for most homework and hobby related problems, but not for big industrial problems. It does not bother me at all to let it crunch numbers while I watch a movie or sleep. Those old commercials about sending your children to college with a C64 were a joke. Still, it can save you a long walk to the campus on a miserable night. And you can always use it as a terminal to check jobs running on the mainframe.
The C64 is also a good computer for developing numerical algorithms and
programs. You can try new ideas and write programs at your leisure at
home with a C64. When developed to your satisfaction, algorithms and
programs can be "ported" to bigger and faster computers to solve larger
problems. The C64 has many programming languages available, although
many are not well suited for numerical development work. On larger
computers Fortran and C are popular for numerical work. On a C64, Power
C might be a good choice for some users. I use COMAL 2.0. I also have
COMAL programs that can help convert source code from BASIC to COMAL,
and COMAL to Fortran.
Our C64 with its 6502 (6510) and 64K of RAM is a very simple machine.
It is so simple that many contemporary numerical programs are far from
ideal on a C64. So I will start with a bit of numerical computing
history. Early computers and the numerical algorithms that they used
are often closer to ideal for the C64 than contemporary PCs.
Researching old numerical algorithms can be useful for the C64; e.g.
Quartersolve in C-Hacking #10. Of course new algorithms are useful also
and sometimes you might want to combine ideas from both sides of the
spectrum.
In the beginning... were fingers. Seriously, "computer" was a human job description. These days, human computers are just an oddity seen on TV talk shows. The invention of logarithms was a big boon, and log tables and slide rules were just the start of computational aids. Eventually, mechanical adding machines were developed for high precision, error free (but slow) numerical work. One can still find large desktop Friden and Monroe mechanical adding machines. Numerical work was still a slow, tedious process. More computing tools were developed. The Differential Analyzer was a mechanical computer that could solve IVPs (Initial Value Problems, integrating differential equations). There were also some early analog electronic computing aids. The first electronic analog computer was actually developed after electronic digital computers. (One could argue that many WW II autopilots and automatic control circuits were electronic analog computers.)
The first digital electronic computers were the ABC, ENIAC, EDVAC, and UNIBLAB. (UNIBLAB is just for the Jetsons fans. ;) ) John Vincent Atanasoff invented the first digital electronic computer at Iowa State University. (So if someone answers the phone and says, "He's on the John. Can he call you back later?" It might not mean what you first think.) Clifford Berry was a grad student and chief technician, hence the Atanasoff-Berry Computer, or ABC. The Atanasoff story is fascinating. See: The First Electronic Computer: The Atanasoff Story, Alice R. and Arthur W. Burks, The University of Michigan Press, 1988.
Atanasoff wanted to be able to solve large sets of linear equations.
Even with large mechanical adding machines, solving a 10 by 10 problem
was about the largest size that would be attempted. Schemes to connect
several mechanical adding machines were not feasible, and analog devices
were not precise enough. He was working at a small university and the
small grants available to him were a serious constraint. He developed
the ABC over a couple years for less than $7,000. The ENIAC would later
cost about $500,000! Atanasoff invented a way to use electronic vacuum
tubes as high speed digital switching devices. He then invented a
serial arithmetic logic unit, ALU. Vacuum tubes were still too
expensive so he used cheap capacitors for memory. He invented
additional circuitry to refresh the capacitors, i.e. dynamic RAM. He
designed a parallel computing machine that could add (and subtract,
shift, NOR,...) 30 50-bit binary numbers using 30 modular ALU units.
This allowed it to solve up to 29 linear equations with one right hand
side vector. The design could easily be scaled up in size and
precision. It used scratch paper for I/O and temporary memory. (Created
in man's image?) The card punch/reader was the limiting factor.
Mechanical punches, like (then) new accounting machines might use, were
too slow. An electronic spark punch was developed. A dielectric
material (paper) was placed between electrodes. A high electrical
voltage would carbonize a dot in the material and actually burn a small
pin hole. A smaller voltage would later test for the mark. This was
actually Berry's project. It had decimal to binary and binary to
decimal conversion for initial and final I/O, as well as other nice
touches.
Atanasoff also developed a variation of Gaussian elimination for solving the linear systems of equations with the ABC. The ABC, like our 6502, had no multiply instruction. The ABC had capacitor memory to hold two rows of equations. Multiplication was done with shifts and adds, but whole rows were computed in parallel. Fixed point binary arithmetic with truncation (no rounding) was used. However, it provided 50 binary bits of precision, which was more than the adding machines provided. It used no division. The result would be printed (punched) out in decimal as two integers that would be divided on a mechanical desk calculator for each variable. His numerical algorithm may be useful for our 6502, although I'm sticking with the slower floating point arithmetic. It was not a general purpose "stored program" computer, but it could have been adapted to solve a variety of problems.
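The flavor of division-free elimination can be sketched in C. This is an illustration in the same spirit, not Atanasoff's actual scheme: a tiny two-equation system solved by cross-multiplication alone, with each unknown delivered as a pair of integers whose single final divide is left to the caller, just as the ABC left it to a desk calculator.

```c
#include <stdint.h>

/* Illustration only, NOT Atanasoff's actual scheme: solve
     a11*x + a12*y = c1
     a21*x + a22*y = c2
   in integers by cross-multiplication, with no division anywhere.
   Each unknown comes out as a numerator over a shared denominator. */
static void solve2_division_free(int64_t a11, int64_t a12, int64_t c1,
                                 int64_t a21, int64_t a22, int64_t c2,
                                 int64_t *xnum, int64_t *ynum, int64_t *den)
{
    *den  = a11 * a22 - a21 * a12;   /* shared denominator (determinant) */
    *ynum = a11 * c2  - a21 * c1;    /* x eliminated from row 2          */
    *xnum = c1  * a22 - c2  * a12;   /* y eliminated from row 1          */
}
```

For larger systems the intermediate integers grow quickly, which is one reason the ABC's 50 bits of fixed point precision mattered.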
The ABC was completed and operational in April or May of 1942 except for
one problem: The card punch reading was not reliable. The problem may
have been the dielectric material or choice of paper. A 5 by 5 problem
could be reliably solved, but not the larger problems that it was
designed for. The problem could have been fixed. However, Atanasoff
and Berry were called to other WW II related work and not allowed to
perfect the ABC. The ABC was stored and later dismantled. Ironically,
the war that built the ENIAC killed the ABC. Of course many of John
Atanasoff's original inventions were later used in the ENIAC and EDVAC
computers.
The ABC was built into a desk sized wheeled cart and could be transported to a researcher's "home." It cost less than $7000, but additional units would have been cheaper. The ABC was akin to our favorite low cost home computer. By contrast, the second computer, ENIAC, cost a fortune, required a team of technicians to operate, and filled a large room. The ENIAC led to monolithic computing centers. It would be decades before the computer returned to the home.
I'll skip the better known history lessons: transistor > microprocessor > electronic hand calculators > home computers > C64 >... And of course the electronic computer caused an explosion in the development of mathematics and numerical algorithms.
Arithmetic is the basic building block of numerical algorithms. There are many types of numerical variables and arithmetics. Binary arithmetic is the most efficient for intensive numerical work. Decimal arithmetic is best for simple math where conversion to and from binary would just slow down entering and outputting numbers. Floating point arithmetic is easy to use because it is self scaling and covers a large dynamic range, but it tends to be slow. Fixed point, e.g. integer, arithmetic is fast but not as easy to use. Interval arithmetic involves computing not just a rounded result but an upper and lower bound on the result to cover the interval of the arguments and the accuracy of the computation. PGP encryption uses a high precision modular arithmetic. Complex, quaternion, and vector arithmetic can also be used.
The C64's built-in BASIC provides 5 byte floating point variables and
arithmetic and 2 byte integer variables. I think integer arithmetic is
done by converting to floating point. Most of the programming languages
for the C64 use the same numerical variable types and even the same
arithmetic code. Even in assembly language we often call the same
floating point arithmetic routines. The +, -, *, and / arithmetic
operations on the C64 have no bugs. However, they appear to be coded
for minimum code size rather than minimum execution time. Every type
of computer arithmetic can be built up from the 6502 instruction set.
Some arithmetics can be coded for specific applications such as
Polygonamy in C-Hacking #12.
My interest is in using the floating point routines with numerical algorithms and writing programs. Of course even floating point arithmetic routines are built up from smaller arithmetic blocks. The key building block is the multiplication of two positive 8 bit values into a 16 bit result. Our 6502 has no such instruction.
The 6502 CPU was designed to be a low cost 8 bit CPU. It is fairly
cheap to interface to and will quickly access cheap "slow" memory. It
is also very quick and responsive to interrupts. It can perform 8 bit
binary and BCD addition with carry. The Z80 CPU was designed to be the
ultimate 8 bit CPU. It has several 8 bit internal registers which can
be used in 16 bit pairs. It has a full instruction set that includes
some nibble oriented instructions and a 16 bit add. On average a 1 MHz
6502 is about as effective as a 2 MHz Z80, and Z80s are generally
available in faster speeds. The C128 has a Z80 CPU that could be used
for numerical work, but it was poorly integrated into the C128 and
offers us no advantage over the 6502 (other than executing CP/M and
other Z80 code). Neither CPU has a multiply instruction. The fastest
way to multiply with a Z80 is with the simple binary shift and add
method. However, this is not true with the 6502! The fastest way to do
math on a 6502 is by using table look ups. This opens the door for
creative programming solutions.
Tables can use up a lot of memory, especially for a function of two or more arguments. An 8 bit multiply table could eat up 128K of memory. A 4 bit, or nybble, multiply table would only need 256 bytes, but this would involve so much additional work to realize the 8 bit multiply that it is hardly worthwhile. The C64/128 multiplies with the slow binary shift and add method. However, it is not so slow that disk or REU memory could be used to speed up such a simple function (a large bank switched ROM would be much faster). The table look up method can be readily used when multiplying by a constant, such as when calculating CRCs. Now consider the algebraic identity,
a*b = ((a + b)/2)^2 - ((a - b)/2)^2

With some more work we can do the multiplication using a table of squares of only about 512 bytes! (a + b) could overflow to nine bits, but we will immediately shift right one bit (the division by 2) so this is no problem. However, if (a + b) is odd the least significant bit is lost. This is easy to test for by doing a rotate right (ROR) instead of a shift and testing the carry bit. One way to compensate is to decrement a by 1 (a <> 0), multiply as above, and add b: a*b = (a-1)*b + b. The decrement is free, but we pay for the extra add. Using 256K of external memory you could do a 16 bit multiply this way.
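Here is a sketch of the table-of-squares multiply in C, standing in for 6502 code (the names and table layout are mine; on a 6502 the roughly 512 bytes would be two 256-byte tables holding the low and high bytes of i squared):

```c
#include <stdint.h>

static uint32_t sq[256];   /* sq[i] = i*i; about 512 bytes on a 6502,
                              split into low-byte and high-byte tables */

static void init_sq(void)
{
    for (int i = 0; i < 256; i++)
        sq[i] = (uint32_t)i * i;
}

/* 8 bit x 8 bit -> 16 bit multiply via
     a*b = ((a+b)/2)^2 - ((a-b)/2)^2.
   If a+b is odd the halving would lose a bit, so compensate with
     a*b = (a-1)*b + b, which makes (a-1)+b even. */
static uint16_t mul8(uint8_t a, uint8_t b)
{
    if (a == 0 || b == 0)
        return 0;
    unsigned ua = a, ub = b, extra = 0;
    if ((ua + ub) & 1) {    /* odd sum: decrement a, add b back later */
        ua--;
        extra = ub;
    }
    unsigned s = (ua + ub) >> 1;                      /* (a+b)/2, <= 255 */
    unsigned d = (ua > ub ? ua - ub : ub - ua) >> 1;  /* |a-b|/2        */
    return (uint16_t)(sq[s] - sq[d] + extra);
}
```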
For an example of the shift and add type multiply and divide see "High-Speed Integer Multiplies and Divides", Donald A. Branson, The Transactor, Vol. 8, No. 1, July 1987, pp. 42-43, 45. Note also that although a*b = b*a, the ordering of the arguments can affect the multiplication speed depending on the bit patterns.
Perhaps a year ago there was a discussion running in comp.sys.cbm on ML routines to do fast multiplication. There was no clear best solution. Performance often depended on where the arguments a and b were and where the product was to be stored. This also affects how well these building blocks can be used to perform multi byte arithmetic.
Division is a more difficult problem. It can be done by shifting and
subtracting, table look up, and algorithms based on computing the
inverse. Consider: a/b = exp(log(a) - log(b)). With tables of the
logarithm and exponential functions (and you might want to use base 2)
we can do division with three table look ups and one subtraction. The
log and exp functions will have to be tabulated to a greater precision
than the arguments and result, or it will only produce an approximation.
In most cases we will still have to calculate the remainder using
multiplication and subtraction. Of course with log and exp tabulated we
can calculate fast approximations to many other functions, including
multiplication.
Stephen Judd used multiplication based on a table of squares and division based on a table of log and exp in Polygonamy in C-hacking #12. He reported that his 9 bit/8 bit divide takes 52 cycles "best case." However, where numerical algorithms are concerned, only worst case and average case performance are important.
Double precision, and multiple precision arithmetic routines should be coded efficiently in assembly language using the fast building blocks suggested above. However double precision FP variables and arithmetic can be built using pairs of ordinary FP variables and arithmetic. This will be slow but it can be effective when used sparingly such as when testing single precision algorithms or using iterative improvement techniques. See, "Double Precision Math", Alan Jones, Comal Today, Issue 20, Feb. 1988, pp. 18-20, and Comal Today, Issue 22, May 1988, pp. 58-61.
An algorithm is a procedure or set of instructions for computing something. I am mainly concerned with HEAVY MATH algorithms, but here I will present only feather weight numerical algorithms.
Consider the trivial algorithm,
repeat
  x := (x + 1/x)/2
until converged

This is a stable quadratically convergent algorithm. For any initial x <> 0 it will converge to sign(x), i.e. +1 or -1. Pick a number, say 1.5, and take a few iterations. Note how fast it converges to 1.0. The error, or distance from 1, keeps getting squared down toward zero. The number of correct digits doubles with each iteration. This is the quadratic convergence. Pick another number such as 10^20 and try again. At each iteration the error is cut in half. We take giant strides but convergence is still painfully slow. This is a linear convergence rate. This is a typical Newton's method algorithm. Near the solution, inside the region of quadratic convergence, convergence is very fast. Outside the region convergence is much slower. On more complex problems convergence may fail altogether or converge to an undesired point. In general an algorithm will converge to a "limit point", and if the algorithm is numerically stable, the limit point will be very close to the exact solution intended. Although it looks like this algorithm could run forever like an infinite series, in finite precision arithmetic it always converges in a finite number of iterations, even from the bad starting points. This algorithm is not so trivial when applied to a square matrix (with no eigenvalues on the imaginary axis). It will compute the matrix sign function, which can be used to compute the stable invariant subspace, which can be used to solve the algebraic matrix Riccati equation, which can solve two point boundary value problems, and be used to solve linear optimal control problems. Not to mention other pseudo random buzz mumble...
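The iteration is easy to play with in C; this sketch also counts iterations so the quadratic versus linear behavior is visible:

```c
#include <math.h>

/* Newton iteration for sign(x): quadratic once |x| is near 1,
   only linear (error roughly halved) while |x| is still large. */
static double sign_iterate(double x, int *iters)
{
    int n = 0;
    while (fabs(fabs(x) - 1.0) > 1e-15 && n < 200) {
        x = (x + 1.0 / x) / 2.0;
        n++;
    }
    if (iters)
        *iters = n;
    return x;
}
```

From 1.5 it converges in a handful of iterations; from 1e20 it needs roughly seventy, almost all of them spent halving its way down to the region of quadratic convergence.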
The inverse x = 1/b can be iteratively computed from x := x*(2 - b*x).
This is best used as a floating point, or multiple byte algorithm. This
is a quadratically convergent algorithm. This means that each iteration
should double the number of correct bits in x. You could use an 8 bit
multiply and converge to an 8 bit solution from an initial guess. A
better use would be to compute a 32 bit result (our floating point
mantissa). We might start with an 8 bit estimate from x := exp(-log(b))
using look up tables, take an iteration using 16 bit multiplication (or
16 by 8) to get a 16 bit estimate, and take another iteration using 32
bit multiplication to get the final 32 bit result. Division can then be
accomplished as a/b := a*(1/b). Of course this is only useful if you
have fast multiplication.
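The staged idea can be sketched in C in double precision. The seed here is a classic linear estimate on the mantissa, my stand-in for the exp(-log(b)) table look-up; each Newton pass doubles the number of correct bits, so a handful of passes reaches full precision.

```c
#include <math.h>

/* Newton iteration x := x*(2 - b*x) for 1/b, b != 0, using no division.
   Seed: write |b| = m * 2^e with m in [0.5, 1) and estimate 1/m by the
   linear fit 48/17 - (32/17)*m (good to about 4 bits). */
static double reciprocal(double b)
{
    double s = (b < 0.0) ? -1.0 : 1.0;
    b = fabs(b);
    int e;
    double m = frexp(b, &e);          /* b = m * 2^e */
    double x = ldexp(48.0 / 17.0 - (32.0 / 17.0) * m, -e);
    for (int i = 0; i < 5; i++)       /* each pass doubles correct bits */
        x = x * (2.0 - b * x);
    return s * x;
}
```

Division then follows as a/b = a*reciprocal(b), which is only a win when multiplication is fast.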
BASIC 2.0 calculates square roots from x = exp(0.5*log(a)). This is
slow since BASIC calculates the log and exp functions, and inaccurate as
well. If you have these functions tabulated you might want to use them
for an initial estimate of x. If you have a table of squares, the
inverse function of the square root, you could use a search routine on
the table. Square roots can be calculated iteratively from the Newton's
method algorithm,
x := (x + a/x)/2

One can also compute x = 1/SQR(a) using

x := x*(3 - a*x*x)/2

avoiding the division.
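Both iterations in C, seeded crudely from the binary exponent (a stand-in for the table look-ups suggested above; a is assumed positive):

```c
#include <math.h>

/* Newton's method for SQR(a): x := (x + a/x)/2. */
static double sqrt_newton(double a)
{
    int e;
    frexp(a, &e);                       /* a = m * 2^e */
    double x = ldexp(1.0, e / 2);       /* crude seed ~ 2^(e/2) */
    for (int i = 0; i < 8; i++)
        x = (x + a / x) / 2.0;
    return x;
}

/* 1/SQR(a) via x := x*(3 - a*x*x)/2; the only "division" is the
   halving, which on a 6502 is just a shift. */
static double rsqrt(double a)
{
    int e;
    frexp(a, &e);
    double x = ldexp(1.0, -(e / 2));    /* crude seed ~ 2^(-e/2) */
    for (int i = 0; i < 10; i++)
        x = x * (3.0 - a * x * x) / 2.0;
    return x;
}
```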
E. J. Schmahl published ML code for computing the square root in "Faster
Square Root For The Commodore 64" in The Transactor, Vol. 8, No. 1, July
1987, pp. 34-35. This used a 16 byte look up table to start, followed
by Newton's method. He called the ROM FP routines to do the
calculations, but variable precision arithmetic could also be used as
suggested for the inverse algorithm.
Another interesting algorithm for the INTEGER square root was recently
published by Peter Heinrich, "Fast Integer Square Root", Dr. Dobb's
Journal, #246, April 1996. This is a fast algorithm that uses no
multiplication or division. It is not known yet if this is a good
algorithm for the 6502.
The AG Mean is our first real numerical algorithm; the others above are
our arithmetic building blocks.
repeat
  a(i+1) := (a(i) + b(i))/2
  b(i+1) := SQR(a(i)*b(i))
until converged

For 0 < a(0) <= 1 and 0 < b(0) <= 1 the sequences converge quadratically to their common limit point, the AG mean of a(0) and b(0). Note that we need to use full precision from the start and an accurate square root routine. The BASIC 2.0 SQR routine is not accurate enough. This can be used to compute the complete elliptic integral of the first kind, K(k). With a(0) = 1 and b(0) = SQR(1-k^2), K(k) = PI/(2*a(n)). The AG Mean can also be used for some other computations.
Many mathematical equations can be found in math books and similar
sources. However, these are often in a form for ease of typesetting and
further algebraic manipulation. They should not generally be coded as
written. For example, the well known quadratic equation is the best way
to compute the roots of a second order polynomial equation. However,
there is a particular way to code it to avoid overflow, underflow, and
loss of precision. There are also analytical expressions for the roots
of third and fourth order polynomial equations. However, roots of third
and higher order polynomials are best found using general root-finding
techniques.
This article is long on discussion and short on usable code. Although
it suggests faster ways of performing arithmetic on a C64, the built in
FP +, -, *, and / routines are reliable and can be used for serious
computations. If I continue this series, I would want each article to
present source code for solving a numerically intensive problem. In Part
1, I present an introduction to Linear Programming. Hopefully other
topics will be suggested by readers, and possibly articles will even be
written by other users. Of course I could also write articles on
numerical methods, or turn this into a simple question and answer
column. I suspect many readers have already written many HEAVY MATH
C64/128 programs but have not shared them with the Commodore user
community yet.
Copyright © 1992 - 1997 Commodore Hacking
Commodore, CBM, its respective computer system names, and the CBM logo are either registered trademarks or trademarks of ESCOM GmbH or VISCorp in the United States and/or other countries. Neither ESCOM nor VISCorp endorse or are affiliated with Commodore Hacking.
Commodore Hacking is published by:
Brain Innovations, Inc.
Last Updated: 1997-03-11 by Jim Brain