by Alan Jones (alan.jones@qcs.org)

Someone on comp.sys.cbm asked if the C64 could do HEAVY MATH, meaning solve computationally intensive numerical problems. The answer is, of course, YES! This is the first of a series of articles on numerical computing for the C64/128.
The C64 is not the best computer for numerical work. However, it does quite well within its limitations of speed and memory. It is fine for most homework and hobby related problems, but not for big industrial problems. It does not bother me at all to let it crunch numbers while I watch a movie or sleep. Those old commercials about sending your children to college with a C64 were a joke. Still, it can save you a long walk to the campus on a miserable night. And you can always use it as a terminal to check jobs running on the mainframe.
The C64 is also a good computer for developing numerical algorithms and
programs. You can try new ideas and write programs at your leisure at
home with a C64. When developed to your satisfaction, algorithms and
programs can be "ported" to bigger and faster computers to solve larger
problems. The C64 has many programming languages available, although
many are not well suited for numerical development work. On larger
computers Fortran and C are popular for numerical work. On a C64, Power
C might be a good choice for some users. I use COMAL 2.0. I also have
COMAL programs that can help convert source code from BASIC to COMAL,
and COMAL to Fortran.
Our C64 with its 6502 (6510) and 64K of RAM is a very simple machine.
It is so simple that many contemporary numerical programs are far from
ideal on a C64. So I will start with a bit of numerical computing
history. Early computers and the numerical algorithms that they used
are often closer to ideal for the C64 than contemporary PCs.
Researching old numerical algorithms can be useful for the C64; e.g.
Quartersolve in C-Hacking #10. Of course new algorithms are useful also
and sometimes you might want to combine ideas from both sides of the
spectrum.
In the beginning... were fingers. Seriously, "computer" was a human job description. These days, human computers are just an oddity seen on TV talk shows. The invention of logarithms was a big boon, and log tables and slide rules were just the start of computational aids. Eventually, mechanical adding machines were developed for high precision, error free (but slow) numerical work. One can still find large desktop Friden and Monroe mechanical adding machines. Numerical work was still a slow, tedious process. More computing tools were developed. The Differential Analyzer was a mechanical computer that could solve IVPs (Initial Value Problems, integrating differential equations). There were also some early analog electronic computing aids. The first electronic analog computer was actually developed after electronic digital computers. (One could argue that many WW II autopilots and automatic control circuits were electronic analog computers.)
The first digital electronic computers were the ABC, ENIAC, EDVAC, and UNIBLAB. (UNIBLAB is just for the Jetsons fans. ;) ) John Vincent Atanasoff invented the first digital electronic computer at Iowa State University. (So if someone answers the phone and says, "He's on the John. Can he call you back later?" It might not mean what you first think.) Clifford Berry was a grad student and chief technician, hence the Atanasoff-Berry Computer, or ABC. The Atanasoff story is fascinating. See: The First Electronic Computer: The Atanasoff Story, Alice R. and Arthur W. Burks, The University of Michigan Press, 1988.
Atanasoff wanted to be able to solve large sets of linear equations.
Even with large mechanical adding machines, solving a 10 by 10 problem
was about the largest size that would be attempted. Schemes to connect
several mechanical adding machines were not feasible, and analog devices
were not precise enough. He was working at a small university and the
small grants available to him were a serious constraint. He developed
the ABC over a couple years for less than $7,000. The ENIAC would later
cost about $500,000! Atanasoff invented a way to use electronic vacuum
tubes as high speed digital switching devices. He then invented a
serial arithmetic logic unit, ALU. Vacuum tubes were still too
expensive so he used cheap capacitors for memory. He invented
additional circuitry to refresh the capacitors, i.e. dynamic RAM. He
designed a parallel computing machine that could add (and subtract,
shift, NOR,...) 30 50-bit binary numbers using 30 modular ALU units.
This allowed it to solve up to 29 linear equations with one right hand
side vector. The design could easily be scaled up in size and
precision. It used scratch paper for I/O and temporary memory. (Created
in man's image?) The card punch/reader was the limiting factor.
Mechanical punches, like (then) new accounting machines might use, were
too slow. An electronic spark punch was developed. A dielectric
material (paper) was placed between electrodes. A high electrical
voltage would carbonize a dot in the material and actually burn a small
pin hole. A smaller voltage would later test for the mark. This was
actually Berry's project. It had decimal to binary and binary to
decimal conversion for initial and final I/O, as well as other nice
touches.
Atanasoff also developed a variation of Gaussian elimination for solving the linear systems of equations with the ABC. The ABC, like our 6502, had no multiply instruction. The ABC had capacitor memory to hold two rows of equations. Multiplication was done with shifts and adds, but whole rows were computed in parallel. Fixed point binary arithmetic with truncation (no rounding) was used. However, it provided 50 binary bits of precision, which was more than the adding machines provided. It used no division. The result would be printed (punched) out in decimal as two integers that would be divided on a mechanical desk calculator for each variable. His numerical algorithm may be useful for our 6502, although I'm sticking with the slower floating point arithmetic. It was not a general purpose "stored program" computer, but it could have been adapted to solve a variety of problems.
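The flavor of division-free elimination can be sketched in C. This is an illustration in the same spirit, not Atanasoff's actual scheme: a tiny two-equation system solved by cross-multiplication alone, with each unknown delivered as a pair of integers whose single final divide is left to the caller, just as the ABC left it to a desk calculator.

```c
#include <stdint.h>

/* Illustration only, NOT Atanasoff's actual scheme: solve
     a11*x + a12*y = c1
     a21*x + a22*y = c2
   in integers by cross-multiplication, with no division anywhere.
   Each unknown comes out as a numerator over a shared denominator. */
static void solve2_division_free(int64_t a11, int64_t a12, int64_t c1,
                                 int64_t a21, int64_t a22, int64_t c2,
                                 int64_t *xnum, int64_t *ynum, int64_t *den)
{
    *den  = a11 * a22 - a21 * a12;   /* shared denominator (determinant) */
    *ynum = a11 * c2  - a21 * c1;    /* x eliminated from row 2          */
    *xnum = c1  * a22 - c2  * a12;   /* y eliminated from row 1          */
}
```

For larger systems the intermediate integers grow quickly, which is one reason the ABC's 50 bits of fixed point precision mattered.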
The ABC was completed and operational in April or May of 1942 except for
one problem: The card punch reading was not reliable. The problem may
have been the dielectric material or choice of paper. A 5 by 5 problem
could be reliably solved, but not the larger problems that it was
designed for. The problem could have been fixed. However, Atanasoff
and Berry were called to other WW II related work and not allowed to
perfect the ABC. The ABC was stored and later dismantled. Ironically,
the war that built the ENIAC killed the ABC. Of course many of John
Atanasoff's original inventions were later used in the ENIAC and EDVAC
computers.
The ABC was built into a desk sized wheeled cart and could be transported to a researcher's "home." It cost less than $7000, but additional units would have been cheaper. The ABC was akin to our favorite low cost home computer. By contrast, the second computer, ENIAC, cost a fortune, required a team of technicians to operate, and filled a large room. The ENIAC led to monolithic computing centers. It would be decades before the computer returned to the home.
I'll skip the better known history lessons: transistor > microprocessor > electronic hand calculators > home computers > C64 >... And of course the electronic computer caused an explosion in the development of mathematics and numerical algorithms.
Arithmetic is the basic building block of numerical algorithms. There are many types of numerical variables and arithmetics. Binary arithmetic is the most efficient for intensive numerical work. Decimal arithmetic is best for simple math where conversion to and from binary would just slow down entering and outputting numbers. Floating point arithmetic is easy to use because it is self scaling and covers a large dynamic range, but it tends to be slow. Fixed point, e.g. integer, arithmetic is fast but not as easy to use. Interval arithmetic involves computing not just a rounded result but an upper and lower bound on the result to cover the interval of the arguments and the accuracy of the computation. PGP encryption uses a high precision modular arithmetic. Complex, quaternion, and vector arithmetic can also be used.
The C64's built-in BASIC provides 5 byte floating point variables and
arithmetic and 2 byte integer variables. I think integer arithmetic is
done by converting to floating point. Most of the programming languages
for the C64 use the same numerical variable types and even the same
arithmetic code. Even in assembly language we often call the same
floating point arithmetic routines. The +, -, *, and / arithmetic
operations on the C64 have no bugs. However, they appear to be coded
for minimum code size rather than minimum execution time. Every type
of computer arithmetic can be built up from the 6502 instruction set.
Some arithmetics can be coded for specific applications such as
Polygonamy in C-Hacking #12.
My interest is in using the floating point routines with numerical algorithms and writing programs. Of course even floating point arithmetic routines are built up from smaller arithmetic blocks. The key building block is the multiplication of two positive 8 bit values into a 16 bit result. Our 6502 has no such instruction.
The 6502 CPU was designed to be a low cost 8 bit CPU. It is fairly
cheap to interface to and will quickly access cheap "slow" memory. It
is also very quick and responsive to interrupts. It can perform 8 bit
binary and BCD addition with carry. The Z80 CPU was designed to be the
ultimate 8 bit CPU. It has several 8 bit internal registers which can
be used in 16 bit pairs. It has a full instruction set that includes
some nibble oriented instructions and a 16 bit add. On average a 1 MHz
6502 is about as effective as a 2 MHz Z80, and Z80s are generally
available in faster speeds. The C128 has a Z80 CPU that could be used
for numerical work, but it was poorly integrated into the C128 and
offers us no advantage over the 6502 (other than executing CP/M and
other Z80 code). Neither CPU has a multiply instruction. The fastest
way to multiply with a Z80 is with the simple binary shift and add
method. However, this is not true with the 6502! The fastest way to do
math on a 6502 is by using table look ups. This opens the door for
creative programming solutions.
Tables can use up a lot of memory, especially for a function of two or more arguments. An 8 bit multiply table could eat up 128K of memory. A 4 bit, or nybble, multiply table would only need 256 bytes, but this would involve so much additional work to realize the 8 bit multiply that it is hardly worthwhile. The C64/128 multiplies with the slow binary shift and add method. However, it is not so slow that disk or REU memory could be used to speed up such a simple function (a large bank switched ROM would be much faster). The table look up method can be readily used when multiplying by a constant, such as when calculating CRCs. Now consider the algebraic identity,
a*b = ((a + b)/2)^2 - ((a - b)/2)^2

With some more work we can do the multiplication using a table of squares of only about 512 bytes! (a + b) could overflow to nine bits, but we will immediately shift right one bit (the division by 2) so this is no problem. However, if (a + b) is odd the least significant bit is lost. This is easy to test for by doing a rotate right (ROR) instead of a shift and testing the carry bit. One way to compensate is to decrement a by 1 (a <> 0), multiply as above, and add b: a*b = (a-1)*b + b. The decrement is free, but we pay for the extra add. Using 256K of external memory you could do a 16 bit multiply this way.
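Here is a sketch of the table-of-squares multiply in C, standing in for 6502 code (the names and table layout are mine; on a 6502 the roughly 512 bytes would be two 256-byte tables holding the low and high bytes of i squared):

```c
#include <stdint.h>

static uint32_t sq[256];   /* sq[i] = i*i; about 512 bytes on a 6502,
                              split into low-byte and high-byte tables */

static void init_sq(void)
{
    for (int i = 0; i < 256; i++)
        sq[i] = (uint32_t)i * i;
}

/* 8 bit x 8 bit -> 16 bit multiply via
     a*b = ((a+b)/2)^2 - ((a-b)/2)^2.
   If a+b is odd the halving would lose a bit, so compensate with
     a*b = (a-1)*b + b, which makes (a-1)+b even. */
static uint16_t mul8(uint8_t a, uint8_t b)
{
    if (a == 0 || b == 0)
        return 0;
    unsigned ua = a, ub = b, extra = 0;
    if ((ua + ub) & 1) {    /* odd sum: decrement a, add b back later */
        ua--;
        extra = ub;
    }
    unsigned s = (ua + ub) >> 1;                      /* (a+b)/2, <= 255 */
    unsigned d = (ua > ub ? ua - ub : ub - ua) >> 1;  /* |a-b|/2        */
    return (uint16_t)(sq[s] - sq[d] + extra);
}
```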
For an example of the shift and add type multiply and divide see "High-Speed Integer Multiplies and Divides", Donald A. Branson, The Transactor, Vol. 8, No. 1, July 1987, pp. 42-43, 45. Note also that although a*b = b*a, the ordering of the arguments can affect the multiplication speed depending on the bit patterns.
Perhaps a year ago there was a discussion running in comp.sys.cbm on ML routines to do fast multiplication. There was no clear best solution. Performance often depended on where the arguments a and b were and where the product was to be stored. This also affects how well these building blocks can be used to perform multi byte arithmetic.
Division is a more difficult problem. It can be done by shifting and
subtracting, table look up, and algorithms based on computing the
inverse. Consider: a/b = exp(log(a) - log(b)). With tables of the
logarithm and exponential functions (and you might want to use base 2)
we can do division with three table look ups and one subtraction. The
log and exp functions will have to be tabulated to a greater precision
than the arguments and result, or it will only produce an approximation.
In most cases we will still have to calculate the remainder using
multiplication and subtraction. Of course with log and exp tabulated we
can calculate fast approximations to many other functions, including
multiplication.
Stephen Judd used multiplication based on a table of squares and division based on a table of log and exp in Polygonamy in C-hacking #12. He reported that his 9 bit/8 bit divide takes 52 cycles "best case." However, where numerical algorithms are concerned, only worst case and average case performance are important.
Double precision, and multiple precision arithmetic routines should be coded efficiently in assembly language using the fast building blocks suggested above. However double precision FP variables and arithmetic can be built using pairs of ordinary FP variables and arithmetic. This will be slow but it can be effective when used sparingly such as when testing single precision algorithms or using iterative improvement techniques. See, "Double Precision Math", Alan Jones, Comal Today, Issue 20, Feb. 1988, pp. 18-20, and Comal Today, Issue 22, May 1988, pp. 58-61.
An algorithm is a procedure or set of instructions for computing something. I am mainly concerned with HEAVY MATH algorithms, but here I will present only feather weight numerical algorithms.
Consider the trivial algorithm,
repeat
  x := (x + 1/x)/2
until converged

This is a stable quadratically convergent algorithm. For any initial x <> 0 it will converge to sign(x), i.e. +1 or -1. Pick a number, say 1.5, and take a few iterations. Note how fast it converges to 1.0. The error, or distance from 1, keeps getting squared down toward zero. The number of correct digits doubles with each iteration. This is the quadratic convergence. Pick another number such as 10^20 and try again. At each iteration the error is cut in half. We take giant strides but convergence is still painfully slow. This is a linear convergence rate. This is a typical Newton's method algorithm. Near the solution, inside the region of quadratic convergence, convergence is very fast. Outside the region convergence is much slower. On more complex problems convergence may fail altogether or converge to an undesired point. In general an algorithm will converge to a "limit point", and if the algorithm is numerically stable, the limit point will be very close to the exact solution intended. Although it looks like this algorithm could run forever like an infinite series, in finite precision arithmetic it always converges in a finite number of iterations, even from the bad starting points. This algorithm is not so trivial when applied to a square matrix (with no eigenvalues on the imaginary axis). It will compute the matrix sign function, which can be used to compute the stable invariant subspace, which can be used to solve the algebraic matrix Riccati equation, which can solve two point boundary value problems, and be used to solve linear optimal control problems. Not to mention other pseudo random buzz mumble...
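The iteration is easy to play with in C; this sketch also counts iterations so the quadratic versus linear behavior is visible:

```c
#include <math.h>

/* Newton iteration for sign(x): quadratic once |x| is near 1,
   only linear (error roughly halved) while |x| is still large. */
static double sign_iterate(double x, int *iters)
{
    int n = 0;
    while (fabs(fabs(x) - 1.0) > 1e-15 && n < 200) {
        x = (x + 1.0 / x) / 2.0;
        n++;
    }
    if (iters)
        *iters = n;
    return x;
}
```

From 1.5 it converges in a handful of iterations; from 1e20 it needs roughly seventy, almost all of them spent halving its way down to the region of quadratic convergence.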
The inverse x = 1/b can be iteratively computed from x := x*(2 - b*x).
This is best used as a floating point, or multiple byte algorithm. This
is a quadratically convergent algorithm. This means that each iteration
should double the number of correct bits in x. You could use an 8 bit
multiply and converge to an 8 bit solution from an initial guess. A
better use would be to compute a 32 bit result (our floating point
mantissa). We might start with an 8 bit estimate from x := exp(-log(b))
using look up tables, take an iteration using 16 bit multiplication (or
16 by 8) to get a 16 bit estimate, and take another iteration using 32
bit multiplication to get the final 32 bit result. Division can then be
accomplished as a/b := a*(1/b). Of course this is only useful if you
have fast multiplication.
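The staged idea can be sketched in C in double precision. The seed here is a classic linear estimate on the mantissa, my stand-in for the exp(-log(b)) table look-up; each Newton pass doubles the number of correct bits, so a handful of passes reaches full precision.

```c
#include <math.h>

/* Newton iteration x := x*(2 - b*x) for 1/b, b != 0, using no division.
   Seed: write |b| = m * 2^e with m in [0.5, 1) and estimate 1/m by the
   linear fit 48/17 - (32/17)*m (good to about 4 bits). */
static double reciprocal(double b)
{
    double s = (b < 0.0) ? -1.0 : 1.0;
    b = fabs(b);
    int e;
    double m = frexp(b, &e);          /* b = m * 2^e */
    double x = ldexp(48.0 / 17.0 - (32.0 / 17.0) * m, -e);
    for (int i = 0; i < 5; i++)       /* each pass doubles correct bits */
        x = x * (2.0 - b * x);
    return s * x;
}
```

Division then follows as a/b = a*reciprocal(b), which is only a win when multiplication is fast.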
BASIC 2.0 calculates square roots from x = exp(0.5*log(a)). This is
slow since BASIC calculates the log and exp functions, and inaccurate as
well. If you have these functions tabulated you might want to use them
for an initial estimate of x. If you have a table of squares, the
inverse function of the square root, you could use a search routine on
the table. Square roots can be calculated iteratively from the Newton's
method algorithm,
x := (x + a/x)/2

One can also compute x = 1/SQR(a) using

x := x*(3 - a*x*x)/2

avoiding the division.
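Both iterations in C, seeded crudely from the binary exponent (a stand-in for the table look-ups suggested above; a is assumed positive):

```c
#include <math.h>

/* Newton's method for SQR(a): x := (x + a/x)/2. */
static double sqrt_newton(double a)
{
    int e;
    frexp(a, &e);                       /* a = m * 2^e */
    double x = ldexp(1.0, e / 2);       /* crude seed ~ 2^(e/2) */
    for (int i = 0; i < 8; i++)
        x = (x + a / x) / 2.0;
    return x;
}

/* 1/SQR(a) via x := x*(3 - a*x*x)/2; the only "division" is the
   halving, which on a 6502 is just a shift. */
static double rsqrt(double a)
{
    int e;
    frexp(a, &e);
    double x = ldexp(1.0, -(e / 2));    /* crude seed ~ 2^(-e/2) */
    for (int i = 0; i < 10; i++)
        x = x * (3.0 - a * x * x) / 2.0;
    return x;
}
```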
E. J. Schmahl published ML code for computing the square root in "Faster
Square Root For The Commodore 64" in The Transactor, Vol. 8, No. 1, July
1987, pp. 34-35. This used a 16 byte look up table to start, followed
by Newton's method. He called the ROM FP routines to do the
calculations, but variable precision arithmetic could also be used as
suggested for the inverse algorithm.
Another interesting algorithm for the INTEGER square root was recently
published by Peter Heinrich, "Fast Integer Square Root", Dr. Dobb's
Journal, #246, April 1996. This is a fast algorithm that uses no
multiplication or division. It is not known yet if this is a good
algorithm for the 6502.
The AG Mean is our first real numerical algorithm; the others above are
our arithmetic building blocks.
repeat
  a(i+1) := (a(i) + b(i))/2
  b(i+1) := SQR(a(i)*b(i))
until converged

For 0 < a(0) <= 1 and 0 < b(0) <= 1 the sequences converge quadratically to their common limit point, the AG mean of a(0) and b(0). Note that we need to use full precision from the start and an accurate square root routine. The BASIC 2.0 SQR routine is not accurate enough. This can be used to compute the complete elliptic integral of the first kind, K(k). With a(0) = 1 and b(0) = SQR(1-k^2), K(k) = PI/(2*a(n)). The AG Mean can also be used for some other computations.
Many mathematical equations can be found in math books and similar
sources. However, these are often in a form for ease of typesetting and
further algebraic manipulation. They should not generally be coded as
written. For example, the well known quadratic equation is the best way
to compute the roots of a second order polynomial equation. However,
there is a particular way to code it to avoid overflow, underflow, and
loss of precision. There are also analytical expressions for the roots
of third and fourth order polynomial equations. However, roots of third
and higher order polynomials are best found using general root-finding
techniques.
This article is long on discussion and short on usable code. Although
it suggests faster ways of performing arithmetic on a C64, the built in
FP +, -, *, and / routines are reliable and can be used for serious
computations. If I continue this series, I would want each article to
present source code for solving a numerically intensive problem. In Part
1, I present an introduction to Linear Programming. Hopefully other
topics will be suggested by readers, and possibly articles will even be
written by other users. Of course I could also write articles on
numerical methods, or turn this into a simple question and answer
column. I suspect many readers have already written many HEAVY MATH
C64/128 programs but have not shared them with the Commodore user
community yet.
Copyright © 1992 - 1997 Commodore Hacking
Commodore, CBM, its respective computer system names, and the CBM logo are either registered trademarks or trademarks of ESCOM GmbH or VISCorp in the United States and/or other countries. Neither ESCOM nor VISCorp endorse or are affiliated with Commodore Hacking.
Commodore Hacking is published by:
Brain Innovations, Inc.
Last Updated: 1997-03-11 by Jim Brain