@(#): Polygonamy: A Study in 3 Dimensions
      by Stephen L. Judd (sjudd@nwu.edu)

We've been making some pretty small potatoes for a while now, so
the time has come for a little more ambition and challenge.  I decided to
think up a real challenge, containing problems that I had no idea how to
solve, and see what I could come up with.

I set out to create a 3D virtual world for the C64, e.g. a space
populated with various three-dimensional objects, which I could wander around
in.  I wanted it to be full-screen 320x200 hires bitmapped.  Furthermore, I
wanted the objects to be solid, and since there are only two colors I
wanted to be able to put patterns on the faces.  I also wanted it to
translate nicely to the 128's VDC chip, in 2MHz mode.  Finally, naturally, I
wanted the program to be fast.  This was the framework in which I placed
myself, and a few other ideas presented themselves along the way.  The
outcome of all of this is Polygonamy.

Just a brief history of this project: I have wanted to do a 3D
world for a very long time, and have been thinking about it for some
time in the back of my head; my imagination was probably first fired
the first time I played Elite.  I wrote down the necessary equations one
afternoon last summer, for a high school student I was teaching, and
the equations are very simple.  I took a break to get some work of
measureable value accomplished, but in October I began work on the graphics
algorithm.  I worked steadily on this for two months, and in December I
finally began to code the graphics.  In mid-January, I got them to work.
Adding the rest took a few weekends more.  I have about 128 pages of notes,
analytical calculations, and BASIC test programs in my C64 notebook (which
is, I think, a nice number of pages to have :).  My original plans were
to place five objects in the world, but time and memory constraints whittled
that down to three.  One of my disks self-destructed about the day I was
ready to finish, so I had to reconstruct a bunch of tables, but other than
that I finally managed to finish it up, albeit with a few rough edges ;).

Although the concepts from previous articles are used as a solid
foundation, the code is almost 100% from scratch -- I only borrowed a
tiny piece of code from an earlier program, and even that I modified
somewhat.

One caveat before we begin: I am primarily interested in the
challenge of designing the algorithms, which means I like to come up
with my own solutions.  Thus, you may find more efficient methods
in a graphics book or perhaps in someone else's code; I have examined
neither, so I have no idea what the relative merit of my algorithms
may be.  These are simply my solutions to challenges placed before me.
And if you know of a better way to do things, please feel free to email
me!

Furthermore, I consider the code a test of the theory.  Some of my
assumptions work and some do not, and these will be considered at the end
of this article.

Finally, I am not including the source code.  For one thing, it is big.  
Like, _HUGE_, man.  I had to split it up when I ran out of editor memory
on my 128 (which, incidentally, forced me to figure out Merlin 128's very
cool and very powerful linker feature).  I will include numerous code 
fragments in assembly and BASIC which demonstrate all the important 
concepts.

By the way, if you are interested in measuring frame rates, you can
use the first object.  Every full 360 degree revolution is 128 frames. 
So time how long it takes to complete a full rev (or maybe several), and 
divide that number into 128, to get an idea of frames per second.
For a rundown of frame rates for stock and SuperCPU operation, see
"Underneath the Hood of the SuperCPU" (Reference: cmdcpu) found
elsewhere in thi issue.

Some brief acknowledgements: This project would not have happened without
the extremely powerful macro and linking capabilities of the Merlin 128
assembler, by Glen Bredon.  It would have been _really_ tough without
JiffyDOS and my FD-2000, from CMD.  I used my Action Replay extensively
for debugging, and without the excellent documentation for the 64, such as
the PRG and Mapping the 64, this would have been a nightmare.  Finally, I 
must acknowledge my friend George Taylor; a few days before I was all 
finished I explained some routines to him, and he made a great suggestion 
which made my fast fill routine blaze.

Okay, WAY too much talk.  There are a ton of issues involved with this 
project so let's just wade in hip-deep and deal with them as they come.

@(A): The Equations
      -------------

First some relaxing abstraction.  In previous articles we have discussed
how to project an object in three dimensions through the origin into a 
plane.  We have also discussed rotations in three dimensions.  In 
principle, then, we have all the mathematics we need to do a 3D world.

But we should be thoughtful.  Let's say we're standing in the world and 
turn to the right; we can either rotate ourselves, and change the 
direction we are looking, or we can rotate the world around us, so that 
we are always looking 'forward'.  This may bother you on physical grounds,
but the two are mathematically equivalent.  Given the way we have derived
our projection routines, it should be clear that we want to rotate the 
objects in the world around us.

(Or, to put it another way, we are at the center of the world, and the 
world revolves around us.)

We have another issue: how do we know when an object is visible or not?
How do we know when we've bumped into an object (or blown it out of the
sky :)?  Moreover, if we have ten objects, and each object has six points,
it would be a real drag to have to rotate all sixty points, especially
if none of the objects were even visible.      

It should be clear that we really want to define every object relative
to some center of the object.  So we keep track of the center of each 
object, and rotate and translate the centers, and only calculate the 
full object if it is visible.  We of course want to define the object 
relative to this center.

What happens to this center when we translate or rotate the world?

Let's simplify our model a little bit and only deal with rotations about 
one axis, e.g. we are driving a tank and can move forwards and backwards,
and turn left or right.  The generalization to other axes is very 
straightforward, but this way we can think in two dimensions instead 
of three.

First we need to agree on a coordinate system.  For my system I let the 
x-axis go _up_ a page of paper, the y-axis comes up out of the page, and 
the z-axis goes from left to right.  Thus, I am standing on the paper in
the x-z plane, at the origin, with the y-axis extending upwards from me. 
If you still don't understand my orientation, draw a little picture.

I am going to choose my orientation so that I am always looking straight
down the x-axis, e.g. in the direction that x is increasing.  Thus, if I
walk forwards or backwards, this corresponds to decreasing or increasing
the x-component of the center coordinate:

   let C=(cx,cy,cz)
   move forwards => cx=cx-1
   move backwards=> cx=cx+1

So far so good.  As always, draw a picture if you can't visualize it.  
That takes care of translations, what about rotations?

We certainly know how to rotate points about the origin.  In particular,
if we have a point with coordinates (x1,z1) and rotate it clockwise by 
an angle s, we get the new point as follows:   

   (x1,z1) -> (x1*cos(s)+z1*sin(s), z1*cos(s)-x1*sin(s))

So that's easy enough.  The problem is that we have this big object
sitting around this point, and need to figure out what to do with it!

Consider the following: let's say we have a line out some distance from 
the origin,

                         X
                         X
            |            X
 z-axis ----O------------c----     c=center
            |            X
            |            X
          Origin         X <---- line

and we rotate it by some amount theta about the origin:

                      X
                      XX
                       Xc      c=rotated center
            |           XX 
 z-axis  ---O------------XX-
            |             X
                          XX

You can see (from my incredible ASCII artwork) that the line is now at an
angle with respect to the origin.

Imagine that we draw a line from the origin to the center of the point, 
in the first picture (or get a pencil and paper and actually do it), so 
that we have the letter "T" laying on its side.  Now we rotate this "T" 
by some angle theta, so that the top of the "T" -- our line -- has now
been rotated.

The stem of the "T" meets the top of the "T" at the center point c.  Drop
a line from the rotated center straight down to the z-axis, and call this
line l2.  Since the T is at a right angle, and we have rotated it by an
angle theta, the angle between our line and the z-axis is 90-theta.  But
this means that the angle between our line (the top of the "T") and the
line l2 is just theta.

Thus, if we rotate the center by an amount theta, all we have to do 
is rotate the object by an amount theta *about the center*, and our 
perspectives will all be correct.  How is that for simple?  It should
be clear now that this works no matter where the "center" of the object
is chosen.  Thus, our center is not some physical center of the object,
but rather the center of rotation of the object.

Since this is true of rotations about any axis, we now know how to 
generalize to higher dimensions.

Note further that we can now implement local rotations -- that is, let's 
say the object is a tank, and this tank turns independently of whether 
or not we turn.  Piece of cake.

You can also see that the rotations are cumulative.  If we turn to the 
left, and then turn left again, we can simply apply two rotations to the 
points of the object.  In fact, if we turn left, move forwards, and then
move left again, we still apply just two rotations to the points of the
object; the center, however, has moved.

This is quite important, as it allows us to measure an object's relative
rotation to us, whether or not it is visible.  Remember that we only 
want to rotate the points that define an object when the object is 
visible.  We never actually change the points which define an object.
Instead, we track how much the object needs to rotate, and rotate the
original points by that amount.

The center of the object will change with each rotation and translation.
We never change how we define an object about this center, though.  We
simply apply a rotation to the original points when appropriate.  The 
object centers must be kept track of because they can undergo translation
as well as rotation.

To summarize then: we define each object relative to a center of 
rotation.  The center determines where the object is located in the
world, and allows us to operate on centers when we need to rotate or
translate the world.  It also lets us perform local operations on an
object, so that we could, for instance, have a spinning cube located
inside our world.  If an object's center is visible then we can consider
the object to be visible, and plot it.

Whoops -- what does it mean to be 'visible'?  Well, think about yourself.
You can see things when they are in front of you.  Or, to be more 
precise, you have a field of vision.  Perhaps a decent model of this 
would be a cone.  I think a better model, certainly one which is
easier to deal with computationally, is the intersection of two planes:
a pyramid.  Anything which lies within this pyramid we consider visible,
and anything which lies outside we consider not visible.

Two-dimensionally, we draw a little wedge extending from the origin.  
Anything within the wedge we count as visible, and anything outside
of it we count as not visible.  The sides of this wedge are two lines, 
with equal but opposite slope (i.e. slope=+/-m).
    
 \ Visible /
  \  View /  Outside of visual area
   \Area /
    \   /
     \ /
      * <--- Me

Probably lines at some angle are more reasonable than others.  But I'm a
simple guy, and the two simplest lines I can draw are at 45 degree angles
from the axis, so their slope is +/-1.  Thus, any points which lie 
between the lines x+z=0 and x-z=0 are visible.  If the center of an 
object is within this area, we will consider the object visible.  That 
is, if cx+cz>0 and cx-cz>0 the object is visible.        

One last thing: if we are too close to the object, we either want to 
bump into it (i.e. not move) or else not display it.  So we also need to 
check if cx<x0 for some x0.

We are now in a position to write some simple code.  I wrote the
following in evil Microsoft QBasic, but BASIC7.0 on the 128 would work
just as well, although you need to change the variables (I didn't have my
128 handy, otherwise I would have written this on the 128):

@(A): Polygon Prototype Code

   SCREEN 1 '320x200

   delta= 3    'Rotations will be in 3 degree increments
   rad= 3.1415926/180
   cdel= COS(rad*delta)
   sdel= SIN(rad*delta)
   theta=0
   d=-135
   x0= 60      'Bumped into the object?
   REM z0 y0 would be 160,100 to place the object in the center
   REM of the screen
   y0= 170  'I want the bottom of screen to be ground
   z0= 160

   REM Set up the object
   REM Tetrahedron: 0,sqrt(3),0  0,0,1  0,0,-1  2,0,0
   DIM obx(4), oby(4), obz(4)

   obx(1)= 0
   oby(1)= 50*SQR(3)
   obz(1)= 0
   obx(2)= 0
   oby(2)= 0
   obz(2)= 50
   obx(3)= 0
   oby(3)= 0
   obz(3)= -50
   obx(4)= 100
   oby(4)= 0
   obz(4)= 0

   cx= 100
   cy= -10
   cz= 0

   REM Get input
   main:

   DO
     a$=INKEY$
   LOOP UNTIL a$<>""

   IF a$="[" THEN cx=cx-20
   IF a$="/" THEN cx=cx+20
   IF a$=";" THEN GOSUB rotl
   IF a$="'" THEN GOSUB rotr

   IF cx<x0 THEN CLS: GOTO main:
   IF cx<cz OR cx+cz<0 THEN CLS: GOTO main:

   ctheta= COS(rad*theta)
   stheta= SIN(rad*theta)

   p1x= cx + ctheta*obx(1) + stheta*obz(1)   'Rotate and add to center
   p1y= cy + oby(1)
   p1z= cz + ctheta*obz(1) - stheta*obx(1)
   p1y= y0 + d*p1y/p1x          'Project and add offset
   p1z= z0 + d*p1z/p1x
   [... similar for p2x,p2y,p2z,...,p4x,p4y,p4z]

   CLS
   LINE (p1z, p1y)-(p2z, p2y)
   [... lines between p2-p3, p3-p1, p1-p4, p4-p2, p4-p3]

   GOTO main:    'Main loop

   REM rotate left
   rotl:
       theta= theta + delta
       blah= cdel*cx + sdel*cz
       cz= -sdel*cx + cdel*cz
       cx= blah
   RETURN

   rotr:
       theta= theta-delta
       blah= cdel*cx - sdel*cz
       cz= sdel*cx + cdel*cz
       cx= blah
   RETURN

(You may note that cx=cx+20 is used for a translation, instead of cx=cx+1.
This will be detailed later).

So much for the easy part.

@(A): Filling
      -------

If there is one thing that the previous programs have taught us, it is 
that graphics are slow.  At least, they are far and above the major thing
slowing down the program, and deserve the most attention and thought for
improvement.  Moreover, because there is lots of looping involved, the
elimination of just a few instructions can translate to thousands of 
cycles saved.
      
We have examined several fill routines up to now, but neither of them is
up to the task of Polygonamy.  The cookie-cutter method is OK, but doesn't
allow multiple objects, and certainly doesn't allow pattern fills.  Using
an EOR-buffer is just plain slow and inefficient and a big drag.  So it's
time to rethink this problem.
    
Recall that on the 64 the bitmap screen is divided into 8x8 cells, which
are arranged horizontally in rows.  It's a pretty kooky way of doing 
things, but we shall overcome.

First of all it should be clear that we want to fill from left to right
(as opposed from top to bottom).  We can then fill a byte at a time, 
instead of dinking in little pixels at a time.

Previously we used a custom character set to plot into.  One of the major
reasons for doing so was to use Y as the Y-coordinate, so that storing a
particular point was as simple as STA COLUMN,Y.  We can still use this
idea, but only within each row.  That is, if we let Y=0..7, we can address
each individual pixel-row within each 8x8 block row with an STA ADDRESS,Y.

For real speed, we are going to want an unrolled fill routine.  That is,
we don't want to mess around with loop counters and updating pointers and
such.  Since there are 25 rows on the screen (25 times 8 = 200 pixels
high) we are probably going to need 25 separate fill routines.      
     
I constructed my fill routine as follows:

            STA COL1,Y
            DEX
            BEQ :DONE
            STA COL2,Y
            DEX
            BEQ :DONE
            STA COL3,Y
            ... etc.
   :DONE    RTS
   
Thus X would be my counter into the number of columns to fill, A can
contain our pattern to fill with, and Y can range from 0..7 to index the
individual rows within the block.  The first thing to notice is that each
STA/DEX/BEQ code chunk is six bytes.  So, all we need to do is calculate
which row to start filling at, multiply by six, and add that number to
the start of the fill routine.  The idea is then to jump into the correct
place in the fill, and let it fill the right number of columns, stored in
X.

There is a little problem though -- what we're talking about doing is an 
indirect JSR, and there is no such thing.  But it's easy enough to fake,
because we can use an indirect JMP.  So a call to the fill routine would
look like the following:

         ...
         JSR FILL
         ...
FILL     JMP (ADDRESS)

where ADDRESS simply points to the correct entry point in the fill routine.

Moreover, you may also note that 40 columns times 6 bytes/column is 240
bytes, so that each little fill routine handily fits in a page.  Thus,
moving between rows in the bitmap corresponds to a simple decrement or
increment of the high byte of the ADDRESS pointer.  

This was the state of things when, days before I was to be all done with
Polygonamy, I mentioned it to my friend George Taylor, who suggested the
following modification: instead of using X to count how many columns to
fill, just make the fill routine:

         STA COL1,Y
         STA COL2,Y
         STA COL3,Y
         ...

Then, insert an RTS into the right place in the routine.  Thus, we 
calculate which column to stop filling at, multiply by three, and stick
an RTS in the fill routine at that point.  To fix it up we stick an
STA ..,Y back on top of the RTS.
        
I don't think you're going to make a fill routine faster than that :).

Moreover, note that each fill routine takes up just 120 bytes, so we can
now fit two fill routines in each page.  I did not do this, but it is
easy to do, and instantly frees up 25 pages.           

@(A): Filled Polygons
      ---------------
      
I mean, hey, this _is_ "Polygonamy", so let's talk polygons, and lots of
them.

Clearly all that is needed to draw an object are the left and right
endpoints of the object, since everything in-between will be filled.

An observation to make is that if you take a slice out of a convex
polygon, the slice will intersect the polygon at exactly two points.
Another, more important, observation is to note that the highest and
lowest point of a polygon will always be at a vertex.  Finally, it is
important to note that any vertex of a polygon has exactly two lines
extending out of it, one to the left, and one to the right.

Consider a piece of a polygon:

  \         /
   \       /
    \     /
     \   /
      \ /
       *  <--- Vertex v0

where the vertex v0 is the lowest point of the polygon.  All that needs
to be done is to move upwards (DEY), compute the left and right points
of the polygon at that point, and then fill between the two (JSR FILL).
      
The idea then is to start at the bottom (or the top) and to steadily move
upwards while _simultaneously_ calculating the endpoints of the left and
right lines, and filling in-between.  But we need the equations of the
left and right lines to do this.
    
Now it's time for another observation.  Let's say we have a polygon with
n vertices v1, v2, ..., vn, and furthermore that as we move between these
points we move around the polygon counter-clockwise.  Thus v3 is to the
right of v2, v1 is to the left of v2, v4 is to the right of v3, etc.
For example:

      v1____v3
        \  /
         \/
         v2

What happens if we rotate this polygon?

      v2____v1
        \  /
         \/
         v3

The vertices have changed position, but *their order has not*.  v3 is
still to the right of v2, and v1 is still to the left.

Now we have a real plan.  We simply define the polygon as a list of
points v1 v2 v3 ... vn.  We then figure out which one is lowest, i.e. has
the smallest (or greatest) y-coordinate, call this vertex vm (vmax).  The
endpoints of the left and right lines are vm-1,vm and vm,vm+1.  So move
along those lines until the next vertex is reached.  At that point,
recompute the appropriate line, and keep moving upwards until the top of
the polygon is reached.

Perhaps an example would be helpful:

        v1
        |\
        | \ v3
        | /
        |/
        v2

v2 is the minimum.  The left line has endpoints (v1,v2) and the right
line has endpoints (v2,v3).  We steadily move along the left and right
lines as we creep upwards.  At some point we hit v3, and at this point
we compute a new equation for the right line, this time with endpoints
(v3,v1).  Now we continue to creep upwards and move along the left and
right lines, until we hit v1, at which point we are finished.

It is important to keep in mind that the order of the points never
changes.  We don't need to do anything complicated like sorting the 
points; we only need to find the lowest point, and branch left and right
from there, keeping in mind that the points are cyclic (i.e. v1 is to the
right of vn).

It is now time to start thinking about code.  One aspect of the fill 
routine we haven't considered is the clear.  In the past the entire draw 
buffer was cleared and then the new stuff was drawn into it.  But this 
seems like a bit of a waste; it seems wasteful to clear a bunch of memory 
that is just going to be overwritten again.  So, as long as we can do it
efficiently, it might be smart to combine the clear and fill routines.

Here is how Polygonamy does it: If a line needs to be cleared, then it
is cleared up to the edges of the object, but the part that is going to
be filled is ignored.  (It isn't clear if this provides any substantial 
efficiency gains, though).

To see the status of a particular line, a table is used, containing a 
value for each Y-coordinate.  If the entry is 255 then the line is clear,
if it's 0 then the line has old junk in it, and if it's 1 then the line
has new junk in it.  Thus we only clear the line if its entry in the fill
table is a zero.

So a fill routine might flow like the following:

   let Y count from 7..0

   If we are at the left endpoint then recalculate the left line.
   If we are at the right endpoint the recalculate the right line.
   Update xleft & xright
   If line needs to be cleared then clear line.
   If the starting fill column is different than the previous fill
       column then update the pointers, etc.
   Plot the left and right endpoints (since the fill routine only plots
       eight bits at a time)
   Fill the in-between parts
   Update Y
   If Y passes through zero then update fill entry point, set Y=7 etc.
   Keep going until we reach the top

The next thing to figure out is how to calculate the left and right
lines.  We do have the old line routine, which we could use to advance
to the left and right endpoints, but clearly this isn't too efficient.

The question is: if the y-coordinate increases by one, then how much does
the x-coordinate increase by?  The equation of a line is:

   (y-y0) = m*(x-x0)  m=slope
   
or

   change in y = m*change in x

So, if the change in y is 1, then then the change in x is 1/m.  All we
need to do then is calculate the inverse-slope=dx/dy, where dx=x2-x1 and
dy=y2-y1, and add this to the x-coordinate with each step in y.

Isn't this a fraction?  Sure, big deal.  The fraction can be written
as dx/dy = N + Rem/dy, where N is an integer, and Rem is the remainder,
which is always less than dy.  So to calculate x=x+dx/dy:

   x= x+N
   xrem= xrem+Rem
   If xrem>=dy then x=x+1:xrem=xrem-dy

As usual, we want to start xrem at dy/2, which has the effect of rounding
numbers up.

   10 REM LINE ROUTINE TAKE TWO SLJ 11/24/95
   12 REM ACTUALLY IT'S A FILL RTY NOW
   15 GRAPHIC 1,1
   20 X0=160:Y0=100
   30 X1=5:Y1=-50:X2=7:Y2=11:XL=X1:YL=Y1:Y=Y1
   35 X3=50:Y3=Y1:X4=X3+100:Y4=Y2:XR=X3:YR=Y3
   40 D1=Y2-Y1+1:DX=X2-X1:LI=INT(DX/D1):LR=DX-LI*D1
   45 TL=INT(D1/2)
   46 D2=Y4-Y3+1:DX=X4-X3:RI=INT(DX/D2):RR=DX-RI*D2
   48 TR=INT(D2/2)
   50 DRAW1, X0+X1,Y0-Y1 TO X0+X2,Y0-Y2, X0+X3,Y0-Y3 TO X0+X4,Y0-Y4
   60 REM MAIN LOOP
   70 XR=XR+RI:TR=TR+RR:IF TR>=D2 THEN TR=TR-D2:XR=XR+1
   75 DRAW1, X0+XL,Y0-Y TO X0+XR,Y0-Y
   80 XL=XL+LI:TL=TL+LR:IF TL>=D1 THEN TL=TL-D1:XL=XL+1
   90 REM DRAW1, X0+XL,Y0-Y TO X0+XR,T0-T
   100 Y=Y+1:IF Y<=Y2 THEN 60

In this program (x1,y1)-(x2,y2) is the left line, and (x3,y3)-(x4,y4) is
the right line.  The first thing to note is that in lines 40 and 46, 
Y=y2-y1+1.  This issue was discussed in the very first C=Hacking 
3D-graphics article.  The problem is that although the line will be 
anatomically correct with DY=y2-y1, it will look silly.  The easiest way
to see this is to consider y2-y1=1, e.g. say we draw a line between
(0,10) to (50,11).  Ideally this line will consist of two line segments,
one from (0,10) to (25,10) and the other from (26,11) to (50,11).  But ifi
we use DY=1 we will have one line segment from (0,10) to (50,10), and a
single point at (50,11).

Adding one to DY is just a simple cheat.  Most of the time the lines will
look just fine, but lines which have a slope near one will come out a bit
wrong.  The other, accurate solution, which was used in the first
article, is more complicated to implement in this routine.  Adding one to
DY will also have a useful benefit which we shall shortly see.   

In line 50 above the boundaries of our object are drawn in, to check the
accuracy of the algorithm.

In lines 70-80 the right point is updated, then the thing is filled, then
the left point is updated.  This is because both lines are moving to the
right, e.g. they both have positive slope.  Think about how the line
segments will be drawn; in general, we want to draw from the left end of
the left line segment to the right end of the right line segment.  
(Sometimes this will look a little off where the two lines meet).

Since the left and right lines can each have either positive or negative
slopes, there are four possibilities: Plus-Plus, Plus-Minus, Minus-Minus,
and Minus-Plus.

   Plus-Plus:  Update right, fill, then update left
   Plus-Minus: Fill, update left, update right
   Minus-Minus:Update left, fill, update right
   Minus-Plus: Update left, update right, fill
   
If this is still confusing, try out the above program with various left
and right line segments, and these things will jump right out.

Now we need to think about implementing this in assembly.  Since this is
being done in hires 320x200, the x-coordinate requires two bytes, and the
y-coordinate requires one.  We also need another byte to store the
remainder portion of the x-coordinate.

The most glaring question is the calculation of dx/dy: somehow we need a
fast way of exactly dividing an eight bit number dy into a nine bit
number dx.  Recall that we always add one to dy, so that dy actually
ranges from 2 to 200.  Since the maximum value of dx is 320 or so, the
largest value of dx/dy that we can have is 320/2 = 160.  In other words,
both the integer and remainder part of dx/dy will fit in a byte.  Simply
adding one to dy makes life pretty easy at this end.

One very fast method of doing division is of course by using logarithms.
But they have a problem with accuracy.  One the other hand, one thing we
know how to do very quickly is multiplication.

This then is the plan: use logarithmic division to get an estimate for N,
the integer part.  Then calculate N*dy, compare with dx, and adjust the
integer part accordingly.                             

A quick reminder of how logarithms can be used for division:

   log(a/b) = log(a) - log(b)
   exp(log(x)) = x

thus

   a/b = exp(log(a)-log(b)).

How do we take the log of a 9-bit number?  We don't.  Instead, we
construct a table of f(x)=log(2*x), and use, not x, but x/2, as a
parameter.  Remember that the logarithms merely give an estimate to the
integer part.

Moreover, if the tables are constructed carefully we can insure that the
estimate for N is either exact or too small.  Thus we only need to check
for undershoots, which simplifies the calculation considerably.  In
particular, the tables were constructed as follows:

   10 DIM L1%(160), L2%(200), EX%(255): C=255/LOG(160)
   20 FOR I=1 TO 160
   30 L1%(I)=INT(C*LOG(I))
   40 NEXT
   50 FOR I=2 TO 200
   60 L2%(I)=INT(C*LOG(I/2)+0.5)
   70 NEXT
   80 FOR I=0 TO 255
   90 EX%(I)=INT(EXP(I/C))
   95 IF(I=129)OR(I=148)OR(I=153)OR(I=81)OR(I=98)THEN EX%(I)= EX%(I)-1
   100 NEXT
   110 L2%(3)=L2%(3)+1

The constant C is needed obviously to improve accuracy (log(160) simply
isn't a very large number).  Note that I divided the arguments of the
logarithms in half; instead of calculating 2*dx/dy I calculate dx/(dy/2),
which is of course the same thing.  This was done to make C work out.
By 'fixing' the tables in this manner, exactly 3927 calculations will
undershoot, which works out to about 6% of all possible calculations we
may perform.

The actual division routine works out pretty slick in assembly:

   DIVXY    MAC      ;Macro to compute 2*X/Y
            LDA LOG1,X  ;This is the division part
            SEC
            SBC LOG2,Y
            BCS CONT
            LDX #00     ;dx/dy < 1
            LDA ]1      ;LDA dx, since dx is exactly the remainder
            BCC L2
   CONT
            TAX
            LDA EXP,X
            TAX      ;X is now integer estimate
            STA MULT1
            EOR #$FF
            ADC #00     ;Carry is guaranteed set
            STA MULT2
            LDA ]1      ;ldxlo or rdxlo (i.e. low byte of dx)
            ADC (MULT2),Y
            SEC
            SBC (MULT1),Y  ;Calculate remainder
   L2       CMP ]2      ;ldy or rdy (i.e. ]2 = dy)
            BCC DONE
   L1       INX      ;Remainder is too large, so up int estimate
            SBC ]2      ;and subtract dy
            CMP ]2      ;Repeat until remainder<dy
            BCS L1
   DONE     <<<      ;Now X contains integer, A remainder
   
Do you see how it works?  First the initial guess N is calculated.  If
log(x) - log(y/2) is negative then dx/dy is less that one, so the
remainder is simply dx and the integer part is zero.  Otherwise,
R= dx - N*dy is calculated.  Since N always undershoots, dx-N*dy will
always be positive, so the high byte of dx isn't needed.  This quantity
R is the remainder, so if it is larger than dy simply increase the
integer estimate and subtract dy from R, and repeat if necessary.

The end result then is a 9-bit/8-bit divide which takes 52 cycles
in the best case.  Pretty neat, huh?  And quite adequate for our
purposes.

Wait just a minute there, bub... what about when dy=0?  Consider what
dy=0 means: it means that two vertices lie along the same line.  That in
turn means that the next vertex can be immediately skipped to.  That is,
simply move on to the next point in the list, be it to the right or to
the left, if dy=0.

Well, ah reckon that that just about completes the polygon fill routine.
To summarize: start at the bottom (top, whatever) of the polygon. 
Calculate the "slopes" of the right and left lines from that point.  
Update the coordinates, fill the in-between parts, and plot the 
end-sections.  Then update Y and keep going.  If another vertex is hit,
then recalculate the corresponding line.                       

Alert people may have noticed that this algorithm translates very nicely
to the 128's VDC chip.

I should probably briefly mention the pattern fills.  I use Y as an index
to a pattern table, so it was very natural to use 8x8 character patterns.
With different indexing of course more complicated patterns can be used.
Moreover, it dawned on me that animated patterns were just as easy as 
normal ones, so I tried to think up a few interesting animated patterns
(there are two in Polygonamy, each pattern is eight frames).

So that's the graphics part, more or less.

We ain't even CLOSE to being done yet.

@(A): 3D Code
      -------
      
Now it's FINALLY time to start writing the master program to control the
3D world.  Luckily we have the BASIC program from waaaay up above to
work from.

First is to decide how angles will be measured.  The smart thing to do is
to let the angle variable vary between 0..127 or 0..255; that is, to
measure angles in units of 2*pi/128 (or 2*pi/256).  The reason this is
smart is because the angle is now periodic, wrapping around 256.  Angles
can be added together without checking for overflow, etc. (257=1, 258=2,
259=3, etc.).  Note that in previous programs I did a very dumb thing
and let the angle variable vary from 0..119, so angles were measured in
three degree increments, and I had to place all sorts of checks into the
code.  Polygonamy uses angle increments of delta=2*pi/128.

Next there is the issue of cx=cx+20 instead of cx=cx+1.  The problem is
that if cx=cx+1 is used it takes forever to move around in the world.
Moreover, the objects get really small at around cx=5000.  What this
means is that in the assembly version we can use a single byte for cx,
and just treat each unit of cx as 20 "real world" units.  That is, in
the assembly program, we will keep track of cx/20 instead of cx.

Sort-of.

Consider the rotation which takes place when we turn left or right: the
world is rotated through an angle delta=2*pi/128, so the calculation is:

   blah= cdel*cx + sdel*cz.
   cz= -sdel*cx + cdel*cz

The problem is that sin(2*pi/128)=0.049 and cos(delta)=0.9988, which
means that, in practice, cdel*cx=cx.  Equally bad is that sdel*cz is very
small when cz starts to get small (e.g. 10*sdel = 0.49).  The result of
this is that objects close to the origin (e.g. us) will not be rotated at
all!

Thus the centers need to be calculated more accurately.  In particular, a
second byte is needed to store the 'decimal' part of the center.  To be
precise, this second byte will contain the decimal part of the center
times 256.  This way we can add and subtract remainders and any over- or
underflows will then affect the integer parts cx,cy,cz.

Very quickly we should decide how to represent remainders of negative
numbers.  A number like -1.5 can be represented as -1 - 0.5, but it can
just as well be represented as -2.0 + 0.5.  By using the second method
remainders are always positive, and that's the smart way to do things (if
nothing else it lets the remainder be a fraction of 256, instead of a
fraction of 128).  It's also the way any computer will round: type
INT(-1.5) and see what happens.

further question arises about how to represent the centers, specifically, 
how do we represent an object which is behind us, e.g. has a negative 
value for cx.  The normal way to represent negative numbers is of course 
to use 2's complement notation, but this has some disadvantages.  One of 
them is multiplication: recall that in an earlier code some really fancy 
footwork needed to be done just to be able to multiply numbers between 
-64..64, and we certainly want the centers to range over more numbers
than that.  This gets worse if we decide to use more bits to represent
the centers, as we must do if a larger world is constructed.

Moreover, Polygonamy is an excuse for testing new and different ideas and
investigating their strengths and limitations, so why not try something
different.  As I look through my notes I'm not really sure what motivated
this choice, but how about the following: let's add 128 to all of our
numbers.(I think this is called excess-128 notation).  -2 will be
represented as 126, -1 will be represented as 127, six will be
represented as 134, etc.

Shifting between the excess numbers and 'real' numbers is as simple as
EOR #128.  Recall that to multiply two numbers, let f(x)=x^2/4, so that
a*b = f(a+b) - f(a-b).  In the new system:

   xo = 128+x
   yo = 128+y

which means:

   xo+yo = 256 + (x+y)
   256+xo-yo = 256 + (x-y).

The 256 added above can be thought of as the carry bit.  What this means
is that all that is needed is to construct a single function,
  
   f(x) = (x-256)^2

where x=-255..255.  We can now very quickly multiply signed numbers in
the range -128..128, and with just a single (albeit 512 byte) table,
using essentially the same multiplication procedure as before.

Now the downside of this method: adding and subtracting excess-128
numbers, and in particular checking for overflow.

   xo+yo = 256 + (x+y)
   if x+y >= 128 then we have overflow
   if x+y < -128 then we have underflow

which implies:

   xo+yo >= 256+128  implies overflow
   xo+yo < 128   implies underflow

with similar results for subtraction.  Note also that after every
addition or subtraction 128 needs to be either added or subtracted from
the result, which either way corresponds to an EOR #$80.  So it's a
little more work to add numbers in this system.  (Of course, adding
normal numbers to excess-128 numbers is no problem, so INC and DEC work
fine).

Back to rotations.  The most obvious thing to do is to create two tables:

   f(x) = (x-128)*cos(delta) + 128
   g(x) = (x-128)*sin(delta) + 128

but remember that the remainders are also needed:

   fr(x) = 256*(remainder((x-128)*cos(delta)))
   gr(x) = 256*(remainder((x-128)*sin(delta)))

Since remainders are always positive none of this excess-128 junk is
needed.  Note that we could also let f(x) and g(x) be 2's-complement
tables, then convert from two's-complement into excess-128 after
performing additions etc.  The conversion is, what do you know, EOR #$80.
This is the smarter thing to do, and an even smarter thing to do is to
let the cosine table (f(x) above) to be in excess-128 format, and the
sine table g(x) in 2's complement.  This way the numbers can be added
as normal, and no conversion need take place:       

* Compare to BaSiC subroutine rotl: above

   ROTL
            INC THETA
            LDY #NUMOBJS   ;Y indexes the object
            DEY
   
   :LOOP    LDX CX,Y ;center coordinate
            LDA CDEL,X  ;CDEL = f(x) above
            LDX CZ,Y
            CLC
            ADC SDEL,X
            STA TEMP ;t1 = ci+si
            LDA CXREM,Y
            CLC
            ADC SDELREM,Y  ;Add remainders
            BCC :CONT1
            INC TEMP
            CLC
   :CONT1   LDX CX,Y
            ADC CDELREM,X
            BCC :CONT2
            INC TEMP
   :CONT2   STA CXREM,Y
   
            LDX CZ,Y
            LDA CDEL,X
            LDX CX,Y
            SEC
            SBC SDEL,X
            STA TEMP2   ;t2=cz-si
            LDA CZREM,Y
            SEC
            SBC SDELREM,X
            BCS :CONT3
            DEC TEMP2
   :CONT3   LDX CZ,Y
            CLC
            ADC CDELREM,X
            BCC :CONT4
            INC TEMP2
   :CONT4   STA CZREM,Y
            LDA TEMP2
            STA CZ,Y
            LDA TEMP
            STA CX,Y
            DEY
            BPL :LOOP
   
Well, that takes care of two lines of BASIC code :).  As it turns out,
using a single byte for the remainder does a pretty good job of holding
the number.  Rotating by 360 degrees one way, then rotating back again,
produces a center which is within a few decimal places of the starting
value.

Next up: projections.  The projection calcuation is:

   Proj(P) = d*(P+C)/(px+cx)

where P=(px,py,pz) and C=(cx,cy,cz).  In terms of the implementation, we
want to calculate:

   d*((P-128) + s*(C-128)) / ((px-128) + s*(cx-128))

where s=20, to translate C into the 'real world'.  To calculate this,
consider the following function:

   g(x) = r*d / (s*((px-128)/s + cx-128)) + 128

where r is some scaling factor.  The projection calculation then becomes:

   1/r*( (g-128)*(P-128) + s*(g-128)*(C-128) )

Thus we need some more tables, one of 1/(4r) * (256-x)^2, the other of 
s/(4r) * (256-x)^2, to do the multiplication.  Furthermore a table of 
(x-128)/s would be pretty handy, and finally we need a table of 
g(x) = r*d/(s*(x-128)) + 128.                                     

The general outline of a program would be:

   Get keypress
   1- If translate, then update all cx's (just some INCs and DECs)
   2- If rotate left or right, then rotate world
   3- Update angles for global & local rotation matrix (e.g. theta)
   4- Figure out which objects to display/construct a list
   5- Call each object in turn
   6- Update bitmap: clear out remaining garbage and swap bitmaps

Numbers 1 and 2 are done.  In number three, by global matrix I mean the
object rotation that results from us turning left or right.  By local
rotation I mean rotations independent of whether or not we turn.  The
local rotation allows e.g. the octahedron to spin around in Polygonamy.

Figuring out which objects to display is easy: just check to make sure it
lies within the viewing cone/pyramid, that we are not too close, etc.  If
an object is to be displayed, it needs to be placed in a list.  I
constructed the list to make sure that objects which are farther away are
drawn first; that way objects can overlap one another correctly.  This
was done via a simple insertion sort -- i.e. bump up objects in the list
until the right spot is reached to insert the object.

We have most of the tools to deal with #5.  Handling an object consists
of rotating and projecting it, then displaying it.  Rotation is the same
as it has always been, albeit now involving sixteen bits, and projection
is described above.  Then each polygon needs to be drawn, by sticking the
points of the polygon into the polygon list, setting up the fill pattern
and the pointer to the minimum Y-value, and calling the polygon fill
routine.  Of course, if the face is hidden then we certainly don't want
to plot it.

The minimum y-value can be found very easily while inserting the points
into the point list -- just keep track of ymax and compare to each point
as it is inserted.

We have discussed several methods of calculating hidden faces --
cross-product, rotated normal, parallel faces -- each of which involves
looking at a vector normal to the face, and either projecting it or
taking the dot product with a vector going to the origin.  What a big
pain in the butt, especially since values can be sixteen-bits, etc.

Did you ever stop to wonder about what happens to all the previous
polygon-fill calculations if the point-list is entered in reverse order?
Quite simply, left -> right and right -> left.  And what happens when a
face is invisible?  The polygon is turned away from our eye.  The points
in the polygon, which go counter-clockwise around the polygon, will go
clockwise when the polygon is turned around.  (I should point out that at
least in my code the points on the polygon are actually done in
clockwise-order, since projection reverses the points).

So, we have hidden-faces already built-in to the polygon plot routine!
In essence, we simply don't plot any polygon which the routine will freak
out on.  We can of course be systematic about this; within the plot
routine:

   - Calculate the left and right lines.
   - Take a trial step along the left and right lines
   - If xleft < xright then we are OK, otherwise punt.

In principle we only need to do this on the first calculation, and use
some properties of the lines to make things easier (for instance, if the
left line is moving left and the right line is moving right, and they
emanate from the same point, we know the polygon is visible).
Unfortunately, nasty situations can arise, for instance when the left and
right slopes have the same integer parts.  So a check needs to be placed
within the fill code to make sure the left point doesn't get ahead of the
right point.  This is unfortunate, as every cycle counts in the fill
code, but luckily there is (was) a natural place to put in a quick check.

All that is left then is #6: run through the fill table, and clear any
lines that still have old junk in them.  Since I used two bitmaps as a
double-buffer, all that is left is to swap the bitmaps, and do it all
again.

Et voila.

@(A): Analysis and Conclusions
      ------------------------

As you can see, the program is not without its flaws.  The biggest one,
I think, deals with the projection.  Recall that I calculate px/s, where
s=20, and add it to cx.  My feeling was that px was going to be very
small compared with cx, and so not modify the projection by much.  But
either this is a bad assumption, or the rotations are all screwed up,
because certain rotations look a bit goofy.  For instance, when you walk
up close to the octahedron it starts to get jumpy, or wobbly.  I note
further that when you are far away from an object it looks much better,
so that might be a way to fudge around the problem (e.g. make the value
of d in the projection much larger).

Speaking of rotations, the 'funky shake' which used to plague the old
programs has now been fixed.  For instance, a rotation in the y-direction
would work well but at some point it would appear to start rotating
backwards, then start going the right way again.  The problem was due to
an overflow in the calculation of the rotation matrix, in a term that
looked like(sin(t1) + sin(t2) - sin(t3) - sin(t4))/4, and the solution is
to split such terms into two, e.g. (sin(t1)+sin(t2))/4 - 
(sin(t3)+sin(t4))/4.

Speaking further of rotations, I find the current system of rotation and
projection unsatisfying, in particular too slow.  Notice how much the
program slows down when all three objects are visible; some 40 points
are being rotated, both locally and globally, at this point.  It is
possible to reduce the number of matrix multiplications from 9 to 8 (and
even lower, with lots of extra overhead), but I find this unsatisfying.
A better method is needed...

There is another bug somewhere in the global rotations which sometimes
causes the objects to wander around -- occasionally I can get the ship or
the octahedron to move close to the pyramid.  Also, when you are really
close to an object and turn, you might notice the curious effect of the
object rotating by small amounts, and then jumping position by a large
amount.  This is due to the 'units of 20' that are used in the program;
the remainder part of (cx,cy,cz) needs to be used here, and then the
display will be smooth as well.

Of course, if multicolor mode was used many of the calculations would be
much simpler, since the screen x-coordinate would only require one byte
instead of two.

The program should be made more efficient memory-wise of course.  Shoving
the fill routines for each buffer together would help out, and a system
for rotating points out of a list, similar to that used in the last 3D
program, would greatly streamline things (although it would be a tad
slower).

There is still a minor bug or two in the fill routine, which causes
little blue chunks to be taken off the ends of some polygons, but I
didn't feel like tracking it down.

Note that although Polygonamy only lets you run around in a plane,
running around in a full three dimensions is quite simple to add.  And,
although there are only three objects in the world, it is all set up to
deal with a lot more.  In summary, I see no major problems standing 
in the way of doing reasonably fast 3D graphics on the 64. 

The object file for this article is available in "Hacking the Code"
(Reference: code, SubRef: polycode) found elsewhere in this issue.

============================================================================