Polygonamy: A Study in 3 Dimensions

      by Stephen L. Judd (sjudd@nwu.edu)

We've been making some pretty small potatoes for a while now, so the time has come for a little more ambition and challenge. I decided to think up a real challenge, containing problems that I had no idea how to solve, and see what I could come up with.

I set out to create a 3D virtual world for the C64, e.g. a space populated with various three-dimensional objects, which I could wander around in. I wanted it to be full-screen 320x200 hires bitmapped. Furthermore, I wanted the objects to be solid, and since there are only two colors I wanted to be able to put patterns on the faces. I also wanted it to translate nicely to the 128's VDC chip, in 2MHz mode. Finally, naturally, I wanted the program to be fast. This was the framework in which I placed myself, and a few other ideas presented themselves along the way. The outcome of all of this is Polygonamy.

Just a brief history of this project: I have wanted to do a 3D world for a very long time, and have been thinking about it for some time in the back of my head; my imagination was probably first fired the first time I played Elite. I wrote down the necessary equations one afternoon last summer, for a high school student I was teaching, and the equations are very simple. I took a break to get some work of measureable value accomplished, but in October I began work on the graphics algorithm. I worked steadily on this for two months, and in December I finally began to code the graphics. In mid-January, I got them to work. Adding the rest took a few weekends more. I have about 128 pages of notes, analytical calculations, and BASIC test programs in my C64 notebook (which is, I think, a nice number of pages to have :). My original plans were to place five objects in the world, but time and memory constraints whittled that down to three. One of my disks self-destructed about the day I was ready to finish, so I had to reconstruct a bunch of tables, but other than that I finally managed to finish it up, albeit with a few rough edges ;).

Although the concepts from previous articles are used as a solid foundation, the code is almost 100% from scratch -- I only borrowed a tiny piece of code from an earlier program, and even that I modified somewhat.

One caveat before we begin: I am primarily interested in the challenge of designing the algorithms, which means I like to come up with my own solutions. Thus, you may find more efficient methods in a graphics book or perhaps in someone else's code; I have examined neither, so I have no idea what the relative merit of my algorithms may be. These are simply my solutions to challenges placed before me. And if you know of a better way to do things, please feel free to email me!

Furthermore, I consider the code a test of the theory. Some of my assumptions work and some do not, and these will be considered at the end of this article.

Finally, I am not including the source code. For one thing, it is big. Like, _HUGE_, man. I had to split it up when I ran out of editor memory on my 128 (which, incidentally, forced me to figure out Merlin 128's very cool and very powerful linker feature). I will include numerous code fragments in assembly and BASIC which demonstrate all the important concepts.

By the way, if you are interested in measuring frame rates, you can use the first object. Every full 360 degree revolution is 128 frames. So time how long it takes to complete a full rev (or maybe several), and divide that number into 128, to get an idea of frames per second. For a rundown of frame rates for stock and SuperCPU operation, see "Underneath the Hood of the SuperCPU" (Reference: cmdcpu) found elsewhere in thi issue.

Some brief acknowledgements: This project would not have happened without the extremely powerful macro and linking capabilities of the Merlin 128 assembler, by Glen Bredon. It would have been _really_ tough without JiffyDOS and my FD-2000, from CMD. I used my Action Replay extensively for debugging, and without the excellent documentation for the 64, such as the PRG and Mapping the 64, this would have been a nightmare. Finally, I must acknowledge my friend George Taylor; a few days before I was all finished I explained some routines to him, and he made a great suggestion which made my fast fill routine blaze.

Okay, WAY too much talk. There are a ton of issues involved with this project so let's just wade in hip-deep and deal with them as they come.

The Equations

First some relaxing abstraction. In previous articles we have discussed how to project an object in three dimensions through the origin into a plane. We have also discussed rotations in three dimensions. In principle, then, we have all the mathematics we need to do a 3D world.

But we should be thoughtful. Let's say we're standing in the world and turn to the right; we can either rotate ourselves, and change the direction we are looking, or we can rotate the world around us, so that we are always looking 'forward'. This may bother you on physical grounds, but the two are mathematically equivalent. Given the way we have derived our projection routines, it should be clear that we want to rotate the objects in the world around us.

(Or, to put it another way, we are at the center of the world, and the world revolves around us.)

We have another issue: how do we know when an object is visible or not? How do we know when we've bumped into an object (or blown it out of the sky :)? Moreover, if we have ten objects, and each object has six points, it would be a real drag to have to rotate all sixty points, especially if none of the objects were even visible.

It should be clear that we really want to define every object relative to some center of the object. So we keep track of the center of each object, and rotate and translate the centers, and only calculate the full object if it is visible. We of course want to define the object relative to this center.

What happens to this center when we translate or rotate the world?

Let's simplify our model a little bit and only deal with rotations about one axis, e.g. we are driving a tank and can move forwards and backwards, and turn left or right. The generalization to other axes is very straightforward, but this way we can think in two dimensions instead of three.

First we need to agree on a coordinate system. For my system I let the x-axis go _up_ a page of paper, the y-axis comes up out of the page, and the z-axis goes from left to right. Thus, I am standing on the paper in the x-z plane, at the origin, with the y-axis extending upwards from me. If you still don't understand my orientation, draw a little picture.

I am going to choose my orientation so that I am always looking straight down the x-axis, e.g. in the direction that x is increasing. Thus, if I walk forwards or backwards, this corresponds to decreasing or increasing the x-component of the center coordinate:

   let C=(cx,cy,cz)
   move forwards => cx=cx-1
   move backwards=> cx=cx+1

So far so good. As always, draw a picture if you can't visualize it. That takes care of translations, what about rotations?

We certainly know how to rotate points about the origin. In particular, if we have a point with coordinates (x1,z1) and rotate it clockwise by an angle s, we get the new point as follows:

   (x1,z1) -> (x1*cos(s)+z1*sin(s), z1*cos(s)-x1*sin(s))

So that's easy enough. The problem is that we have this big object sitting around this point, and need to figure out what to do with it!

Consider the following: let's say we have a line out some distance from the origin,

                         X
                         X
            |            X

z-axis ----O------------c---- c=center

            |            X
            |            X
          Origin         X <---- line

and we rotate it by some amount theta about the origin:

                      X
                      XX
                       Xc      c=rotated center
            |           XX

z-axis ---O------------XX-

| X XX

You can see (from my incredible ASCII artwork) that the line is now at an angle with respect to the origin.

Imagine that we draw a line from the origin to the center of the point, in the first picture (or get a pencil and paper and actually do it), so that we have the letter "T" laying on its side. Now we rotate this "T" by some angle theta, so that the top of the "T" -- our line -- has now been rotated.

The stem of the "T" meets the top of the "T" at the center point c. Drop a line from the rotated center straight down to the z-axis, and call this line l2. Since the T is at a right angle, and we have rotated it by an angle theta, the angle between our line and the z-axis is 90-theta. But this means that the angle between our line (the top of the "T") and the line l2 is just theta.

Thus, if we rotate the center by an amount theta, all we have to do is rotate the object by an amount theta *about the center*, and our perspectives will all be correct. How is that for simple? It should be clear now that this works no matter where the "center" of the object is chosen. Thus, our center is not some physical center of the object, but rather the center of rotation of the object.

Since this is true of rotations about any axis, we now know how to generalize to higher dimensions.

Note further that we can now implement local rotations -- that is, let's say the object is a tank, and this tank turns independently of whether or not we turn. Piece of cake.

You can also see that the rotations are cumulative. If we turn to the left, and then turn left again, we can simply apply two rotations to the points of the object. In fact, if we turn left, move forwards, and then move left again, we still apply just two rotations to the points of the object; the center, however, has moved.

This is quite important, as it allows us to measure an object's relative rotation to us, whether or not it is visible. Remember that we only want to rotate the points that define an object when the object is visible. We never actually change the points which define an object. Instead, we track how much the object needs to rotate, and rotate the original points by that amount.

The center of the object will change with each rotation and translation. We never change how we define an object about this center, though. We simply apply a rotation to the original points when appropriate. The object centers must be kept track of because they can undergo translation as well as rotation.

To summarize then: we define each object relative to a center of rotation. The center determines where the object is located in the world, and allows us to operate on centers when we need to rotate or translate the world. It also lets us perform local operations on an object, so that we could, for instance, have a spinning cube located inside our world. If an object's center is visible then we can consider the object to be visible, and plot it.

Whoops -- what does it mean to be 'visible'? Well, think about yourself. You can see things when they are in front of you. Or, to be more precise, you have a field of vision. Perhaps a decent model of this would be a cone. I think a better model, certainly one which is easier to deal with computationally, is the intersection of two planes: a pyramid. Anything which lies within this pyramid we consider visible, and anything which lies outside we consider not visible.

Two-dimensionally, we draw a little wedge extending from the origin. Anything within the wedge we count as visible, and anything outside of it we count as not visible. The sides of this wedge are two lines, with equal but opposite slope (i.e. slope=+/-m).

\ Visible /

\ View / Outside of visual area \Area /

* <--- Me

Probably lines at some angle are more reasonable than others. But I'm a simple guy, and the two simplest lines I can draw are at 45 degree angles from the axis, so their slope is +/-1. Thus, any points which lie between the lines x+z=0 and x-z=0 are visible. If the center of an object is within this area, we will consider the object visible. That is, if cx+cz>0 and cx-cz>0 the object is visible.

One last thing: if we are too close to the object, we either want to bump into it (i.e. not move) or else not display it. So we also need to check if cx

We are now in a position to write some simple code. I wrote the following in evil Microsoft QBasic, but BASIC7.0 on the 128 would work just as well, although you need to change the variables (I didn't have my 128 handy, otherwise I would have written this on the 128):

Polygon Prototype Code

   SCREEN 1 '320x200

   delta= 3    'Rotations will be in 3 degree increments
   rad= 3.1415926/180
   cdel= COS(rad*delta)
   sdel= SIN(rad*delta)
   theta=0
   d=-135
   x0= 60      'Bumped into the object?
   REM z0 y0 would be 160,100 to place the object in the center
   REM of the screen
   y0= 170  'I want the bottom of screen to be ground
   z0= 160

   REM Set up the object
   REM Tetrahedron: 0,sqrt(3),0  0,0,1  0,0,-1  2,0,0
   DIM obx(4), oby(4), obz(4)

   obx(1)= 0
   oby(1)= 50*SQR(3)
   obz(1)= 0
   obx(2)= 0
   oby(2)= 0
   obz(2)= 50
   obx(3)= 0
   oby(3)= 0
   obz(3)= -50
   obx(4)= 100
   oby(4)= 0
   obz(4)= 0

   cx= 100
   cy= -10
   cz= 0

   REM Get input
   main:

   DO
     a$=INKEY$
   LOOP UNTIL a$<>""

   IF a$="[" THEN cx=cx-20
   IF a$="/" THEN cx=cx+20
   IF a$=";" THEN GOSUB rotl
   IF a$="'" THEN GOSUB rotr

   IF cx
   ctheta= COS(rad*theta)
   stheta= SIN(rad*theta)

   p1x= cx + ctheta*obx(1) + stheta*obz(1)   'Rotate and add to center
   p1y= cy + oby(1)
   p1z= cz + ctheta*obz(1) - stheta*obx(1)
   p1y= y0 + d*p1y/p1x          'Project and add offset
   p1z= z0 + d*p1z/p1x
   [... similar for p2x,p2y,p2z,...,p4x,p4y,p4z]

   CLS
   LINE (p1z, p1y)-(p2z, p2y)
   [... lines between p2-p3, p3-p1, p1-p4, p4-p2, p4-p3]

   GOTO main:    'Main loop

   REM rotate left
   rotl:
       theta= theta + delta
       blah= cdel*cx + sdel*cz
       cz= -sdel*cx + cdel*cz
       cx= blah
   RETURN

   rotr:
       theta= theta-delta
       blah= cdel*cx - sdel*cz
       cz= sdel*cx + cdel*cz
       cx= blah
   RETURN

(You may note that cx=cx+20 is used for a translation, instead of cx=cx+1. This will be detailed later).

So much for the easy part.

Filling

If there is one thing that the previous programs have taught us, it is that graphics are slow. At least, they are far and above the major thing slowing down the program, and deserve the most attention and thought for improvement. Moreover, because there is lots of looping involved, the elimination of just a few instructions can translate to thousands of cycles saved.

We have examined several fill routines up to now, but neither of them is up to the task of Polygonamy. The cookie-cutter method is OK, but doesn't allow multiple objects, and certainly doesn't allow pattern fills. Using an EOR-buffer is just plain slow and inefficient and a big drag. So it's time to rethink this problem.

Recall that on the 64 the bitmap screen is divided into 8x8 cells, which are arranged horizontally in rows. It's a pretty kooky way of doing things, but we shall overcome.

First of all it should be clear that we want to fill from left to right (as opposed from top to bottom). We can then fill a byte at a time, instead of dinking in little pixels at a time.

Previously we used a custom character set to plot into. One of the major reasons for doing so was to use Y as the Y-coordinate, so that storing a particular point was as simple as STA COLUMN,Y. We can still use this idea, but only within each row. That is, if we let Y=0..7, we can address each individual pixel-row within each 8x8 block row with an STA ADDRESS,Y.

For real speed, we are going to want an unrolled fill routine. That is, we don't want to mess around with loop counters and updating pointers and such. Since there are 25 rows on the screen (25 times 8 = 200 pixels high) we are probably going to need 25 separate fill routines.

I constructed my fill routine as follows:

STA COL1,Y DEX BEQ :DONE STA COL2,Y DEX BEQ :DONE STA COL3,Y ... etc. :DONE RTS

Thus X would be my counter into the number of columns to fill, A can contain our pattern to fill with, and Y can range from 0..7 to index the individual rows within the block. The first thing to notice is that each STA/DEX/BEQ code chunk is six bytes. So, all we need to do is calculate which row to start filling at, multiply by six, and add that number to the start of the fill routine. The idea is then to jump into the correct place in the fill, and let it fill the right number of columns, stored in X.

There is a little problem though -- what we're talking about doing is an indirect JSR, and there is no such thing. But it's easy enough to fake, because we can use an indirect JMP. So a call to the fill routine would look like the following:

JSR FILL

FILL JMP (ADDRESS)

where ADDRESS simply points to the correct entry point in the fill routine.

Moreover, you may also note that 40 columns times 6 bytes/column is 240 bytes, so that each little fill routine handily fits in a page. Thus, moving between rows in the bitmap corresponds to a simple decrement or increment of the high byte of the ADDRESS pointer.

This was the state of things when, days before I was to be all done with Polygonamy, I mentioned it to my friend George Taylor, who suggested the following modification: instead of using X to count how many columns to fill, just make the fill routine:

STA COL1,Y STA COL2,Y STA COL3,Y

Then, insert an RTS into the right place in the routine. Thus, we calculate which column to stop filling at, multiply by three, and stick an RTS in the fill routine at that point. To fix it up we stick an STA ..,Y back on top of the RTS.

I don't think you're going to make a fill routine faster than that :).

Moreover, note that each fill routine takes up just 120 bytes, so we can now fit two fill routines in each page. I did not do this, but it is easy to do, and instantly frees up 25 pages.

Filled Polygons

I mean, hey, this _is_ "Polygonamy", so let's talk polygons, and lots of them.

Clearly all that is needed to draw an object are the left and right endpoints of the object, since everything in-between will be filled.

An observation to make is that if you take a slice out of a convex polygon, the slice will intersect the polygon at exactly two points. Another, more important, observation is to note that the highest and lowest point of a polygon will always be at a vertex. Finally, it is important to note that any vertex of a polygon has exactly two lines extending out of it, one to the left, and one to the right.

Consider a piece of a polygon:

       *  <--- Vertex v0

where the vertex v0 is the lowest point of the polygon. All that needs to be done is to move upwards (DEY), compute the left and right points of the polygon at that point, and then fill between the two (JSR FILL).

The idea then is to start at the bottom (or the top) and to steadily move upwards while _simultaneously_ calculating the endpoints of the left and right lines, and filling in-between. But we need the equations of the left and right lines to do this.

Now it's time for another observation. Let's say we have a polygon with n vertices v1, v2, ..., vn, and furthermore that as we move between these points we move around the polygon counter-clockwise. Thus v3 is to the right of v2, v1 is to the left of v2, v4 is to the right of v3, etc. For example:

v1____v3

What happens if we rotate this polygon?

v2____v1

The vertices have changed position, but *their order has not*. v3 is still to the right of v2, and v1 is still to the left.

Now we have a real plan. We simply define the polygon as a list of points v1 v2 v3 ... vn. We then figure out which one is lowest, i.e. has the smallest (or greatest) y-coordinate, call this vertex vm (vmax). The endpoints of the left and right lines are vm-1,vm and vm,vm+1. So move along those lines until the next vertex is reached. At that point, recompute the appropriate line, and keep moving upwards until the top of the polygon is reached.

Perhaps an example would be helpful:

| \ v3

v2 is the minimum. The left line has endpoints (v1,v2) and the right line has endpoints (v2,v3). We steadily move along the left and right lines as we creep upwards. At some point we hit v3, and at this point we compute a new equation for the right line, this time with endpoints (v3,v1). Now we continue to creep upwards and move along the left and right lines, until we hit v1, at which point we are finished.

It is important to keep in mind that the order of the points never changes. We don't need to do anything complicated like sorting the points; we only need to find the lowest point, and branch left and right from there, keeping in mind that the points are cyclic (i.e. v1 is to the right of vn).

It is now time to start thinking about code. One aspect of the fill routine we haven't considered is the clear. In the past the entire draw buffer was cleared and then the new stuff was drawn into it. But this seems like a bit of a waste; it seems wasteful to clear a bunch of memory that is just going to be overwritten again. So, as long as we can do it efficiently, it might be smart to combine the clear and fill routines.

Here is how Polygonamy does it: If a line needs to be cleared, then it is cleared up to the edges of the object, but the part that is going to be filled is ignored. (It isn't clear if this provides any substantial efficiency gains, though).

To see the status of a particular line, a table is used, containing a value for each Y-coordinate. If the entry is 255 then the line is clear, if it's 0 then the line has old junk in it, and if it's 1 then the line has new junk in it. Thus we only clear the line if its entry in the fill table is a zero.

So a fill routine might flow like the following:

   let Y count from 7..0

   If we are at the left endpoint then recalculate the left line.
   If we are at the right endpoint the recalculate the right line.
   Update xleft & xright
   If line needs to be cleared then clear line.
   If the starting fill column is different than the previous fill
       column then update the pointers, etc.
   Plot the left and right endpoints (since the fill routine only plots
       eight bits at a time)
   Fill the in-between parts
   Update Y
   If Y passes through zero then update fill entry point, set Y=7 etc.
   Keep going until we reach the top

The next thing to figure out is how to calculate the left and right lines. We do have the old line routine, which we could use to advance to the left and right endpoints, but clearly this isn't too efficient.

The question is: if the y-coordinate increases by one, then how much does the x-coordinate increase by? The equation of a line is:

   (y-y0) = m*(x-x0)  m=slope

   change in y = m*change in x

So, if the change in y is 1, then then the change in x is 1/m. All we need to do then is calculate the inverse-slope=dx/dy, where dx=x2-x1 and dy=y2-y1, and add this to the x-coordinate with each step in y.

Isn't this a fraction? Sure, big deal. The fraction can be written as dx/dy = N + Rem/dy, where N is an integer, and Rem is the remainder, which is always less than dy. So to calculate x=x+dx/dy:

   x= x+N
   xrem= xrem+Rem
   If xrem>=dy then x=x+1:xrem=xrem-dy

As usual, we want to start xrem at dy/2, which has the effect of rounding numbers up.

   10 REM LINE ROUTINE TAKE TWO SLJ 11/24/95
   12 REM ACTUALLY IT'S A FILL RTY NOW
   15 GRAPHIC 1,1
   20 X0=160:Y0=100
   30 X1=5:Y1=-50:X2=7:Y2=11:XL=X1:YL=Y1:Y=Y1
   35 X3=50:Y3=Y1:X4=X3+100:Y4=Y2:XR=X3:YR=Y3
   40 D1=Y2-Y1+1:DX=X2-X1:LI=INT(DX/D1):LR=DX-LI*D1
   45 TL=INT(D1/2)
   46 D2=Y4-Y3+1:DX=X4-X3:RI=INT(DX/D2):RR=DX-RI*D2
   48 TR=INT(D2/2)
   50 DRAW1, X0+X1,Y0-Y1 TO X0+X2,Y0-Y2, X0+X3,Y0-Y3 TO X0+X4,Y0-Y4
   60 REM MAIN LOOP
   70 XR=XR+RI:TR=TR+RR:IF TR>=D2 THEN TR=TR-D2:XR=XR+1
   75 DRAW1, X0+XL,Y0-Y TO X0+XR,Y0-Y
   80 XL=XL+LI:TL=TL+LR:IF TL>=D1 THEN TL=TL-D1:XL=XL+1
   90 REM DRAW1, X0+XL,Y0-Y TO X0+XR,T0-T
   100 Y=Y+1:IF Y<=Y2 THEN 60

In this program (x1,y1)-(x2,y2) is the left line, and (x3,y3)-(x4,y4) is the right line. The first thing to note is that in lines 40 and 46, Y=y2-y1+1. This issue was discussed in the very first C=Hacking 3D-graphics article. The problem is that although the line will be anatomically correct with DY=y2-y1, it will look silly. The easiest way to see this is to consider y2-y1=1, e.g. say we draw a line between (0,10) to (50,11). Ideally this line will consist of two line segments, one from (0,10) to (25,10) and the other from (26,11) to (50,11). But ifi we use DY=1 we will have one line segment from (0,10) to (50,10), and a single point at (50,11).

Adding one to DY is just a simple cheat. Most of the time the lines will look just fine, but lines which have a slope near one will come out a bit wrong. The other, accurate solution, which was used in the first article, is more complicated to implement in this routine. Adding one to DY will also have a useful benefit which we shall shortly see.

In line 50 above the boundaries of our object are drawn in, to check the accuracy of the algorithm.

In lines 70-80 the right point is updated, then the thing is filled, then the left point is updated. This is because both lines are moving to the right, e.g. they both have positive slope. Think about how the line segments will be drawn; in general, we want to draw from the left end of the left line segment to the right end of the right line segment. (Sometimes this will look a little off where the two lines meet).

Since the left and right lines can each have either positive or negative slopes, there are four possibilities: Plus-Plus, Plus-Minus, Minus-Minus, and Minus-Plus.

Plus-Plus: Update right, fill, then update left Plus-Minus: Fill, update left, update right Minus-Minus:Update left, fill, update right Minus-Plus: Update left, update right, fill

If this is still confusing, try out the above program with various left and right line segments, and these things will jump right out.

Now we need to think about implementing this in assembly. Since this is being done in hires 320x200, the x-coordinate requires two bytes, and the y-coordinate requires one. We also need another byte to store the remainder portion of the x-coordinate.

The most glaring question is the calculation of dx/dy: somehow we need a fast way of exactly dividing an eight bit number dy into a nine bit number dx. Recall that we always add one to dy, so that dy actually ranges from 2 to 200. Since the maximum value of dx is 320 or so, the largest value of dx/dy that we can have is 320/2 = 160. In other words, both the integer and remainder part of dx/dy will fit in a byte. Simply adding one to dy makes life pretty easy at this end.

One very fast method of doing division is of course by using logarithms. But they have a problem with accuracy. One the other hand, one thing we know how to do very quickly is multiplication.

This then is the plan: use logarithmic division to get an estimate for N, the integer part. Then calculate N*dy, compare with dx, and adjust the integer part accordingly.

A quick reminder of how logarithms can be used for division:

   log(a/b) = log(a) - log(b)
   exp(log(x)) = x

thus

   a/b = exp(log(a)-log(b)).

How do we take the log of a 9-bit number? We don't. Instead, we construct a table of f(x)=log(2*x), and use, not x, but x/2, as a parameter. Remember that the logarithms merely give an estimate to the integer part.

Moreover, if the tables are constructed carefully we can insure that the estimate for N is either exact or too small. Thus we only need to check for undershoots, which simplifies the calculation considerably. In particular, the tables were constructed as follows:

   10 DIM L1%(160), L2%(200), EX%(255): C=255/LOG(160)
   20 FOR I=1 TO 160
   30 L1%(I)=INT(C*LOG(I))
   40 NEXT
   50 FOR I=2 TO 200
   60 L2%(I)=INT(C*LOG(I/2)+0.5)
   70 NEXT
   80 FOR I=0 TO 255
   90 EX%(I)=INT(EXP(I/C))
   95 IF(I=129)OR(I=148)OR(I=153)OR(I=81)OR(I=98)THEN EX%(I)= EX%(I)-1
   100 NEXT
   110 L2%(3)=L2%(3)+1

The constant C is needed obviously to improve accuracy (log(160) simply isn't a very large number). Note that I divided the arguments of the logarithms in half; instead of calculating 2*dx/dy I calculate dx/(dy/2), which is of course the same thing. This was done to make C work out. By 'fixing' the tables in this manner, exactly 3927 calculations will undershoot, which works out to about 6% of all possible calculations we may perform.

The actual division routine works out pretty slick in assembly:

   DIVXY    MAC      ;Macro to compute 2*X/Y
            LDA LOG1,X  ;This is the division part
            SEC
            SBC LOG2,Y
            BCS CONT
            LDX #00     ;dx/dy < 1
            LDA ]1      ;LDA dx, since dx is exactly the remainder
            BCC L2
   CONT
            TAX
            LDA EXP,X
            TAX      ;X is now integer estimate
            STA MULT1
            EOR #$FF
            ADC #00     ;Carry is guaranteed set
            STA MULT2
            LDA ]1      ;ldxlo or rdxlo (i.e. low byte of dx)
            ADC (MULT2),Y
            SEC
            SBC (MULT1),Y  ;Calculate remainder
   L2       CMP ]2      ;ldy or rdy (i.e. ]2 = dy)
            BCC DONE
   L1       INX      ;Remainder is too large, so up int estimate
            SBC ]2      ;and subtract dy
            CMP ]2      ;Repeat until remainder

Do you see how it works? First the initial guess N is calculated. If log(x) - log(y/2) is negative then dx/dy is less that one, so the remainder is simply dx and the integer part is zero. Otherwise, R= dx - N*dy is calculated. Since N always undershoots, dx-N*dy will always be positive, so the high byte of dx isn't needed. This quantity R is the remainder, so if it is larger than dy simply increase the integer estimate and subtract dy from R, and repeat if necessary.

The end result then is a 9-bit/8-bit divide which takes 52 cycles in the best case. Pretty neat, huh? And quite adequate for our purposes.

Wait just a minute there, bub... what about when dy=0? Consider what dy=0 means: it means that two vertices lie along the same line. That in turn means that the next vertex can be immediately skipped to. That is, simply move on to the next point in the list, be it to the right or to the left, if dy=0.

Well, ah reckon that that just about completes the polygon fill routine. To summarize: start at the bottom (top, whatever) of the polygon. Calculate the "slopes" of the right and left lines from that point. Update the coordinates, fill the in-between parts, and plot the end-sections. Then update Y and keep going. If another vertex is hit, then recalculate the corresponding line.

Alert people may have noticed that this algorithm translates very nicely to the 128's VDC chip.

I should probably briefly mention the pattern fills. I use Y as an index to a pattern table, so it was very natural to use 8x8 character patterns. With different indexing of course more complicated patterns can be used. Moreover, it dawned on me that animated patterns were just as easy as normal ones, so I tried to think up a few interesting animated patterns (there are two in Polygonamy, each pattern is eight frames).

So that's the graphics part, more or less.

We ain't even CLOSE to being done yet.

3D Code

Now it's FINALLY time to start writing the master program to control the 3D world. Luckily we have the BASIC program from waaaay up above to work from.

First is to decide how angles will be measured. The smart thing to do is to let the angle variable vary between 0..127 or 0..255; that is, to measure angles in units of 2*pi/128 (or 2*pi/256). The reason this is smart is because the angle is now periodic, wrapping around 256. Angles can be added together without checking for overflow, etc. (257=1, 258=2, 259=3, etc.). Note that in previous programs I did a very dumb thing and let the angle variable vary from 0..119, so angles were measured in three degree increments, and I had to place all sorts of checks into the code. Polygonamy uses angle increments of delta=2*pi/128.

Next there is the issue of cx=cx+20 instead of cx=cx+1. The problem is that if cx=cx+1 is used it takes forever to move around in the world. Moreover, the objects get really small at around cx=5000. What this means is that in the assembly version we can use a single byte for cx, and just treat each unit of cx as 20 "real world" units. That is, in the assembly program, we will keep track of cx/20 instead of cx.

Sort-of.

Consider the rotation which takes place when we turn left or right: the world is rotated through an angle delta=2*pi/128, so the calculation is:

   blah= cdel*cx + sdel*cz.
   cz= -sdel*cx + cdel*cz

The problem is that sin(2*pi/128)=0.049 and cos(delta)=0.9988, which means that, in practice, cdel*cx=cx. Equally bad is that sdel*cz is very small when cz starts to get small (e.g. 10*sdel = 0.49). The result of this is that objects close to the origin (e.g. us) will not be rotated at all!

Thus the centers need to be calculated more accurately. In particular, a second byte is needed to store the 'decimal' part of the center. To be precise, this second byte will contain the decimal part of the center times 256. This way we can add and subtract remainders and any over- or underflows will then affect the integer parts cx,cy,cz.

Very quickly we should decide how to represent remainders of negative numbers. A number like -1.5 can be represented as -1 - 0.5, but it can just as well be represented as -2.0 + 0.5. By using the second method remainders are always positive, and that's the smart way to do things (if nothing else it lets the remainder be a fraction of 256, instead of a fraction of 128). It's also the way any computer will round: type INT(-1.5) and see what happens.

further question arises about how to represent the centers, specifically, how do we represent an object which is behind us, e.g. has a negative value for cx. The normal way to represent negative numbers is of course to use 2's complement notation, but this has some disadvantages. One of them is multiplication: recall that in an earlier code some really fancy footwork needed to be done just to be able to multiply numbers between -64..64, and we certainly want the centers to range over more numbers than that. This gets worse if we decide to use more bits to represent the centers, as we must do if a larger world is constructed.

Moreover, Polygonamy is an excuse for testing new and different ideas and investigating their strengths and limitations, so why not try something different. As I look through my notes I'm not really sure what motivated this choice, but how about the following: let's add 128 to all of our numbers.(I think this is called excess-128 notation). -2 will be represented as 126, -1 will be represented as 127, six will be represented as 134, etc.

Shifting between the excess numbers and 'real' numbers is as simple as EOR #128. Recall that to multiply two numbers, let f(x)=x^2/4, so that a*b = f(a+b) - f(a-b). In the new system:

   xo = 128+x
   yo = 128+y

which means:

   xo+yo = 256 + (x+y)
   256+xo-yo = 256 + (x-y).

The 256 added above can be thought of as the carry bit. What this means is that all that is needed is to construct a single function,

   f(x) = (x-256)^2

where x=-255..255. We can now very quickly multiply signed numbers in the range -128..128, and with just a single (albeit 512 byte) table, using essentially the same multiplication procedure as before.

Now the downside of this method: adding and subtracting excess-128 numbers, and in particular checking for overflow.

   xo+yo = 256 + (x+y)
   if x+y >= 128 then we have overflow
   if x+y < -128 then we have underflow

which implies:

   xo+yo >= 256+128  implies overflow
   xo+yo < 128   implies underflow

with similar results for subtraction. Note also that after every addition or subtraction 128 needs to be either added or subtracted from the result, which either way corresponds to an EOR #$80. So it's a little more work to add numbers in this system. (Of course, adding normal numbers to excess-128 numbers is no problem, so INC and DEC work fine).

Back to rotations. The most obvious thing to do is to create two tables:

   f(x) = (x-128)*cos(delta) + 128
   g(x) = (x-128)*sin(delta) + 128

but remember that the remainders are also needed:

   fr(x) = 256*(remainder((x-128)*cos(delta)))
   gr(x) = 256*(remainder((x-128)*sin(delta)))

Since remainders are always positive none of this excess-128 junk is needed. Note that we could also let f(x) and g(x) be 2's-complement tables, then convert from two's-complement into excess-128 after performing additions etc. The conversion is, what do you know, EOR #$80. This is the smarter thing to do, and an even smarter thing to do is to let the cosine table (f(x) above) to be in excess-128 format, and the sine table g(x) in 2's complement. This way the numbers can be added as normal, and no conversion need take place:

Compare to BaSiC subroutine rotl: above
ROTL
INC THETA
LDY #NUMOBJS ;Y indexes the object DEY

:LOOP LDX CX,Y ;center coordinate LDA CDEL,X ;CDEL = f(x) above LDX CZ,Y
CLC
ADC SDEL,X
STA TEMP ;t1 = ci+si
LDA CXREM,Y
CLC
ADC SDELREM,Y ;Add remainders BCC :CONT1
INC TEMP
CLC
:CONT1 LDX CX,Y
ADC CDELREM,X
BCC :CONT2
INC TEMP
:CONT2 STA CXREM,Y

LDX CZ,Y
LDA CDEL,X
LDX CX,Y
SEC
SBC SDEL,X
STA TEMP2 ;t2=cz-si
LDA CZREM,Y
SEC
SBC SDELREM,X
BCS :CONT3
DEC TEMP2
:CONT3 LDX CZ,Y
CLC
ADC CDELREM,X
BCC :CONT4
INC TEMP2
:CONT4 STA CZREM,Y
LDA TEMP2
STA CZ,Y
LDA TEMP
STA CX,Y
DEY
BPL :LOOP

Well, that takes care of two lines of BASIC code :). As it turns out, using a single byte for the remainder does a pretty good job of holding the number. Rotating by 360 degrees one way, then rotating back again, produces a center which is within a few decimal places of the starting value.

Next up: projections. The projection calcuation is:

   Proj(P) = d*(P+C)/(px+cx)

where P=(px,py,pz) and C=(cx,cy,cz). In terms of the implementation, we want to calculate:

   d*((P-128) + s*(C-128)) / ((px-128) + s*(cx-128))

where s=20, to translate C into the 'real world'. To calculate this, consider the following function:

   g(x) = r*d / (s*((px-128)/s + cx-128)) + 128

where r is some scaling factor. The projection calculation then becomes:

   1/r*( (g-128)*(P-128) + s*(g-128)*(C-128) )

Thus we need some more tables, one of 1/(4r) * (256-x)^2, the other of s/(4r) * (256-x)^2, to do the multiplication. Furthermore a table of (x-128)/s would be pretty handy, and finally we need a table of g(x) = r*d/(s*(x-128)) + 128.

The general outline of a program would be:

Get keypress 1- If translate, then update all cx's (just some INCs and DECs) 2- If rotate left or right, then rotate world 3- Update angles for global & local rotation matrix (e.g. theta) 4- Figure out which objects to display/construct a list 5- Call each object in turn 6- Update bitmap: clear out remaining garbage and swap bitmaps

Numbers 1 and 2 are done. In number three, by global matrix I mean the object rotation that results from us turning left or right. By local rotation I mean rotations independent of whether or not we turn. The local rotation allows e.g. the octahedron to spin around in Polygonamy.

Figuring out which objects to display is easy: just check to make sure it lies within the viewing cone/pyramid, that we are not too close, etc. If an object is to be displayed, it needs to be placed in a list. I constructed the list to make sure that objects which are farther away are drawn first; that way objects can overlap one another correctly. This was done via a simple insertion sort -- i.e. bump up objects in the list until the right spot is reached to insert the object.

We have most of the tools to deal with #5. Handling an object consists of rotating and projecting it, then displaying it. Rotation is the same as it has always been, albeit now involving sixteen bits, and projection is described above. Then each polygon needs to be drawn, by sticking the points of the polygon into the polygon list, setting up the fill pattern and the pointer to the minimum Y-value, and calling the polygon fill routine. Of course, if the face is hidden then we certainly don't want to plot it.

The minimum y-value can be found very easily while inserting the points into the point list -- just keep track of ymax and compare to each point as it is inserted.

We have discussed several methods of calculating hidden faces -- cross-product, rotated normal, parallel faces -- each of which involves looking at a vector normal to the face, and either projecting it or taking the dot product with a vector going to the origin. What a big pain in the butt, especially since values can be sixteen-bits, etc.

Did you ever stop to wonder about what happens to all the previous polygon-fill calculations if the point-list is entered in reverse order? Quite simply, left -> right and right -> left. And what happens when a face is invisible? The polygon is turned away from our eye. The points in the polygon, which go counter-clockwise around the polygon, will go clockwise when the polygon is turned around. (I should point out that at least in my code the points on the polygon are actually done in clockwise-order, since projection reverses the points).

So, we have hidden-faces already built-in to the polygon plot routine! In essence, we simply don't plot any polygon which the routine will freak out on. We can of course be systematic about this; within the plot routine:

- Calculate the left and right lines. - Take a trial step along the left and right lines - If xleft < xright then we are OK, otherwise punt.

In principle we only need to do this on the first calculation, and use some properties of the lines to make things easier (for instance, if the left line is moving left and the right line is moving right, and they emanate from the same point, we know the polygon is visible). Unfortunately, nasty situations can arise, for instance when the left and right slopes have the same integer parts. So a check needs to be placed within the fill code to make sure the left point doesn't get ahead of the right point. This is unfortunate, as every cycle counts in the fill code, but luckily there is (was) a natural place to put in a quick check.

All that is left then is #6: run through the fill table, and clear any lines that still have old junk in them. Since I used two bitmaps as a double-buffer, all that is left is to swap the bitmaps, and do it all again.

Et voila.

Analysis and Conclusions

As you can see, the program is not without its flaws. The biggest one, I think, deals with the projection. Recall that I calculate px/s, where s=20, and add it to cx. My feeling was that px was going to be very small compared with cx, and so not modify the projection by much. But either this is a bad assumption, or the rotations are all screwed up, because certain rotations look a bit goofy. For instance, when you walk up close to the octahedron it starts to get jumpy, or wobbly. I note further that when you are far away from an object it looks much better, so that might be a way to fudge around the problem (e.g. make the value of d in the projection much larger).

Speaking of rotations, the 'funky shake' which used to plague the old programs has now been fixed. For instance, a rotation in the y-direction would work well but at some point it would appear to start rotating backwards, then start going the right way again. The problem was due to an overflow in the calculation of the rotation matrix, in a term that looked like(sin(t1) + sin(t2) - sin(t3) - sin(t4))/4, and the solution is to split such terms into two, e.g. (sin(t1)+sin(t2))/4 - (sin(t3)+sin(t4))/4.

Speaking further of rotations, I find the current system of rotation and projection unsatisfying, in particular too slow. Notice how much the program slows down when all three objects are visible; some 40 points are being rotated, both locally and globally, at this point. It is possible to reduce the number of matrix multiplications from 9 to 8 (and even lower, with lots of extra overhead), but I find this unsatisfying. A better method is needed...

There is another bug somewhere in the global rotations which sometimes causes the objects to wander around -- occasionally I can get the ship or the octahedron to move close to the pyramid. Also, when you are really close to an object and turn, you might notice the curious effect of the object rotating by small amounts, and then jumping position by a large amount. This is due to the 'units of 20' that are used in the program; the remainder part of (cx,cy,cz) needs to be used here, and then the display will be smooth as well.

Of course, if multicolor mode was used many of the calculations would be much simpler, since the screen x-coordinate would only require one byte instead of two.

The program should be made more efficient memory-wise of course. Shoving the fill routines for each buffer together would help out, and a system for rotating points out of a list, similar to that used in the last 3D program, would greatly streamline things (although it would be a tad slower).

There is still a minor bug or two in the fill routine, which causes little blue chunks to be taken off the ends of some polygons, but I didn't feel like tracking it down.

Note that although Polygonamy only lets you run around in a plane, running around in a full three dimensions is quite simple to add. And, although there are only three objects in the world, it is all set up to deal with a lot more. In summary, I see no major problems standing in the way of doing reasonably fast 3D graphics on the 64.

The object file for this article is available in "Hacking the Code" (Reference: code, SubRef: polycode) found elsewhere in this issue.

C= Hacking Home | Issue Contents

Commodore, CBM, its respective computer system names, and the CBM logo are either registered trademarks or trademarks of ESCOM GmbH or VISCorp in the United States and/or other countries. Neither ESCOM nor VISCorp endorse or are affiliated with Commodore Hacking.

Commodore Hacking is published by:

Brain Innovations, Inc.
10710 Bruhn Avenue
Bennington, NE 68007

Last Updated: 1997-03-31 by Jim Brain
Last Accessed: 1997-06-18 from buttons.ihug.co.nz using Mozilla/3.0
Statistics: 15 accesses today and 291 accesses since 1997-04-29