@(#): Polygonamy: A Study in 3 Dimensions by Stephen L. Judd (sjudd@nwu.edu) We've been making some pretty small potatoes for a while now, so the time has come for a little more ambition and challenge. I decided to think up a real challenge, containing problems that I had no idea how to solve, and see what I could come up with. I set out to create a 3D virtual world for the C64, e.g. a space populated with various three-dimensional objects, which I could wander around in. I wanted it to be full-screen 320x200 hires bitmapped. Furthermore, I wanted the objects to be solid, and since there are only two colors I wanted to be able to put patterns on the faces. I also wanted it to translate nicely to the 128's VDC chip, in 2MHz mode. Finally, naturally, I wanted the program to be fast. This was the framework in which I placed myself, and a few other ideas presented themselves along the way. The outcome of all of this is Polygonamy. Just a brief history of this project: I have wanted to do a 3D world for a very long time, and have been thinking about it for some time in the back of my head; my imagination was probably first fired the first time I played Elite. I wrote down the necessary equations one afternoon last summer, for a high school student I was teaching, and the equations are very simple. I took a break to get some work of measureable value accomplished, but in October I began work on the graphics algorithm. I worked steadily on this for two months, and in December I finally began to code the graphics. In mid-January, I got them to work. Adding the rest took a few weekends more. I have about 128 pages of notes, analytical calculations, and BASIC test programs in my C64 notebook (which is, I think, a nice number of pages to have :). My original plans were to place five objects in the world, but time and memory constraints whittled that down to three. One of my disks self-destructed about the day I was ready to finish, so I had to reconstruct a bunch of tables, but other than that I finally managed to finish it up, albeit with a few rough edges ;). Although the concepts from previous articles are used as a solid foundation, the code is almost 100% from scratch -- I only borrowed a tiny piece of code from an earlier program, and even that I modified somewhat. One caveat before we begin: I am primarily interested in the challenge of designing the algorithms, which means I like to come up with my own solutions. Thus, you may find more efficient methods in a graphics book or perhaps in someone else's code; I have examined neither, so I have no idea what the relative merit of my algorithms may be. These are simply my solutions to challenges placed before me. And if you know of a better way to do things, please feel free to email me! Furthermore, I consider the code a test of the theory. Some of my assumptions work and some do not, and these will be considered at the end of this article. Finally, I am not including the source code. For one thing, it is big. Like, _HUGE_, man. I had to split it up when I ran out of editor memory on my 128 (which, incidentally, forced me to figure out Merlin 128's very cool and very powerful linker feature). I will include numerous code fragments in assembly and BASIC which demonstrate all the important concepts. By the way, if you are interested in measuring frame rates, you can use the first object. Every full 360 degree revolution is 128 frames. So time how long it takes to complete a full rev (or maybe several), and divide that number into 128, to get an idea of frames per second. For a rundown of frame rates for stock and SuperCPU operation, see "Underneath the Hood of the SuperCPU" (Reference: cmdcpu) found elsewhere in thi issue. Some brief acknowledgements: This project would not have happened without the extremely powerful macro and linking capabilities of the Merlin 128 assembler, by Glen Bredon. It would have been _really_ tough without JiffyDOS and my FD-2000, from CMD. I used my Action Replay extensively for debugging, and without the excellent documentation for the 64, such as the PRG and Mapping the 64, this would have been a nightmare. Finally, I must acknowledge my friend George Taylor; a few days before I was all finished I explained some routines to him, and he made a great suggestion which made my fast fill routine blaze. Okay, WAY too much talk. There are a ton of issues involved with this project so let's just wade in hip-deep and deal with them as they come. @(A): The Equations ------------- First some relaxing abstraction. In previous articles we have discussed how to project an object in three dimensions through the origin into a plane. We have also discussed rotations in three dimensions. In principle, then, we have all the mathematics we need to do a 3D world. But we should be thoughtful. Let's say we're standing in the world and turn to the right; we can either rotate ourselves, and change the direction we are looking, or we can rotate the world around us, so that we are always looking 'forward'. This may bother you on physical grounds, but the two are mathematically equivalent. Given the way we have derived our projection routines, it should be clear that we want to rotate the objects in the world around us. (Or, to put it another way, we are at the center of the world, and the world revolves around us.) We have another issue: how do we know when an object is visible or not? How do we know when we've bumped into an object (or blown it out of the sky :)? Moreover, if we have ten objects, and each object has six points, it would be a real drag to have to rotate all sixty points, especially if none of the objects were even visible. It should be clear that we really want to define every object relative to some center of the object. So we keep track of the center of each object, and rotate and translate the centers, and only calculate the full object if it is visible. We of course want to define the object relative to this center. What happens to this center when we translate or rotate the world? Let's simplify our model a little bit and only deal with rotations about one axis, e.g. we are driving a tank and can move forwards and backwards, and turn left or right. The generalization to other axes is very straightforward, but this way we can think in two dimensions instead of three. First we need to agree on a coordinate system. For my system I let the x-axis go _up_ a page of paper, the y-axis comes up out of the page, and the z-axis goes from left to right. Thus, I am standing on the paper in the x-z plane, at the origin, with the y-axis extending upwards from me. If you still don't understand my orientation, draw a little picture. I am going to choose my orientation so that I am always looking straight down the x-axis, e.g. in the direction that x is increasing. Thus, if I walk forwards or backwards, this corresponds to decreasing or increasing the x-component of the center coordinate: let C=(cx,cy,cz) move forwards => cx=cx-1 move backwards=> cx=cx+1 So far so good. As always, draw a picture if you can't visualize it. That takes care of translations, what about rotations? We certainly know how to rotate points about the origin. In particular, if we have a point with coordinates (x1,z1) and rotate it clockwise by an angle s, we get the new point as follows: (x1,z1) -> (x1*cos(s)+z1*sin(s), z1*cos(s)-x1*sin(s)) So that's easy enough. The problem is that we have this big object sitting around this point, and need to figure out what to do with it! Consider the following: let's say we have a line out some distance from the origin, X X | X z-axis ----O------------c---- c=center | X | X Origin X <---- line and we rotate it by some amount theta about the origin: X XX Xc c=rotated center | XX z-axis ---O------------XX- | X XX You can see (from my incredible ASCII artwork) that the line is now at an angle with respect to the origin. Imagine that we draw a line from the origin to the center of the point, in the first picture (or get a pencil and paper and actually do it), so that we have the letter "T" laying on its side. Now we rotate this "T" by some angle theta, so that the top of the "T" -- our line -- has now been rotated. The stem of the "T" meets the top of the "T" at the center point c. Drop a line from the rotated center straight down to the z-axis, and call this line l2. Since the T is at a right angle, and we have rotated it by an angle theta, the angle between our line and the z-axis is 90-theta. But this means that the angle between our line (the top of the "T") and the line l2 is just theta. Thus, if we rotate the center by an amount theta, all we have to do is rotate the object by an amount theta *about the center*, and our perspectives will all be correct. How is that for simple? It should be clear now that this works no matter where the "center" of the object is chosen. Thus, our center is not some physical center of the object, but rather the center of rotation of the object. Since this is true of rotations about any axis, we now know how to generalize to higher dimensions. Note further that we can now implement local rotations -- that is, let's say the object is a tank, and this tank turns independently of whether or not we turn. Piece of cake. You can also see that the rotations are cumulative. If we turn to the left, and then turn left again, we can simply apply two rotations to the points of the object. In fact, if we turn left, move forwards, and then move left again, we still apply just two rotations to the points of the object; the center, however, has moved. This is quite important, as it allows us to measure an object's relative rotation to us, whether or not it is visible. Remember that we only want to rotate the points that define an object when the object is visible. We never actually change the points which define an object. Instead, we track how much the object needs to rotate, and rotate the original points by that amount. The center of the object will change with each rotation and translation. We never change how we define an object about this center, though. We simply apply a rotation to the original points when appropriate. The object centers must be kept track of because they can undergo translation as well as rotation. To summarize then: we define each object relative to a center of rotation. The center determines where the object is located in the world, and allows us to operate on centers when we need to rotate or translate the world. It also lets us perform local operations on an object, so that we could, for instance, have a spinning cube located inside our world. If an object's center is visible then we can consider the object to be visible, and plot it. Whoops -- what does it mean to be 'visible'? Well, think about yourself. You can see things when they are in front of you. Or, to be more precise, you have a field of vision. Perhaps a decent model of this would be a cone. I think a better model, certainly one which is easier to deal with computationally, is the intersection of two planes: a pyramid. Anything which lies within this pyramid we consider visible, and anything which lies outside we consider not visible. Two-dimensionally, we draw a little wedge extending from the origin. Anything within the wedge we count as visible, and anything outside of it we count as not visible. The sides of this wedge are two lines, with equal but opposite slope (i.e. slope=+/-m). \ Visible / \ View / Outside of visual area \Area / \ / \ / * <--- Me Probably lines at some angle are more reasonable than others. But I'm a simple guy, and the two simplest lines I can draw are at 45 degree angles from the axis, so their slope is +/-1. Thus, any points which lie between the lines x+z=0 and x-z=0 are visible. If the center of an object is within this area, we will consider the object visible. That is, if cx+cz>0 and cx-cz>0 the object is visible. One last thing: if we are too close to the object, we either want to bump into it (i.e. not move) or else not display it. So we also need to check if cx"" IF a$="[" THEN cx=cx-20 IF a$="/" THEN cx=cx+20 IF a$=";" THEN GOSUB rotl IF a$="'" THEN GOSUB rotr IF cx=dy then x=x+1:xrem=xrem-dy As usual, we want to start xrem at dy/2, which has the effect of rounding numbers up. 10 REM LINE ROUTINE TAKE TWO SLJ 11/24/95 12 REM ACTUALLY IT'S A FILL RTY NOW 15 GRAPHIC 1,1 20 X0=160:Y0=100 30 X1=5:Y1=-50:X2=7:Y2=11:XL=X1:YL=Y1:Y=Y1 35 X3=50:Y3=Y1:X4=X3+100:Y4=Y2:XR=X3:YR=Y3 40 D1=Y2-Y1+1:DX=X2-X1:LI=INT(DX/D1):LR=DX-LI*D1 45 TL=INT(D1/2) 46 D2=Y4-Y3+1:DX=X4-X3:RI=INT(DX/D2):RR=DX-RI*D2 48 TR=INT(D2/2) 50 DRAW1, X0+X1,Y0-Y1 TO X0+X2,Y0-Y2, X0+X3,Y0-Y3 TO X0+X4,Y0-Y4 60 REM MAIN LOOP 70 XR=XR+RI:TR=TR+RR:IF TR>=D2 THEN TR=TR-D2:XR=XR+1 75 DRAW1, X0+XL,Y0-Y TO X0+XR,Y0-Y 80 XL=XL+LI:TL=TL+LR:IF TL>=D1 THEN TL=TL-D1:XL=XL+1 90 REM DRAW1, X0+XL,Y0-Y TO X0+XR,T0-T 100 Y=Y+1:IF Y<=Y2 THEN 60 In this program (x1,y1)-(x2,y2) is the left line, and (x3,y3)-(x4,y4) is the right line. The first thing to note is that in lines 40 and 46, Y=y2-y1+1. This issue was discussed in the very first C=Hacking 3D-graphics article. The problem is that although the line will be anatomically correct with DY=y2-y1, it will look silly. The easiest way to see this is to consider y2-y1=1, e.g. say we draw a line between (0,10) to (50,11). Ideally this line will consist of two line segments, one from (0,10) to (25,10) and the other from (26,11) to (50,11). But ifi we use DY=1 we will have one line segment from (0,10) to (50,10), and a single point at (50,11). Adding one to DY is just a simple cheat. Most of the time the lines will look just fine, but lines which have a slope near one will come out a bit wrong. The other, accurate solution, which was used in the first article, is more complicated to implement in this routine. Adding one to DY will also have a useful benefit which we shall shortly see. In line 50 above the boundaries of our object are drawn in, to check the accuracy of the algorithm. In lines 70-80 the right point is updated, then the thing is filled, then the left point is updated. This is because both lines are moving to the right, e.g. they both have positive slope. Think about how the line segments will be drawn; in general, we want to draw from the left end of the left line segment to the right end of the right line segment. (Sometimes this will look a little off where the two lines meet). Since the left and right lines can each have either positive or negative slopes, there are four possibilities: Plus-Plus, Plus-Minus, Minus-Minus, and Minus-Plus. Plus-Plus: Update right, fill, then update left Plus-Minus: Fill, update left, update right Minus-Minus:Update left, fill, update right Minus-Plus: Update left, update right, fill If this is still confusing, try out the above program with various left and right line segments, and these things will jump right out. Now we need to think about implementing this in assembly. Since this is being done in hires 320x200, the x-coordinate requires two bytes, and the y-coordinate requires one. We also need another byte to store the remainder portion of the x-coordinate. The most glaring question is the calculation of dx/dy: somehow we need a fast way of exactly dividing an eight bit number dy into a nine bit number dx. Recall that we always add one to dy, so that dy actually ranges from 2 to 200. Since the maximum value of dx is 320 or so, the largest value of dx/dy that we can have is 320/2 = 160. In other words, both the integer and remainder part of dx/dy will fit in a byte. Simply adding one to dy makes life pretty easy at this end. One very fast method of doing division is of course by using logarithms. But they have a problem with accuracy. One the other hand, one thing we know how to do very quickly is multiplication. This then is the plan: use logarithmic division to get an estimate for N, the integer part. Then calculate N*dy, compare with dx, and adjust the integer part accordingly. A quick reminder of how logarithms can be used for division: log(a/b) = log(a) - log(b) exp(log(x)) = x thus a/b = exp(log(a)-log(b)). How do we take the log of a 9-bit number? We don't. Instead, we construct a table of f(x)=log(2*x), and use, not x, but x/2, as a parameter. Remember that the logarithms merely give an estimate to the integer part. Moreover, if the tables are constructed carefully we can insure that the estimate for N is either exact or too small. Thus we only need to check for undershoots, which simplifies the calculation considerably. In particular, the tables were constructed as follows: 10 DIM L1%(160), L2%(200), EX%(255): C=255/LOG(160) 20 FOR I=1 TO 160 30 L1%(I)=INT(C*LOG(I)) 40 NEXT 50 FOR I=2 TO 200 60 L2%(I)=INT(C*LOG(I/2)+0.5) 70 NEXT 80 FOR I=0 TO 255 90 EX%(I)=INT(EXP(I/C)) 95 IF(I=129)OR(I=148)OR(I=153)OR(I=81)OR(I=98)THEN EX%(I)= EX%(I)-1 100 NEXT 110 L2%(3)=L2%(3)+1 The constant C is needed obviously to improve accuracy (log(160) simply isn't a very large number). Note that I divided the arguments of the logarithms in half; instead of calculating 2*dx/dy I calculate dx/(dy/2), which is of course the same thing. This was done to make C work out. By 'fixing' the tables in this manner, exactly 3927 calculations will undershoot, which works out to about 6% of all possible calculations we may perform. The actual division routine works out pretty slick in assembly: DIVXY MAC ;Macro to compute 2*X/Y LDA LOG1,X ;This is the division part SEC SBC LOG2,Y BCS CONT LDX #00 ;dx/dy < 1 LDA ]1 ;LDA dx, since dx is exactly the remainder BCC L2 CONT TAX LDA EXP,X TAX ;X is now integer estimate STA MULT1 EOR #$FF ADC #00 ;Carry is guaranteed set STA MULT2 LDA ]1 ;ldxlo or rdxlo (i.e. low byte of dx) ADC (MULT2),Y SEC SBC (MULT1),Y ;Calculate remainder L2 CMP ]2 ;ldy or rdy (i.e. ]2 = dy) BCC DONE L1 INX ;Remainder is too large, so up int estimate SBC ]2 ;and subtract dy CMP ]2 ;Repeat until remainder= 128 then we have overflow if x+y < -128 then we have underflow which implies: xo+yo >= 256+128 implies overflow xo+yo < 128 implies underflow with similar results for subtraction. Note also that after every addition or subtraction 128 needs to be either added or subtracted from the result, which either way corresponds to an EOR #$80. So it's a little more work to add numbers in this system. (Of course, adding normal numbers to excess-128 numbers is no problem, so INC and DEC work fine). Back to rotations. The most obvious thing to do is to create two tables: f(x) = (x-128)*cos(delta) + 128 g(x) = (x-128)*sin(delta) + 128 but remember that the remainders are also needed: fr(x) = 256*(remainder((x-128)*cos(delta))) gr(x) = 256*(remainder((x-128)*sin(delta))) Since remainders are always positive none of this excess-128 junk is needed. Note that we could also let f(x) and g(x) be 2's-complement tables, then convert from two's-complement into excess-128 after performing additions etc. The conversion is, what do you know, EOR #$80. This is the smarter thing to do, and an even smarter thing to do is to let the cosine table (f(x) above) to be in excess-128 format, and the sine table g(x) in 2's complement. This way the numbers can be added as normal, and no conversion need take place: * Compare to BaSiC subroutine rotl: above ROTL INC THETA LDY #NUMOBJS ;Y indexes the object DEY :LOOP LDX CX,Y ;center coordinate LDA CDEL,X ;CDEL = f(x) above LDX CZ,Y CLC ADC SDEL,X STA TEMP ;t1 = ci+si LDA CXREM,Y CLC ADC SDELREM,Y ;Add remainders BCC :CONT1 INC TEMP CLC :CONT1 LDX CX,Y ADC CDELREM,X BCC :CONT2 INC TEMP :CONT2 STA CXREM,Y LDX CZ,Y LDA CDEL,X LDX CX,Y SEC SBC SDEL,X STA TEMP2 ;t2=cz-si LDA CZREM,Y SEC SBC SDELREM,X BCS :CONT3 DEC TEMP2 :CONT3 LDX CZ,Y CLC ADC CDELREM,X BCC :CONT4 INC TEMP2 :CONT4 STA CZREM,Y LDA TEMP2 STA CZ,Y LDA TEMP STA CX,Y DEY BPL :LOOP Well, that takes care of two lines of BASIC code :). As it turns out, using a single byte for the remainder does a pretty good job of holding the number. Rotating by 360 degrees one way, then rotating back again, produces a center which is within a few decimal places of the starting value. Next up: projections. The projection calcuation is: Proj(P) = d*(P+C)/(px+cx) where P=(px,py,pz) and C=(cx,cy,cz). In terms of the implementation, we want to calculate: d*((P-128) + s*(C-128)) / ((px-128) + s*(cx-128)) where s=20, to translate C into the 'real world'. To calculate this, consider the following function: g(x) = r*d / (s*((px-128)/s + cx-128)) + 128 where r is some scaling factor. The projection calculation then becomes: 1/r*( (g-128)*(P-128) + s*(g-128)*(C-128) ) Thus we need some more tables, one of 1/(4r) * (256-x)^2, the other of s/(4r) * (256-x)^2, to do the multiplication. Furthermore a table of (x-128)/s would be pretty handy, and finally we need a table of g(x) = r*d/(s*(x-128)) + 128. The general outline of a program would be: Get keypress 1- If translate, then update all cx's (just some INCs and DECs) 2- If rotate left or right, then rotate world 3- Update angles for global & local rotation matrix (e.g. theta) 4- Figure out which objects to display/construct a list 5- Call each object in turn 6- Update bitmap: clear out remaining garbage and swap bitmaps Numbers 1 and 2 are done. In number three, by global matrix I mean the object rotation that results from us turning left or right. By local rotation I mean rotations independent of whether or not we turn. The local rotation allows e.g. the octahedron to spin around in Polygonamy. Figuring out which objects to display is easy: just check to make sure it lies within the viewing cone/pyramid, that we are not too close, etc. If an object is to be displayed, it needs to be placed in a list. I constructed the list to make sure that objects which are farther away are drawn first; that way objects can overlap one another correctly. This was done via a simple insertion sort -- i.e. bump up objects in the list until the right spot is reached to insert the object. We have most of the tools to deal with #5. Handling an object consists of rotating and projecting it, then displaying it. Rotation is the same as it has always been, albeit now involving sixteen bits, and projection is described above. Then each polygon needs to be drawn, by sticking the points of the polygon into the polygon list, setting up the fill pattern and the pointer to the minimum Y-value, and calling the polygon fill routine. Of course, if the face is hidden then we certainly don't want to plot it. The minimum y-value can be found very easily while inserting the points into the point list -- just keep track of ymax and compare to each point as it is inserted. We have discussed several methods of calculating hidden faces -- cross-product, rotated normal, parallel faces -- each of which involves looking at a vector normal to the face, and either projecting it or taking the dot product with a vector going to the origin. What a big pain in the butt, especially since values can be sixteen-bits, etc. Did you ever stop to wonder about what happens to all the previous polygon-fill calculations if the point-list is entered in reverse order? Quite simply, left -> right and right -> left. And what happens when a face is invisible? The polygon is turned away from our eye. The points in the polygon, which go counter-clockwise around the polygon, will go clockwise when the polygon is turned around. (I should point out that at least in my code the points on the polygon are actually done in clockwise-order, since projection reverses the points). So, we have hidden-faces already built-in to the polygon plot routine! In essence, we simply don't plot any polygon which the routine will freak out on. We can of course be systematic about this; within the plot routine: - Calculate the left and right lines. - Take a trial step along the left and right lines - If xleft < xright then we are OK, otherwise punt. In principle we only need to do this on the first calculation, and use some properties of the lines to make things easier (for instance, if the left line is moving left and the right line is moving right, and they emanate from the same point, we know the polygon is visible). Unfortunately, nasty situations can arise, for instance when the left and right slopes have the same integer parts. So a check needs to be placed within the fill code to make sure the left point doesn't get ahead of the right point. This is unfortunate, as every cycle counts in the fill code, but luckily there is (was) a natural place to put in a quick check. All that is left then is #6: run through the fill table, and clear any lines that still have old junk in them. Since I used two bitmaps as a double-buffer, all that is left is to swap the bitmaps, and do it all again. Et voila. @(A): Analysis and Conclusions ------------------------ As you can see, the program is not without its flaws. The biggest one, I think, deals with the projection. Recall that I calculate px/s, where s=20, and add it to cx. My feeling was that px was going to be very small compared with cx, and so not modify the projection by much. But either this is a bad assumption, or the rotations are all screwed up, because certain rotations look a bit goofy. For instance, when you walk up close to the octahedron it starts to get jumpy, or wobbly. I note further that when you are far away from an object it looks much better, so that might be a way to fudge around the problem (e.g. make the value of d in the projection much larger). Speaking of rotations, the 'funky shake' which used to plague the old programs has now been fixed. For instance, a rotation in the y-direction would work well but at some point it would appear to start rotating backwards, then start going the right way again. The problem was due to an overflow in the calculation of the rotation matrix, in a term that looked like(sin(t1) + sin(t2) - sin(t3) - sin(t4))/4, and the solution is to split such terms into two, e.g. (sin(t1)+sin(t2))/4 - (sin(t3)+sin(t4))/4. Speaking further of rotations, I find the current system of rotation and projection unsatisfying, in particular too slow. Notice how much the program slows down when all three objects are visible; some 40 points are being rotated, both locally and globally, at this point. It is possible to reduce the number of matrix multiplications from 9 to 8 (and even lower, with lots of extra overhead), but I find this unsatisfying. A better method is needed... There is another bug somewhere in the global rotations which sometimes causes the objects to wander around -- occasionally I can get the ship or the octahedron to move close to the pyramid. Also, when you are really close to an object and turn, you might notice the curious effect of the object rotating by small amounts, and then jumping position by a large amount. This is due to the 'units of 20' that are used in the program; the remainder part of (cx,cy,cz) needs to be used here, and then the display will be smooth as well. Of course, if multicolor mode was used many of the calculations would be much simpler, since the screen x-coordinate would only require one byte instead of two. The program should be made more efficient memory-wise of course. Shoving the fill routines for each buffer together would help out, and a system for rotating points out of a list, similar to that used in the last 3D program, would greatly streamline things (although it would be a tad slower). There is still a minor bug or two in the fill routine, which causes little blue chunks to be taken off the ends of some polygons, but I didn't feel like tracking it down. Note that although Polygonamy only lets you run around in a plane, running around in a full three dimensions is quite simple to add. And, although there are only three objects in the world, it is all set up to deal with a lot more. In summary, I see no major problems standing in the way of doing reasonably fast 3D graphics on the 64. The object file for this article is available in "Hacking the Code" (Reference: code, SubRef: polycode) found elsewhere in this issue. ============================================================================