by Stephen L. Judd (sjudd@nwu.edu)We've been making some pretty small potatoes for a while now, so the time has come for a little more ambition and challenge. I decided to think up a real challenge, containing problems that I had no idea how to solve, and see what I could come up with.
I set out to create a 3D virtual world for the C64, e.g. a space
populated with various three-dimensional objects, which I could wander around
in. I wanted it to be full-screen 320x200 hires bitmapped. Furthermore, I
wanted the objects to be solid, and since there are only two colors I
wanted to be able to put patterns on the faces. I also wanted it to
translate nicely to the 128's VDC chip, in 2MHz mode. Finally, naturally, I
wanted the program to be fast. This was the framework in which I placed
myself, and a few other ideas presented themselves along the way. The
outcome of all of this is Polygonamy.
Just a brief history of this project: I have wanted to do a 3D world for a very long time, and have been thinking about it for some time in the back of my head; my imagination was probably first fired the first time I played Elite. I wrote down the necessary equations one afternoon last summer, for a high school student I was teaching, and the equations are very simple. I took a break to get some work of measureable value accomplished, but in October I began work on the graphics algorithm. I worked steadily on this for two months, and in December I finally began to code the graphics. In mid-January, I got them to work. Adding the rest took a few weekends more. I have about 128 pages of notes, analytical calculations, and BASIC test programs in my C64 notebook (which is, I think, a nice number of pages to have :). My original plans were to place five objects in the world, but time and memory constraints whittled that down to three. One of my disks self-destructed about the day I was ready to finish, so I had to reconstruct a bunch of tables, but other than that I finally managed to finish it up, albeit with a few rough edges ;).
Although the concepts from previous articles are used as a solid
foundation, the code is almost 100% from scratch -- I only borrowed a
tiny piece of code from an earlier program, and even that I modified
somewhat.
One caveat before we begin: I am primarily interested in the
challenge of designing the algorithms, which means I like to come up
with my own solutions. Thus, you may find more efficient methods
in a graphics book or perhaps in someone else's code; I have examined
neither, so I have no idea what the relative merit of my algorithms
may be. These are simply my solutions to challenges placed before me.
And if you know of a better way to do things, please feel free to email
me!
Furthermore, I consider the code a test of the theory. Some of my
assumptions work and some do not, and these will be considered at the end
of this article.
Finally, I am not including the source code. For one thing, it is big.
Like, _HUGE_, man. I had to split it up when I ran out of editor memory
on my 128 (which, incidentally, forced me to figure out Merlin 128's very
cool and very powerful linker feature). I will include numerous code
fragments in assembly and BASIC which demonstrate all the important
concepts.
By the way, if you are interested in measuring frame rates, you can
use the first object. Every full 360 degree revolution is 128 frames.
So time how long it takes to complete a full rev (or maybe several), and
divide that number into 128, to get an idea of frames per second.
For a rundown of frame rates for stock and SuperCPU operation, see
"Underneath the Hood of the SuperCPU" (Reference: cmdcpu) found
elsewhere in thi issue.
Some brief acknowledgements: This project would not have happened without
the extremely powerful macro and linking capabilities of the Merlin 128
assembler, by Glen Bredon. It would have been _really_ tough without
JiffyDOS and my FD-2000, from CMD. I used my Action Replay extensively
for debugging, and without the excellent documentation for the 64, such as
the PRG and Mapping the 64, this would have been a nightmare. Finally, I
must acknowledge my friend George Taylor; a few days before I was all
finished I explained some routines to him, and he made a great suggestion
which made my fast fill routine blaze.
Okay, WAY too much talk. There are a ton of issues involved with this project so let's just wade in hip-deep and deal with them as they come.
First some relaxing abstraction. In previous articles we have discussed how to project an object in three dimensions through the origin into a plane. We have also discussed rotations in three dimensions. In principle, then, we have all the mathematics we need to do a 3D world.
But we should be thoughtful. Let's say we're standing in the world and
turn to the right; we can either rotate ourselves, and change the
direction we are looking, or we can rotate the world around us, so that
we are always looking 'forward'. This may bother you on physical grounds,
but the two are mathematically equivalent. Given the way we have derived
our projection routines, it should be clear that we want to rotate the
objects in the world around us.
(Or, to put it another way, we are at the center of the world, and the
world revolves around us.)
We have another issue: how do we know when an object is visible or not? How do we know when we've bumped into an object (or blown it out of the sky :)? Moreover, if we have ten objects, and each object has six points, it would be a real drag to have to rotate all sixty points, especially if none of the objects were even visible.
It should be clear that we really want to define every object relative
to some center of the object. So we keep track of the center of each
object, and rotate and translate the centers, and only calculate the
full object if it is visible. We of course want to define the object
relative to this center.
What happens to this center when we translate or rotate the world?
Let's simplify our model a little bit and only deal with rotations about
one axis, e.g. we are driving a tank and can move forwards and backwards,
and turn left or right. The generalization to other axes is very
straightforward, but this way we can think in two dimensions instead
of three.
First we need to agree on a coordinate system. For my system I let the x-axis go _up_ a page of paper, the y-axis comes up out of the page, and the z-axis goes from left to right. Thus, I am standing on the paper in the x-z plane, at the origin, with the y-axis extending upwards from me. If you still don't understand my orientation, draw a little picture.
I am going to choose my orientation so that I am always looking straight down the x-axis, e.g. in the direction that x is increasing. Thus, if I walk forwards or backwards, this corresponds to decreasing or increasing the x-component of the center coordinate:
let C=(cx,cy,cz) move forwards => cx=cx-1 move backwards=> cx=cx+1So far so good. As always, draw a picture if you can't visualize it. That takes care of translations, what about rotations?
We certainly know how to rotate points about the origin. In particular, if we have a point with coordinates (x1,z1) and rotate it clockwise by an angle s, we get the new point as follows:
(x1,z1) -> (x1*cos(s)+z1*sin(s), z1*cos(s)-x1*sin(s))So that's easy enough. The problem is that we have this big object sitting around this point, and need to figure out what to do with it!
Consider the following: let's say we have a line out some distance from
the origin,
X X | Xz-axis ----O------------c---- c=center
| X | X Origin X <---- lineand we rotate it by some amount theta about the origin:
X XX Xc c=rotated center | XXz-axis ---O------------XX-
| X XXYou can see (from my incredible ASCII artwork) that the line is now at an angle with respect to the origin.
Imagine that we draw a line from the origin to the center of the point,
in the first picture (or get a pencil and paper and actually do it), so
that we have the letter "T" laying on its side. Now we rotate this "T"
by some angle theta, so that the top of the "T" -- our line -- has now
been rotated.
The stem of the "T" meets the top of the "T" at the center point c. Drop
a line from the rotated center straight down to the z-axis, and call this
line l2. Since the T is at a right angle, and we have rotated it by an
angle theta, the angle between our line and the z-axis is 90-theta. But
this means that the angle between our line (the top of the "T") and the
line l2 is just theta.
Thus, if we rotate the center by an amount theta, all we have to do is rotate the object by an amount theta *about the center*, and our perspectives will all be correct. How is that for simple? It should be clear now that this works no matter where the "center" of the object is chosen. Thus, our center is not some physical center of the object, but rather the center of rotation of the object.
Since this is true of rotations about any axis, we now know how to
generalize to higher dimensions.
Note further that we can now implement local rotations -- that is, let's
say the object is a tank, and this tank turns independently of whether
or not we turn. Piece of cake.
You can also see that the rotations are cumulative. If we turn to the left, and then turn left again, we can simply apply two rotations to the points of the object. In fact, if we turn left, move forwards, and then move left again, we still apply just two rotations to the points of the object; the center, however, has moved.
This is quite important, as it allows us to measure an object's relative
rotation to us, whether or not it is visible. Remember that we only
want to rotate the points that define an object when the object is
visible. We never actually change the points which define an object.
Instead, we track how much the object needs to rotate, and rotate the
original points by that amount.
The center of the object will change with each rotation and translation.
We never change how we define an object about this center, though. We
simply apply a rotation to the original points when appropriate. The
object centers must be kept track of because they can undergo translation
as well as rotation.
To summarize then: we define each object relative to a center of
rotation. The center determines where the object is located in the
world, and allows us to operate on centers when we need to rotate or
translate the world. It also lets us perform local operations on an
object, so that we could, for instance, have a spinning cube located
inside our world. If an object's center is visible then we can consider
the object to be visible, and plot it.
Whoops -- what does it mean to be 'visible'? Well, think about yourself. You can see things when they are in front of you. Or, to be more precise, you have a field of vision. Perhaps a decent model of this would be a cone. I think a better model, certainly one which is easier to deal with computationally, is the intersection of two planes: a pyramid. Anything which lies within this pyramid we consider visible, and anything which lies outside we consider not visible.
Two-dimensionally, we draw a little wedge extending from the origin. Anything within the wedge we count as visible, and anything outside of it we count as not visible. The sides of this wedge are two lines, with equal but opposite slope (i.e. slope=+/-m).
\ Visible /
\ View / Outside of visual area \Area /Probably lines at some angle are more reasonable than others. But I'm a simple guy, and the two simplest lines I can draw are at 45 degree angles from the axis, so their slope is +/-1. Thus, any points which lie between the lines x+z=0 and x-z=0 are visible. If the center of an object is within this area, we will consider the object visible. That is, if cx+cz>0 and cx-cz>0 the object is visible.
* <--- Me
One last thing: if we are too close to the object, we either want to
bump into it (i.e. not move) or else not display it. So we also need to
check if cx
We are now in a position to write some simple code. I wrote the
following in evil Microsoft QBasic, but BASIC7.0 on the 128 would work
just as well, although you need to change the variables (I didn't have my
128 handy, otherwise I would have written this on the 128):
delta= 3 'Rotations will be in 3 degree increments
rad= 3.1415926/180
cdel= COS(rad*delta)
sdel= SIN(rad*delta)
theta=0
d=-135
x0= 60 'Bumped into the object?
REM z0 y0 would be 160,100 to place the object in the center
REM of the screen
y0= 170 'I want the bottom of screen to be ground
z0= 160
REM Set up the object
REM Tetrahedron: 0,sqrt(3),0 0,0,1 0,0,-1 2,0,0
DIM obx(4), oby(4), obz(4)
obx(1)= 0
oby(1)= 50*SQR(3)
obz(1)= 0
obx(2)= 0
oby(2)= 0
obz(2)= 50
obx(3)= 0
oby(3)= 0
obz(3)= -50
obx(4)= 100
oby(4)= 0
obz(4)= 0
cx= 100
cy= -10
cz= 0
REM Get input
main:
DO
a$=INKEY$
LOOP UNTIL a$<>""
IF a$="[" THEN cx=cx-20
IF a$="/" THEN cx=cx+20
IF a$=";" THEN GOSUB rotl
IF a$="'" THEN GOSUB rotr
IF cx
p1x= cx + ctheta*obx(1) + stheta*obz(1) 'Rotate and add to center
p1y= cy + oby(1)
p1z= cz + ctheta*obz(1) - stheta*obx(1)
p1y= y0 + d*p1y/p1x 'Project and add offset
p1z= z0 + d*p1z/p1x
[... similar for p2x,p2y,p2z,...,p4x,p4y,p4z]
CLS
LINE (p1z, p1y)-(p2z, p2y)
[... lines between p2-p3, p3-p1, p1-p4, p4-p2, p4-p3]
GOTO main: 'Main loop
REM rotate left
rotl:
theta= theta + delta
blah= cdel*cx + sdel*cz
cz= -sdel*cx + cdel*cz
cx= blah
RETURN
rotr:
theta= theta-delta
blah= cdel*cx - sdel*cz
cz= sdel*cx + cdel*cz
cx= blah
RETURN
So much for the easy part.
If there is one thing that the previous programs have taught us, it is
that graphics are slow. At least, they are far and above the major thing
slowing down the program, and deserve the most attention and thought for
improvement. Moreover, because there is lots of looping involved, the
elimination of just a few instructions can translate to thousands of
cycles saved.
We have examined several fill routines up to now, but neither of them is
up to the task of Polygonamy. The cookie-cutter method is OK, but doesn't
allow multiple objects, and certainly doesn't allow pattern fills. Using
an EOR-buffer is just plain slow and inefficient and a big drag. So it's
time to rethink this problem.
Recall that on the 64 the bitmap screen is divided into 8x8 cells, which
are arranged horizontally in rows. It's a pretty kooky way of doing
things, but we shall overcome.
First of all it should be clear that we want to fill from left to right
(as opposed from top to bottom). We can then fill a byte at a time,
instead of dinking in little pixels at a time.
Previously we used a custom character set to plot into. One of the major
reasons for doing so was to use Y as the Y-coordinate, so that storing a
particular point was as simple as STA COLUMN,Y. We can still use this
idea, but only within each row. That is, if we let Y=0..7, we can address
each individual pixel-row within each 8x8 block row with an STA ADDRESS,Y.
For real speed, we are going to want an unrolled fill routine. That is,
we don't want to mess around with loop counters and updating pointers and
such. Since there are 25 rows on the screen (25 times 8 = 200 pixels
high) we are probably going to need 25 separate fill routines.
I constructed my fill routine as follows:
There is a little problem though -- what we're talking about doing is an
indirect JSR, and there is no such thing. But it's easy enough to fake,
because we can use an indirect JMP. So a call to the fill routine would
look like the following:
where ADDRESS simply points to the correct entry point in the fill routine.
Moreover, you may also note that 40 columns times 6 bytes/column is 240
bytes, so that each little fill routine handily fits in a page. Thus,
moving between rows in the bitmap corresponds to a simple decrement or
increment of the high byte of the ADDRESS pointer.
This was the state of things when, days before I was to be all done with
Polygonamy, I mentioned it to my friend George Taylor, who suggested the
following modification: instead of using X to count how many columns to
fill, just make the fill routine:
I don't think you're going to make a fill routine faster than that :).
Moreover, note that each fill routine takes up just 120 bytes, so we can
now fit two fill routines in each page. I did not do this, but it is
easy to do, and instantly frees up 25 pages.
I mean, hey, this _is_ "Polygonamy", so let's talk polygons, and lots of
them.
Clearly all that is needed to draw an object are the left and right
endpoints of the object, since everything in-between will be filled.
An observation to make is that if you take a slice out of a convex
polygon, the slice will intersect the polygon at exactly two points.
Another, more important, observation is to note that the highest and
lowest point of a polygon will always be at a vertex. Finally, it is
important to note that any vertex of a polygon has exactly two lines
extending out of it, one to the left, and one to the right.
Consider a piece of a polygon:
The idea then is to start at the bottom (or the top) and to steadily move
upwards while _simultaneously_ calculating the endpoints of the left and
right lines, and filling in-between. But we need the equations of the
left and right lines to do this.
Now it's time for another observation. Let's say we have a polygon with
n vertices v1, v2, ..., vn, and furthermore that as we move between these
points we move around the polygon counter-clockwise. Thus v3 is to the
right of v2, v1 is to the left of v2, v4 is to the right of v3, etc.
For example:
v2
v3
Now we have a real plan. We simply define the polygon as a list of
points v1 v2 v3 ... vn. We then figure out which one is lowest, i.e. has
the smallest (or greatest) y-coordinate, call this vertex vm (vmax). The
endpoints of the left and right lines are vm-1,vm and vm,vm+1. So move
along those lines until the next vertex is reached. At that point,
recompute the appropriate line, and keep moving upwards until the top of
the polygon is reached.
Perhaps an example would be helpful:
| \ v3
v2
It is important to keep in mind that the order of the points never
changes. We don't need to do anything complicated like sorting the
points; we only need to find the lowest point, and branch left and right
from there, keeping in mind that the points are cyclic (i.e. v1 is to the
right of vn).
It is now time to start thinking about code. One aspect of the fill
routine we haven't considered is the clear. In the past the entire draw
buffer was cleared and then the new stuff was drawn into it. But this
seems like a bit of a waste; it seems wasteful to clear a bunch of memory
that is just going to be overwritten again. So, as long as we can do it
efficiently, it might be smart to combine the clear and fill routines.
Here is how Polygonamy does it: If a line needs to be cleared, then it
is cleared up to the edges of the object, but the part that is going to
be filled is ignored. (It isn't clear if this provides any substantial
efficiency gains, though).
To see the status of a particular line, a table is used, containing a
value for each Y-coordinate. If the entry is 255 then the line is clear,
if it's 0 then the line has old junk in it, and if it's 1 then the line
has new junk in it. Thus we only clear the line if its entry in the fill
table is a zero.
So a fill routine might flow like the following:
If we are at the left endpoint then recalculate the left line.
If we are at the right endpoint the recalculate the right line.
Update xleft & xright
If line needs to be cleared then clear line.
If the starting fill column is different than the previous fill
column then update the pointers, etc.
Plot the left and right endpoints (since the fill routine only plots
eight bits at a time)
Fill the in-between parts
Update Y
If Y passes through zero then update fill entry point, set Y=7 etc.
Keep going until we reach the top
The question is: if the y-coordinate increases by one, then how much does
the x-coordinate increase by? The equation of a line is:
Isn't this a fraction? Sure, big deal. The fraction can be written
as dx/dy = N + Rem/dy, where N is an integer, and Rem is the remainder,
which is always less than dy. So to calculate x=x+dx/dy:
Adding one to DY is just a simple cheat. Most of the time the lines will
look just fine, but lines which have a slope near one will come out a bit
wrong. The other, accurate solution, which was used in the first
article, is more complicated to implement in this routine. Adding one to
DY will also have a useful benefit which we shall shortly see.
In line 50 above the boundaries of our object are drawn in, to check the
accuracy of the algorithm.
In lines 70-80 the right point is updated, then the thing is filled, then
the left point is updated. This is because both lines are moving to the
right, e.g. they both have positive slope. Think about how the line
segments will be drawn; in general, we want to draw from the left end of
the left line segment to the right end of the right line segment.
(Sometimes this will look a little off where the two lines meet).
Since the left and right lines can each have either positive or negative
slopes, there are four possibilities: Plus-Plus, Plus-Minus, Minus-Minus,
and Minus-Plus.
Now we need to think about implementing this in assembly. Since this is
being done in hires 320x200, the x-coordinate requires two bytes, and the
y-coordinate requires one. We also need another byte to store the
remainder portion of the x-coordinate.
The most glaring question is the calculation of dx/dy: somehow we need a
fast way of exactly dividing an eight bit number dy into a nine bit
number dx. Recall that we always add one to dy, so that dy actually
ranges from 2 to 200. Since the maximum value of dx is 320 or so, the
largest value of dx/dy that we can have is 320/2 = 160. In other words,
both the integer and remainder part of dx/dy will fit in a byte. Simply
adding one to dy makes life pretty easy at this end.
One very fast method of doing division is of course by using logarithms.
But they have a problem with accuracy. One the other hand, one thing we
know how to do very quickly is multiplication.
This then is the plan: use logarithmic division to get an estimate for N,
the integer part. Then calculate N*dy, compare with dx, and adjust the
integer part accordingly.
A quick reminder of how logarithms can be used for division:
Moreover, if the tables are constructed carefully we can insure that the
estimate for N is either exact or too small. Thus we only need to check
for undershoots, which simplifies the calculation considerably. In
particular, the tables were constructed as follows:
The actual division routine works out pretty slick in assembly:
The end result then is a 9-bit/8-bit divide which takes 52 cycles
in the best case. Pretty neat, huh? And quite adequate for our
purposes.
Wait just a minute there, bub... what about when dy=0? Consider what
dy=0 means: it means that two vertices lie along the same line. That in
turn means that the next vertex can be immediately skipped to. That is,
simply move on to the next point in the list, be it to the right or to
the left, if dy=0.
Well, ah reckon that that just about completes the polygon fill routine.
To summarize: start at the bottom (top, whatever) of the polygon.
Calculate the "slopes" of the right and left lines from that point.
Update the coordinates, fill the in-between parts, and plot the
end-sections. Then update Y and keep going. If another vertex is hit,
then recalculate the corresponding line.
Alert people may have noticed that this algorithm translates very nicely
to the 128's VDC chip.
I should probably briefly mention the pattern fills. I use Y as an index
to a pattern table, so it was very natural to use 8x8 character patterns.
With different indexing of course more complicated patterns can be used.
Moreover, it dawned on me that animated patterns were just as easy as
normal ones, so I tried to think up a few interesting animated patterns
(there are two in Polygonamy, each pattern is eight frames).
So that's the graphics part, more or less.
We ain't even CLOSE to being done yet.
Now it's FINALLY time to start writing the master program to control the
3D world. Luckily we have the BASIC program from waaaay up above to
work from.
First is to decide how angles will be measured. The smart thing to do is
to let the angle variable vary between 0..127 or 0..255; that is, to
measure angles in units of 2*pi/128 (or 2*pi/256). The reason this is
smart is because the angle is now periodic, wrapping around 256. Angles
can be added together without checking for overflow, etc. (257=1, 258=2,
259=3, etc.). Note that in previous programs I did a very dumb thing
and let the angle variable vary from 0..119, so angles were measured in
three degree increments, and I had to place all sorts of checks into the
code. Polygonamy uses angle increments of delta=2*pi/128.
Next there is the issue of cx=cx+20 instead of cx=cx+1. The problem is
that if cx=cx+1 is used it takes forever to move around in the world.
Moreover, the objects get really small at around cx=5000. What this
means is that in the assembly version we can use a single byte for cx,
and just treat each unit of cx as 20 "real world" units. That is, in
the assembly program, we will keep track of cx/20 instead of cx.
Sort-of.
Consider the rotation which takes place when we turn left or right: the
world is rotated through an angle delta=2*pi/128, so the calculation is:
Thus the centers need to be calculated more accurately. In particular, a
second byte is needed to store the 'decimal' part of the center. To be
precise, this second byte will contain the decimal part of the center
times 256. This way we can add and subtract remainders and any over- or
underflows will then affect the integer parts cx,cy,cz.
Very quickly we should decide how to represent remainders of negative
numbers. A number like -1.5 can be represented as -1 - 0.5, but it can
just as well be represented as -2.0 + 0.5. By using the second method
remainders are always positive, and that's the smart way to do things (if
nothing else it lets the remainder be a fraction of 256, instead of a
fraction of 128). It's also the way any computer will round: type
INT(-1.5) and see what happens.
further question arises about how to represent the centers, specifically,
how do we represent an object which is behind us, e.g. has a negative
value for cx. The normal way to represent negative numbers is of course
to use 2's complement notation, but this has some disadvantages. One of
them is multiplication: recall that in an earlier code some really fancy
footwork needed to be done just to be able to multiply numbers between
-64..64, and we certainly want the centers to range over more numbers
than that. This gets worse if we decide to use more bits to represent
the centers, as we must do if a larger world is constructed.
Moreover, Polygonamy is an excuse for testing new and different ideas and
investigating their strengths and limitations, so why not try something
different. As I look through my notes I'm not really sure what motivated
this choice, but how about the following: let's add 128 to all of our
numbers.(I think this is called excess-128 notation). -2 will be
represented as 126, -1 will be represented as 127, six will be
represented as 134, etc.
Shifting between the excess numbers and 'real' numbers is as simple as
EOR #128. Recall that to multiply two numbers, let f(x)=x^2/4, so that
a*b = f(a+b) - f(a-b). In the new system:
Now the downside of this method: adding and subtracting excess-128
numbers, and in particular checking for overflow.
Back to rotations. The most obvious thing to do is to create two tables:
ROTL
:LOOP LDX CX,Y ;center coordinate
LDA CDEL,X ;CDEL = f(x) above
LDX CZ,Y
LDX CZ,Y
Next up: projections. The projection calcuation is:
The general outline of a program would be:
Figuring out which objects to display is easy: just check to make sure it
lies within the viewing cone/pyramid, that we are not too close, etc. If
an object is to be displayed, it needs to be placed in a list. I
constructed the list to make sure that objects which are farther away are
drawn first; that way objects can overlap one another correctly. This
was done via a simple insertion sort -- i.e. bump up objects in the list
until the right spot is reached to insert the object.
We have most of the tools to deal with #5. Handling an object consists
of rotating and projecting it, then displaying it. Rotation is the same
as it has always been, albeit now involving sixteen bits, and projection
is described above. Then each polygon needs to be drawn, by sticking the
points of the polygon into the polygon list, setting up the fill pattern
and the pointer to the minimum Y-value, and calling the polygon fill
routine. Of course, if the face is hidden then we certainly don't want
to plot it.
The minimum y-value can be found very easily while inserting the points
into the point list -- just keep track of ymax and compare to each point
as it is inserted.
We have discussed several methods of calculating hidden faces --
cross-product, rotated normal, parallel faces -- each of which involves
looking at a vector normal to the face, and either projecting it or
taking the dot product with a vector going to the origin. What a big
pain in the butt, especially since values can be sixteen-bits, etc.
Did you ever stop to wonder about what happens to all the previous
polygon-fill calculations if the point-list is entered in reverse order?
Quite simply, left -> right and right -> left. And what happens when a
face is invisible? The polygon is turned away from our eye. The points
in the polygon, which go counter-clockwise around the polygon, will go
clockwise when the polygon is turned around. (I should point out that at
least in my code the points on the polygon are actually done in
clockwise-order, since projection reverses the points).
So, we have hidden-faces already built-in to the polygon plot routine!
In essence, we simply don't plot any polygon which the routine will freak
out on. We can of course be systematic about this; within the plot
routine:
All that is left then is #6: run through the fill table, and clear any
lines that still have old junk in them. Since I used two bitmaps as a
double-buffer, all that is left is to swap the bitmaps, and do it all
again.
Et voila.
As you can see, the program is not without its flaws. The biggest one,
I think, deals with the projection. Recall that I calculate px/s, where
s=20, and add it to cx. My feeling was that px was going to be very
small compared with cx, and so not modify the projection by much. But
either this is a bad assumption, or the rotations are all screwed up,
because certain rotations look a bit goofy. For instance, when you walk
up close to the octahedron it starts to get jumpy, or wobbly. I note
further that when you are far away from an object it looks much better,
so that might be a way to fudge around the problem (e.g. make the value
of d in the projection much larger).
Speaking of rotations, the 'funky shake' which used to plague the old
programs has now been fixed. For instance, a rotation in the y-direction
would work well but at some point it would appear to start rotating
backwards, then start going the right way again. The problem was due to
an overflow in the calculation of the rotation matrix, in a term that
looked like(sin(t1) + sin(t2) - sin(t3) - sin(t4))/4, and the solution is
to split such terms into two, e.g. (sin(t1)+sin(t2))/4 -
(sin(t3)+sin(t4))/4.
Speaking further of rotations, I find the current system of rotation and
projection unsatisfying, in particular too slow. Notice how much the
program slows down when all three objects are visible; some 40 points
are being rotated, both locally and globally, at this point. It is
possible to reduce the number of matrix multiplications from 9 to 8 (and
even lower, with lots of extra overhead), but I find this unsatisfying.
A better method is needed...
There is another bug somewhere in the global rotations which sometimes
causes the objects to wander around -- occasionally I can get the ship or
the octahedron to move close to the pyramid. Also, when you are really
close to an object and turn, you might notice the curious effect of the
object rotating by small amounts, and then jumping position by a large
amount. This is due to the 'units of 20' that are used in the program;
the remainder part of (cx,cy,cz) needs to be used here, and then the
display will be smooth as well.
Of course, if multicolor mode was used many of the calculations would be
much simpler, since the screen x-coordinate would only require one byte
instead of two.
The program should be made more efficient memory-wise of course. Shoving
the fill routines for each buffer together would help out, and a system
for rotating points out of a list, similar to that used in the last 3D
program, would greatly streamline things (although it would be a tad
slower).
There is still a minor bug or two in the fill routine, which causes
little blue chunks to be taken off the ends of some polygons, but I
didn't feel like tracking it down.
Note that although Polygonamy only lets you run around in a plane,
running around in a full three dimensions is quite simple to add. And,
although there are only three objects in the world, it is all set up to
deal with a lot more. In summary, I see no major problems standing
in the way of doing reasonably fast 3D graphics on the 64.
The object file for this article is available in "Hacking the Code"
(Reference: code, SubRef: polycode) found elsewhere in this issue.
Polygon Prototype Code
SCREEN 1 '320x200
(You may note that cx=cx+20 is used for a translation, instead of cx=cx+1.
This will be detailed later).
Filling
STA COL1,Y
DEX
BEQ :DONE
STA COL2,Y
DEX
BEQ :DONE
STA COL3,Y
... etc.
:DONE RTS
Thus X would be my counter into the number of columns to fill, A can
contain our pattern to fill with, and Y can range from 0..7 to index the
individual rows within the block. The first thing to notice is that each
STA/DEX/BEQ code chunk is six bytes. So, all we need to do is calculate
which row to start filling at, multiply by six, and add that number to
the start of the fill routine. The idea is then to jump into the correct
place in the fill, and let it fill the right number of columns, stored in
X.
JSR FILL
FILL JMP (ADDRESS)
STA COL1,Y
STA COL2,Y
STA COL3,Y
Then, insert an RTS into the right place in the routine. Thus, we
calculate which column to stop filling at, multiply by three, and stick
an RTS in the fill routine at that point. To fix it up we stick an
STA ..,Y back on top of the RTS.
Filled Polygons
* <--- Vertex v0
where the vertex v0 is the lowest point of the polygon. All that needs
to be done is to move upwards (DEY), compute the left and right points
of the polygon at that point, and then fill between the two (JSR FILL).
v1____v3
What happens if we rotate this polygon?
v2____v1
The vertices have changed position, but *their order has not*. v3 is
still to the right of v2, and v1 is still to the left.
v1
v2 is the minimum. The left line has endpoints (v1,v2) and the right
line has endpoints (v2,v3). We steadily move along the left and right
lines as we creep upwards. At some point we hit v3, and at this point
we compute a new equation for the right line, this time with endpoints
(v3,v1). Now we continue to creep upwards and move along the left and
right lines, until we hit v1, at which point we are finished.
let Y count from 7..0
The next thing to figure out is how to calculate the left and right
lines. We do have the old line routine, which we could use to advance
to the left and right endpoints, but clearly this isn't too efficient.
(y-y0) = m*(x-x0) m=slope
or
change in y = m*change in x
So, if the change in y is 1, then then the change in x is 1/m. All we
need to do then is calculate the inverse-slope=dx/dy, where dx=x2-x1 and
dy=y2-y1, and add this to the x-coordinate with each step in y.
x= x+N
xrem= xrem+Rem
If xrem>=dy then x=x+1:xrem=xrem-dy
As usual, we want to start xrem at dy/2, which has the effect of rounding
numbers up.
10 REM LINE ROUTINE TAKE TWO SLJ 11/24/95
12 REM ACTUALLY IT'S A FILL RTY NOW
15 GRAPHIC 1,1
20 X0=160:Y0=100
30 X1=5:Y1=-50:X2=7:Y2=11:XL=X1:YL=Y1:Y=Y1
35 X3=50:Y3=Y1:X4=X3+100:Y4=Y2:XR=X3:YR=Y3
40 D1=Y2-Y1+1:DX=X2-X1:LI=INT(DX/D1):LR=DX-LI*D1
45 TL=INT(D1/2)
46 D2=Y4-Y3+1:DX=X4-X3:RI=INT(DX/D2):RR=DX-RI*D2
48 TR=INT(D2/2)
50 DRAW1, X0+X1,Y0-Y1 TO X0+X2,Y0-Y2, X0+X3,Y0-Y3 TO X0+X4,Y0-Y4
60 REM MAIN LOOP
70 XR=XR+RI:TR=TR+RR:IF TR>=D2 THEN TR=TR-D2:XR=XR+1
75 DRAW1, X0+XL,Y0-Y TO X0+XR,Y0-Y
80 XL=XL+LI:TL=TL+LR:IF TL>=D1 THEN TL=TL-D1:XL=XL+1
90 REM DRAW1, X0+XL,Y0-Y TO X0+XR,T0-T
100 Y=Y+1:IF Y<=Y2 THEN 60
In this program (x1,y1)-(x2,y2) is the left line, and (x3,y3)-(x4,y4) is
the right line. The first thing to note is that in lines 40 and 46,
Y=y2-y1+1. This issue was discussed in the very first C=Hacking
3D-graphics article. The problem is that although the line will be
anatomically correct with DY=y2-y1, it will look silly. The easiest way
to see this is to consider y2-y1=1, e.g. say we draw a line between
(0,10) to (50,11). Ideally this line will consist of two line segments,
one from (0,10) to (25,10) and the other from (26,11) to (50,11). But ifi
we use DY=1 we will have one line segment from (0,10) to (50,10), and a
single point at (50,11).
Plus-Plus: Update right, fill, then update left
Plus-Minus: Fill, update left, update right
Minus-Minus:Update left, fill, update right
Minus-Plus: Update left, update right, fill
If this is still confusing, try out the above program with various left
and right line segments, and these things will jump right out.
log(a/b) = log(a) - log(b)
exp(log(x)) = x
thus
a/b = exp(log(a)-log(b)).
How do we take the log of a 9-bit number? We don't. Instead, we
construct a table of f(x)=log(2*x), and use, not x, but x/2, as a
parameter. Remember that the logarithms merely give an estimate to the
integer part.
10 DIM L1%(160), L2%(200), EX%(255): C=255/LOG(160)
20 FOR I=1 TO 160
30 L1%(I)=INT(C*LOG(I))
40 NEXT
50 FOR I=2 TO 200
60 L2%(I)=INT(C*LOG(I/2)+0.5)
70 NEXT
80 FOR I=0 TO 255
90 EX%(I)=INT(EXP(I/C))
95 IF(I=129)OR(I=148)OR(I=153)OR(I=81)OR(I=98)THEN EX%(I)= EX%(I)-1
100 NEXT
110 L2%(3)=L2%(3)+1
The constant C is needed obviously to improve accuracy (log(160) simply
isn't a very large number). Note that I divided the arguments of the
logarithms in half; instead of calculating 2*dx/dy I calculate dx/(dy/2),
which is of course the same thing. This was done to make C work out.
By 'fixing' the tables in this manner, exactly 3927 calculations will
undershoot, which works out to about 6% of all possible calculations we
may perform.
DIVXY MAC ;Macro to compute 2*X/Y
LDA LOG1,X ;This is the division part
SEC
SBC LOG2,Y
BCS CONT
LDX #00 ;dx/dy < 1
LDA ]1 ;LDA dx, since dx is exactly the remainder
BCC L2
CONT
TAX
LDA EXP,X
TAX ;X is now integer estimate
STA MULT1
EOR #$FF
ADC #00 ;Carry is guaranteed set
STA MULT2
LDA ]1 ;ldxlo or rdxlo (i.e. low byte of dx)
ADC (MULT2),Y
SEC
SBC (MULT1),Y ;Calculate remainder
L2 CMP ]2 ;ldy or rdy (i.e. ]2 = dy)
BCC DONE
L1 INX ;Remainder is too large, so up int estimate
SBC ]2 ;and subtract dy
CMP ]2 ;Repeat until remainder
Do you see how it works? First the initial guess N is calculated. If
log(x) - log(y/2) is negative then dx/dy is less that one, so the
remainder is simply dx and the integer part is zero. Otherwise,
R= dx - N*dy is calculated. Since N always undershoots, dx-N*dy will
always be positive, so the high byte of dx isn't needed. This quantity
R is the remainder, so if it is larger than dy simply increase the
integer estimate and subtract dy from R, and repeat if necessary.
3D Code
blah= cdel*cx + sdel*cz.
cz= -sdel*cx + cdel*cz
The problem is that sin(2*pi/128)=0.049 and cos(delta)=0.9988, which
means that, in practice, cdel*cx=cx. Equally bad is that sdel*cz is very
small when cz starts to get small (e.g. 10*sdel = 0.49). The result of
this is that objects close to the origin (e.g. us) will not be rotated at
all!
xo = 128+x
yo = 128+y
which means:
xo+yo = 256 + (x+y)
256+xo-yo = 256 + (x-y).
The 256 added above can be thought of as the carry bit. What this means
is that all that is needed is to construct a single function,
f(x) = (x-256)^2
where x=-255..255. We can now very quickly multiply signed numbers in
the range -128..128, and with just a single (albeit 512 byte) table,
using essentially the same multiplication procedure as before.
xo+yo = 256 + (x+y)
if x+y >= 128 then we have overflow
if x+y < -128 then we have underflow
which implies:
xo+yo >= 256+128 implies overflow
xo+yo < 128 implies underflow
with similar results for subtraction. Note also that after every
addition or subtraction 128 needs to be either added or subtracted from
the result, which either way corresponds to an EOR #$80. So it's a
little more work to add numbers in this system. (Of course, adding
normal numbers to excess-128 numbers is no problem, so INC and DEC work
fine).
f(x) = (x-128)*cos(delta) + 128
g(x) = (x-128)*sin(delta) + 128
but remember that the remainders are also needed:
fr(x) = 256*(remainder((x-128)*cos(delta)))
gr(x) = 256*(remainder((x-128)*sin(delta)))
Since remainders are always positive none of this excess-128 junk is
needed. Note that we could also let f(x) and g(x) be 2's-complement
tables, then convert from two's-complement into excess-128 after
performing additions etc. The conversion is, what do you know, EOR #$80.
This is the smarter thing to do, and an even smarter thing to do is to
let the cosine table (f(x) above) to be in excess-128 format, and the
sine table g(x) in 2's complement. This way the numbers can be added
as normal, and no conversion need take place:
Well, that takes care of two lines of BASIC code :). As it turns out,
using a single byte for the remainder does a pretty good job of holding
the number. Rotating by 360 degrees one way, then rotating back again,
produces a center which is within a few decimal places of the starting
value.
INC THETA
LDY #NUMOBJS ;Y indexes the object
DEY
CLC
ADC SDEL,X
STA TEMP ;t1 = ci+si
LDA CXREM,Y
CLC
ADC SDELREM,Y ;Add remainders
BCC :CONT1
INC TEMP
CLC
:CONT1 LDX CX,Y
ADC CDELREM,X
BCC :CONT2
INC TEMP
:CONT2 STA CXREM,Y
LDA CDEL,X
LDX CX,Y
SEC
SBC SDEL,X
STA TEMP2 ;t2=cz-si
LDA CZREM,Y
SEC
SBC SDELREM,X
BCS :CONT3
DEC TEMP2
:CONT3 LDX CZ,Y
CLC
ADC CDELREM,X
BCC :CONT4
INC TEMP2
:CONT4 STA CZREM,Y
LDA TEMP2
STA CZ,Y
LDA TEMP
STA CX,Y
DEY
BPL :LOOP
Proj(P) = d*(P+C)/(px+cx)
where P=(px,py,pz) and C=(cx,cy,cz). In terms of the implementation, we
want to calculate:
d*((P-128) + s*(C-128)) / ((px-128) + s*(cx-128))
where s=20, to translate C into the 'real world'. To calculate this,
consider the following function:
g(x) = r*d / (s*((px-128)/s + cx-128)) + 128
where r is some scaling factor. The projection calculation then becomes:
1/r*( (g-128)*(P-128) + s*(g-128)*(C-128) )
Thus we need some more tables, one of 1/(4r) * (256-x)^2, the other of
s/(4r) * (256-x)^2, to do the multiplication. Furthermore a table of
(x-128)/s would be pretty handy, and finally we need a table of
g(x) = r*d/(s*(x-128)) + 128.
Get keypress
1- If translate, then update all cx's (just some INCs and DECs)
2- If rotate left or right, then rotate world
3- Update angles for global & local rotation matrix (e.g. theta)
4- Figure out which objects to display/construct a list
5- Call each object in turn
6- Update bitmap: clear out remaining garbage and swap bitmaps
Numbers 1 and 2 are done. In number three, by global matrix I mean the
object rotation that results from us turning left or right. By local
rotation I mean rotations independent of whether or not we turn. The
local rotation allows e.g. the octahedron to spin around in Polygonamy.
- Calculate the left and right lines.
- Take a trial step along the left and right lines
- If xleft < xright then we are OK, otherwise punt.
In principle we only need to do this on the first calculation, and use
some properties of the lines to make things easier (for instance, if the
left line is moving left and the right line is moving right, and they
emanate from the same point, we know the polygon is visible).
Unfortunately, nasty situations can arise, for instance when the left and
right slopes have the same integer parts. So a check needs to be placed
within the fill code to make sure the left point doesn't get ahead of the
right point. This is unfortunate, as every cycle counts in the fill
code, but luckily there is (was) a natural place to put in a quick check.
Analysis and Conclusions
Copyright © 1992 - 1997 Commodore Hacking
Commodore, CBM, its respective computer system names, and the CBM logo are either registered trademarks or trademarks of ESCOM GmbH or VISCorp in the United States and/or other countries. Neither ESCOM nor VISCorp endorse or are affiliated with Commodore Hacking.
Commodore Hacking is published by:
Brain Innovations, Inc.
Last Updated:
1997-03-31
by Jim Brain
Last Accessed: 1997-06-18 from buttons.ihug.co.nz using Mozilla/3.0
Statistics:
15 accesses today and 291 accesses since 1997-04-29