An Introduction to Raytracing

Meta- Stuff

This page is based off of a LiveJournal post that I made back in 2011. The post lacked a lot of things since it was a brain-dump of some math that I had formulated. I had written a raytracer before with some "fish-eye"-looking graphics. This post had the math that fixed the "fish-eye" distortion, but didn't adequately explain anything else. I'm rather proud of the HTML markup on this page, also. It turned out very presentable. Part of the idea of algorithms and math is that they should be beautiful and easy to understand. So, without further ado...

What is Raytracing?

In our physical world, light travels from the sun or some artificial light source, bounces off all the objects around us, and hits us in the face. More importantly, it enters our eyes and we see the effects of light and its interactions with the geometry surrounding us. Since we can mathematically represent this geometry and simulate rays of light on a computer, it stands to reason that we can generate some pretty realistic graphics by simulating the real-world interactions of light and shape.

Since only an astoundingly small percentage of the light reaches your eyes, when we build our simulation, it will trace light in the opposite direction. We trace the path of a photon from the viewer to a viewed object and then to the area's light source. The entire simulation requires only basic algebra and a minimal amount of trigonometry.

The Process Explained

Viewer → Object → Light... can't be that hard, right? Right! There's one last element we need, though, and that's a viewing surface. Without this, our graphics would be very distorted and many calculations would be harder than if a viewing surface did exist. Our viewing surface in the real world is bundled conveniently inside our eye in the form of a retina. In our simulated world, I imagine that you are carrying around a big window in front of your face, or, more accurately, a wire window screen.

Imagine holding this wire window screen a few feet from your face. Imagine each cell of the screen can only use one color. Light will pass from your eyes through the hole of the window screen to some object in the distance, and the array of colors defined by the screen creates an image. Say, for instance, you put a dab of paint in each cell of the wire screen that was the color of the area behind that cell. This would be hell to do in real life since moving around to paint the screen would make it difficult to find what's directly behind it and return to your precise previous viewing position... but this is just a thought experiment. Your paint-dabs would paint a picture of the scene behind the screen that you could carry away and, presumably, hang on your wall by your dinner table.

This wire window screen analogy works almost exactly. So, the strategy is to:

define an array or picture object to hold our data (the "screen" in the analogy),
trace a path of light through the screen to the object behind it,
hold the color of the object that we find,
trace the path from that point on the object to the nearest light source,
change the held color based on any obstructions on our previous path, and
put that color onto the screen object.

Busting into the Math

The first thing we need to do is define the screen. Here, I mean both the screen from our analogy and the screen of your computer. The cells of the wire screen from the analogy become pixels on your computer screen. For this, we're going to need some simple trigonometry. If you understand the basic sin, cos, and tan of trig., then you can safely skip the next few paragraphs.

Otherwise, find a piece of scrap paper and draw a horizontal line on it. At a right angle to the line, draw a vertical line. Connect the left end of the horizontal line with the top end of the vertical line. Somewhere in that set of lines, you'll see a right triangle. Go ahead and darken it. The left-most point we'll call A, the top-right point we'll call B, and the bottom-right point we'll call C. The angle ∠BAC we'll call Θ (theta, "thay-tah"– it's more-or-less Greek for the letter "Q").

A long time ago, the inventors of trigonometry noticed that there's a relationship to the lengths of these lines in a right-triangle and the angle Θ. If we take the length of line connecting A and B to be 1, then we can build what's called the unit circle. It's pretty intuitive that if we change the lengths of line AC and line BC, we can make Θ be anything between nothing at all and a perfect corner. If we drew our triangle in different orientations by flipping it horizontally, vertically, or both, then we could generate any angle in any quadrant using the same proportion of positive or negative line lengths. This is rather taxing on the imagination, though, so let's proceed.

These relationships between the lines and angles of a right triangle were given the names "sine", "cosine" and "tangent". The proportion (fraction) of line BC (the numerator) to line AB (the denominator) is called "the sine of Θ". The proportion of AC to AB is the cosine of Θ, and the proportion of BC to AC is the the tangent of Θ. Since BC is on the opposite side of the triangle as Θ, AB is the hypotenuse of the right triangle, and AC is adjacent to Θ, the mnemonic "SOH-CAH-TOA" arises for remembering the relationships: Sine is Opposite over Hypotenuse; Cosine is Adjacent over Hypotenuse; Tangent is Opposite over Adjacent.

The sine, cosine, and tangent of Θ are written sin( Θ ), cos( Θ ), and tan( Θ ), respectively.

These mappings work the other way, too. Assuming you have the angle, and you want to find the ratio of the opposite and adjacent sides, you could use the inverse-tangent function. It's written tan^-1 or called arctangent and written atan. Likewise, there are analogous functions for sine and cosine. There are two ways of measuring angles: degrees which range from 0 to 360 to represent angles from a line folded onto itself all the way to a full circle, and radians which range all the way from zero to 2*π or about 6.283. Here, we'll use radians.

For raytracing, we'll need sine and cosine, and to define some Θs, and we'll use aviation terms to do it. Turn your head from side to side. We'll call that our yaw. Nod your head up and down. We'll call that angle our pitch. Next, twist your head toward your left or right shoulders. We'll call that our roll. Using our basic knowledge of trigonometric functions, we can take a point and rotate it anywhere. This following fourteen lines are an "algorithm", or a set of steps, that accepts an input point (P) and three decimals representing a pitch, roll, and yaw (p, r, and y, respectively). Notice the capital P. For here, anything that's lower case is a simple number, and capital letters represent vectors– a simple set of numbers. Completing the algorithm "returns" a new version of point P following its rotation around the origin.

algorithm rotate ( Point P, angle p, angle r, angle y ) :
(1) Let d = √( P_x² + P_y² )
(2) Let Θ_roll = atan( P_y / P_x ) + π if P_x < 0 or atan( P_y / P_x ) otherwise.
(3) Let P_y = d × sin( Θ_roll + r )
(4) Let P_x = d × cos( Θ_roll + r )
(5) Let d = √( P_x² + P_z² )
(6) Let Θ_yaw = atan( P_z / P_x ) + π if P_x < 0 or atan( P_z / P_x ) otherwise.
(7) Let P_z = d × sin( Θ_yaw + y )
(8) Let P_x = d × cos( Θ_yaw + y )
(9) Let d = √( P_y² + P_z² )
(10) Let Θ_pitch = atan( P_z / P_y ) + π if P_y < 0 or atan( P_z / P_y ) otherwise.
(11) Let P_y = d × sin( Θ_pitch + p )
(12) Let P_z = d × cos( Θ_pitch + p )
return rotate( P, p, r, y ) = P

In the pseudocode, all the variable names are italicized and function names are put in boldface. The end result of the algorithm is the point P, an altered version of the same P that went into the algorithm at the top. But let's talk about how it works. Θ_yaw, Θ_pitch, and Θ_roll on lines 2, 6, and 10 are the most interesting part. Most programming languages have an atan2 function which take the numerator and denominator of the fraction to simplify things instead of using a conditional like we do. What it does is this: If you place a point on a piece of graph paper somewhere above and left of the origin, you can imagine a right triangle being made by drawing a vertical line straight down from the point to the x axis, and a horizontal line from the bottom end of that line extending right to the origin. If we define that triangle in the same pattern as our example earlier, it's easy to define an angle at the origin and see how the lines relate. We've defined an entire triangle using only one point and the origin! That's what this function does. It finds the "original" rotation of the point and sets the output to increase (notice the + on lines 3, 4, 7, 8, 11, and 12) it.

Lines 1, 5, and 9 are the simple distance formulas from basic algebra based on the theorem of Pythagoras. Since our imaginary triangle's hypotenuse is no longer 1, we'll use that formula to figure out it's true value. Then, lines 3, 4, 6, 8, 11, and 12 are multiplied by this value accordingly. The first and last lines are notational conveniences so we can say, rotate( (3,2,3), 1.5, 0.1, 0) and have it meaningful and well-defined.

You'll notice three sets of lines in the algorithm. Lines 1 through 4, 5 through 8, and 9 through 12 are very similar with each other. The first set, made obvious by the notation, rotate the point around x and z axes. If we treat y as the up-axis, you should notice changing the yaw of an object only changes the x and z values of the point that we're rotating. The second set takes the point and whirls it around the z axis, and the third set changes the pitch by rotating the point around the x axis.

Defining the Screen

We have the hard part behind us. Even if you're still reeling to figure out why atan was used, the rest should be pretty easy. First, draw a rectangle. That's your screen. We'll define the points across the top as T and U. They'll have values like T_x,y,z, etc. The bottom two points will be V and W. The angle from the left edge of the screen to your face to the right edge of the screen is called the "field of view". Humans have a wide "peripheral vision", but most computer simulations do not. Realistically, our clear vision doesn't extend all the way to the periphery. A good field of view for simulations is between eighty and one hundred ten degrees (1.396 to 1.200 radians). The next term we have to define is an "aspect ratio" (Θ_AR). Nowadays, the term is becoming quite common with the prevalence of wide-screen televisions. It is simply the width of your rectangle divided by its height– another proportion. The wider and more squished the rectangle appears, the larger the aspect ratio. A perfect square has an aspect ratio of 1, and wider rectangles have ratios between 0 and 1.

Using all this new terminology and our rotate algorithm that we defined above, let's find our values for T, U, V, and W. Let's also throw in something new: a view-point that's not at (0, 0, 0). We'll call it our camera C. It'll have x, y, and z values, as well as rotations of its own, which we'll call C_ΘP, C_ΘR, C_ΘY for its pitch, roll, and yaw, respectively. Another thing we'll need is a single point that defines "what we're looking at." We'll call it our subject. The four points of the screen will be defined around this central, focal point F which should be directly right of the camera. We define the focal point using the camera's roll, pitch, and yaw. Lastly, consider each line of the formulas like three. T is shorthand for T_x, T_y, and T_z. T_x would correspond with C_x and F_x, and so forth.

(1) Let T = C + rotate( F - C, C_ΘP + ( Θ_FOV / 2 ), C_ΘR, C_ΘY - ( (Θ_FOV × Θ_AR) / 2) )
(2) Let U = C + rotate( F - C, C_ΘP + ( Θ_FOV / 2 ), C_ΘR, C_ΘY + ( (Θ_FOV × Θ_AR) / 2) )
(3) Let V = C + rotate( F - C, C_ΘP - ( Θ_FOV / 2 ), C_ΘR, C_ΘY - ( (Θ_FOV × Θ_AR) / 2) )
(4) Let W = C + rotate( F - C, C_ΘP - ( Θ_FOV / 2 ), C_ΘR, C_ΘY + ( (Θ_FOV × Θ_AR) / 2) )

Notice that rotate rotates a point around the origin. In order to accomplish this with our focal point, find the focal point's relation to the camera with subtraction, rotate that point, and then put it back with the initial addition. It's tricky, but it makes sense.

Shooting Rays

The next part is to build a test point, move the point along a straight line, and see if it collides with anything. The simplest object to detect collisions on would be a sphere:

algorithm sphere_collide ( Point P, Sphere S ):
(1) If √( (P_x - S_x) ² + (P_y - S_y)² + (P_z - S_z)² ) ≤ S_radius, then return true
(2) otherwise return false

Notice that line 1 is essentially just the distance formula. If the distance from our test point is less than the radius of the circle, then it's true that the point collides (is inside or on the surface) the sphere. In order for this to be important, though, we need to actually shoot the ray and see if this collides. So, we'll have an algorithm that "shoots" the ray from camera C to some destination point D. Once again, each line in the algorithm is really three: one for the x, y, and z parts of each point. We'll also define our screen symbolically as S, for "screen". The last thing we'll need is a "ray resolution", r, which is just a constant number that says how many steps we want our ray to take from its source (the camera C) to its destination (the point P). The algorithm then looks like this:

algorithm shoot_ray( Point C, Point D, World Γ ):
(1) For i = 0 → r:
(2) Let A = C + (D - C) × (-i / r )
(3) For j = 0 → size_of( Γ )
(4) if sphere_collide( A, Γ_j ), then return A
(5) return 0

We're getting a little fancy, here, but there's nothing to fear. The Γ is a "gamma", a list of sphere objects that make up our world. I implicitly defined size_of, but it seems pretty intuitive: it gets the number of items in Γ. Also the For construct might seem a bit new. On line 1, I define i to equal zero, then do lines 2, 3, and 4 before returning to 1. When I do go back to line 1, I increment i unless it's greater than r. If it is, I skip down to where the indentation jumps back left: line 5. Likewise, the For on line 3 means do lines 3 and 4 repeatedly until either we exit the algorithm completely, or j is greater than the number of elements in Γ.

So, if A collides with some sphere, we'll get the point of collision out of the algorithm, otherwise we'll get a zero. Let's look more closely at line 2. Points C and D are constant, and so is the ray resolution r. The only thing changing is i. When i is zero, ( -i / r ) is zero, too. We'd be multiplying ( D - C ) by zero, and the end result of the equation would be C. When i and r are equal, their quotient, that fraction, becomes one. Now, we're multiplying ( D - C ) by one and subtracting it from C. C + D - C, then, equals D. What we've done is constructed a gradient of sorts between C and D and tested the points in between.

So, now we must return to our screen S and the four points that defined it T, U, V, and W. We'll use the same kind of zero-to-one magic that we used in the ray-shooting algorithm:

algorithm make_picture( Point C, Point T, Point U, Point V, Point W ):
(1) For j = 0 → S_height
(2) Let P_j1 = T + ( V - T ) × ( -j / S_height )
(3) Let P_j2 = U + ( W - U ) × ( -j / S_height )
(4) For i = 0 → S_width
(5) Let P_ij = P_j1 + (P_j2 - P_j1) × ( -i / S_width )
(6) Let Q = shoot_ray( C, P_ij, Γ )
(7) If Q ≠ 0 then S_i,j = color_of( Γ_i,j ) and jump to 1

Let's look at what's happening here. Just like the shoot_ray algorithm makes a little gradient pathway from one point to another, here we have that same pattern going on twice on lines 2 and 3 where we're connecting the top-left (T) with the bottom-left (V) and the top-right (U) with the bottom-right (W). Lastly, we're doing that same trick one more time connecting these rails that run down the side of the picture with one that runs across: line 5. After we build this test point, we shoot a ray to it. If the ray strikes something, we change the color of S to the color of whatever we hit.

On line 7 we see the words "jump to 1". After our ray collides with an object in the world Γ, we stop. Light doesn't travel through objects in our simulation, so continuing the ray would be a waste of time. Instead, we'll increment our value of j and return on our way.

Back into the World

We already have the means of shooting a ray from a camera to a destination point. We have a screen object that captures colors. In short, we have all the things we need to finish basic ray-tracing. Shadows and reflections are, as we'll see a little later, mere applications of what we've already learned. There are more shapes in this world than just spheres.

An unrotated box is probably the next most obvious thing to determine a point of collision against: If the test point's (P's) coordinates are greater than the smallest-numbered coordinates of the box but smaller than the largest-numbered coordinates of the box, then we have a collision. To elaborate, imagine a box. The bottom-left-front point of the box we'll call (-1, -1, -1) and the top-right-back corner we'll call (1, 1, 1), for simplicity. Now, if we're testing a point to determine it's collision with this box, then we'll test to see that P_x, P_y, and P_z are greater than or equal to -1 and less than or equal to 1. If this test passes: we have a collision.

A tube would be another easy target. The shape is a line segment with bulbous caps on each end. We could use our gradient approach that we used for tracing a line and see if the test point is closer to the line than the radius of a the tube. We'd be mixing the sphere code with the line code. PovRay calls this shape a "sphere sweep" because our test would be essentially stretching out a number of spheres. Let's look at the formal algorithm for this one. We'll define a tube T by its two end points T_A and T_B and its radius T_r. Note that the A and B are capitalized. This means that they are vectors. They've got x, y, and z coordinates, after all.

algorithm tube_collide( Tube T, Point P ):
(1) For i = 0 → r:
(2) Let A = T_A + ( T_B - T_A ) × ( -i / r )
(3) If √( (A_x - P_x)² + (A_y - P_y)² + (A_z - P_z)² ) ≤ T_r then return A
(4) return 0.

There are others shapes to collide bombard with points, but let's take another step toward having a complete idea of our world's geometry. Often times it's convenient to "subtract" one shape from another. We can cut a box out of another box, or a tube through a sphere. This is simple logic that most raytracer's accomplish pretty easily. The concept is called "constructive solid geometry". One really doesn't need a formal algorithm to explain the concept, and in fact, it would probably make more of a mess trying to formalize it. So, imagine that box again. Now, imagine another box jutting through it, carving into it and out the opposite end. We've got one box impaling another. Let's call the main box A and the intersecting box B. If a ray would strike the box B by itself, we don't care. We're not drawing the box. Therefore, we'll leave it out of our world Γ. Instead, when we program a data structure for our box A, we'll sort of include as a note that we've got a shape being subtracted. Then, when a line strikes A, we'll check if the line also strikes B. If it does, we'll ignore the collision. It's as easy as that! We call this subtraction a "boolean difference".

There is another element of constructive solid geometry that's equally useful. This is a "boolean intersection". When a ray collides with our AB object, we'll ignore it unless it strikes the space occupied by both of them. That means if the ray it's A but the space isn't occupied also by B, we'll ignore it, and also ignore the collision if the ray strikes B but not A. What we're doing, then, is binding one object by the other. In our previous example with the two boxes, the end result would be the intersecting long box being bounded by the original box. So, the result is still a box. If we intersected a sphere with a box, we'd get a more interesting shape. If we intersected a box with our sphere sweep, we could get a proper cylinder.

So, in our shoot_ray algorithm, on line 4, we'd have to change the sphere_collide to map to the correct function based on what Γ_j's type is. In fact, this is where object-oriented programming becomes a plus. A "shape" interface could contain the structure for a shape. Every shape has a material, or at least a color and a function determining if a test point intersects the shape. The raytracer throws a bunch of different test points at all of the shapes contained within the world of the raytracer, and the shapes that use the interface would each define their own special way of determining collisions.

Ray Tracing

So, we have a world filled with intersections and differences of spheres, boxes, and tubes. We've got a pretty good thing going for us. In theory, we could already put together a simple architectural model. There are two important features missing, though: shadows and reflections. Both of these things raytracers excel at producing.

To continue, we need to understand color. So far, we've used it fairly abstractly, assigning S_ij a color without really defining what a color is. In most programming languages, colors are defined using red, green, and blue components. We could define a color in the same way we've been defining points: 3-tuples. If the word "tuple" seems foreign to you, don't be afraid. A 3-tuple is just an extension of a tuple, which we know in informal-talk as an ordered pair. (-1, -1, -1) is an example of a 3-tuple.

For colors, though, most systems use values from 0 to 255. Some, like PovRay use values from 0 to 1. Anyway, mixing colors for our raytracer is pretty intuitive. Having full values of the red, green, or blue components produce that color. (255, 0, 0) is red. (0, 255, 0) is green. We can mix them using (255, 255, 0) to make yellow. Besides full-and-empty values for each component, we can mix it up to produce dimmer colors. (128, 128, 0) is a dim yellow. (128, 128, 128) is what's called "50% gray". Green and blue can be mixed to make cyan, and red and blue can be mixed together to produce magenta.

The next thing we'll need is the concept of "ambient light". When we have a shadow, it's very rarely fully black. This is because light reflects and lightens the area creating the shadow, or, beyond the scope of this writing, there are multiple light sources in the area. Let's simplify our simulation, as many popular ray-tracers do by having a color defined to represent the ambient light in a room. A good candidate would be dark gray. Now, when we shoot a ray to our object, we'll then trace it to our light source L. Our updated algorithm looks like this: