Rendering Stories: November 2012

Tuesday, November 27, 2012

Ray tracing acceleration

In my previous post i implemented the simplest whitted ray tracer with the one of simplest primitives - speheres. In real life production rendering the renderer has to deal with a typical scene containing million of triangles and rendering them without any acceleration technique from within the algorithm itself will take many hours and even worse days. So there is no excuse of not incorporating some kind of acceleration data structure with the little renderer i have had. In this post i shall give a brief over-view of ray triangle intersection technique and largely on one of the most widely used acceleration data-structure in ray tracing.

Triangle is the most basic primitive of rendering in graphics hardware. So it is natural to include ray-triangle intersection test even for the basic ray tracing application as well. All of the meshes used in this project made of triangular primitive.

In my previous post, all of the primitives were spheres. Once i have the ray triangle intersection implemented as suggested inside the book "Real-time Rendering" by Tomas Akenine Möller, i have it updated in the following rendered image:

Spherical objects with a plane
made of triangles

One of the greatest myths concerning computers today is that we shall keep on having computers with increasing processing power. With the advancement of GPU processing power we are also striving for real-time performance from photo-realistic ray tracer. In real-time rendering, we have at least four performance goals: more frames per second, higher resolution and sampling rates, more realistic materials and lighting, and increased complexity. A speed of 60-85 frames per second is generally considered a fast enough frame rate.

As previous post have shown, describing the and evaluating an object's material can be computationally complex. Modelling the interplay of light and surface can absorb an arbitrary high amount of computing power. And it is true because am image should ultimately be formed by the contributions of light travelling down a limitless number of paths from illumination source from the eye.

In this post, Bounding Volume Hierarchy will be discussed rigorously as one of the most widely used acceleration algorithms in computer graphics rendering. Bounding Volume Hierarchies(BVH) are an approach for ray intersection acceleration based on primitive subdivision, where the primitives are partitioned into hierarchy of disjoint sets. The following figure shows a bounding volume hierarchy for a simple scene.

Bounding Volume Hierarchy
for a simple scene
Image Courtesy of
PBRT

Primitives are stored in the leaves and each nodes stores the bounding box of the primitives in the nodes under it. Thus, as a ray traverses through the tree, any time it does not intersect the node's bound, the subtree beneath that node can be skipped. One property of primitive subdivision is that each primitive appears in the hierarchy only once. In contrast, a primitive may overlap many grid voxels and thus may tested for intersection multiple times as they passes through them. Another implication of this property is that, the amount of memory needed to represent the hierarchy is bounded.

BVHs are generally almost as efficient to build as grids are, while delivering much better performance to being able to better adapt to irregular distributions of primitives in the scene. The kd-trees generally deliver slightly faster ray intersection calculation than BVHs but take substantially longer to build. BVHs are generally more numerically robust and less prone to subtle round-off bugs than kd-trees are.

All the rendering i have done so far beginning from my previous post does not contain any acceleration data-structure and it did not affect that much in the rendering time since , all the primitives that are used are the simplest ones. Very soon we shall appreciate the use of the acceleration data-structure one we shall be rendering more complex shapes faster. Let review the following rendering architecture i had been using.:

Renderer with List Accelerator
Image Courtesy
of
Lund Graphics Group

As you can see i really do not have any accelerator class implementing any acceleration algorithms. The so called ListAccelerator is nothing but a class containing the primitive list and each of them has to be tested for intersection explicitly by the primary. Now you see that it is worth to be mentioned as "Yuck".

My goal is to implement the BVH acceleration and replace the list accelerator so the the framework looks like the following:

Renderer with BVH ccelerator
Image Courtesy
of
Lund Graphics Group

The BVH accelerator accepts two parameters in the constructors. The first is the maximum of primitives in each leaf node and the second takes a string that describes which of the three algorithms to use when partitioning primitives when building the tree. The default, "sah", indicates that an algorithm based on "surface area heuristics" should be used. The other two approaches take slightly less computation when building the tree but are typically less efficient when used for ray intersections.

There are several stages to BVH construction. They are as follows:

Bounding information about each primitive is computed and stored in the STL container (vector/list) that will be used during tree construction.
The tree is built via a procedure that splits the primitives into subsets and recursively builds BVHs for the subsets. The result is a binary tree where each interior node holds pointers to its children and each leaf node holds references to one or more primitives. Finally, the tree is compacted and pointer-less implementation for use during rendering. I built the linearized tree from the very beginning during the construction process.

For each primitive to be stored in the BVH, i store the centroid of its bounding box. As the tree is built, the build_recursive() will be recursively called , primitive list will be sorted and partitioned to place them into groups that are spatially close to each other. The following pseudo code gives an over-view of the bvh construction:

The build function
that initiates the BVH construction
Image Courtesy of
Lund Graphics Group

The recursive build function
Image Courtesy of
Lund Graphics Group

The build_recursive(...) takes as parameters the range [left_index,right_index) . It is responsible for returning the number of primitives within the range and add linearly in the heap structure so that the pre-order traversal make sure that we do not have any wasted memory as follows:

Binary BVH stored as the pointer-less array representation. Children of node at array position i can be found at positions at 2i+1 and 2i+2
Note the wasted memory shown in grey
Image Courtesy og
Real-time Collision Detection

When the tree is perfectly balanced, the linear representation saves memory space and pointer reads during the tree traversal. Unfortunately, for a non-balanced tree, space still has to be allocated as if the tree complemented with the extra nodes to make it fully balanced. What is worse, even a single extra node is added to fully balanced tree will add one full extra level to the tree, doubling its storage space.

As such, this representation is most useful when actual node data is small compared to the combined size of the child pointers, or when the tree is really fully balanced. For instance, when a hierarchy has been built using a median based splitting criterion that guarantees some level of near balance, this could be a useful representation.

Even when there is no guarantee about the balance of the tree hierarchy, it is possible to output the data in more effective representation. If the tree nodes are output in the pre-order traversal order, the left child when present will always immediately will follow its parent. In this manner, although a link is still needed to point at the right child only a single bit is needed to indicate whether there is a left child. The following figure illustrates the binary tree and its node output in the pre-order traversal:

Same tree as the last one, but the nodes are in preorder traversal.
Now the nodes a pointer to the right child .
They also need a bit to indicate if the node has a left child.
Here, this bit is indicated by a gray triangle

I adopted the immediate above technique for linearizing the BVH tree. A typical tree implementation uses (32-bit) pointers to represent the node child links. However, for most trees a pointer representation is overkill. It is a better option to allocate tree nodes from within an array with a 16-bit index value from the start of the array. And this work for both the static and dynamic trees.

For interior nodes, the collection of primitives must be partitioned into two children subtrees. In practice when building the BVHs, one generally considers partitions along a coordinate axis, meaning that there are 6n candidate partitions, where n is the number of primitives. I chose one of the coordinate axes to use in partitioning the primitives. I select the axis with the greatest variation of the bounding box centroids for the current set of primitives. It gives good partitions in many reasonable scenes. The following figure illustrates the strategy:

Choosing the axis along the partition primitive
Image Courtesy of
PBRT

The general goal of the partition is to select a partition of primitives that does not have too much overlap of the bounding boxes of the two resulting primitive sets - if there is a substantial overlap then it will be more frequently be necessary to traverse both children subtrees when traversing the tree which in turn requires more computation than if it had been possible to more effectively prune away the collection of primitives.

One of the simplest partition techniques first compute the midpoint of the primitives' centroids along the splitting axis. The primitives are classified into the two sets, depending on whether their centroids are above or below the midpoint. This partition is easily done with the C++ standard library function std::partition(), which takes a range of elements in an array and a comparison function and sorts the elements in the array so that all of the elements that return true for the given predicate function appear in the range before those that return false for it. It returns a pointer to the first element that had a false value for the predicate, which is converted to the offset and sent to the recursive function. The following figure illustrates the method i used:

Midpoint Splitting Technique[2]

As shown in the pseudocode, we need to build the BVH from the list of intersectables. Starting with the entire set of intersectables we must intelligently, split that into two subsets as shown in the last figure. This simple process is repeated recursively until there is only few primitives per set. And each subset has its own bounding box, which is, at most, as big as the parent set. In the end, we shall have a binary tree with an approximate O(log n). I started with the diagnostic setup to check the implementation of the BVH. With mid-point splitting technique and 10 rays per pixel, i got the following 1024 X 1024 output withtin 16.81 seconds.

Diagnostic setup contains 80 non-scrambled spheres

I also rendered the scrambled spheres and it took 34.29 seconds with the very same number of samples. Here goes the following image:

Diagnostic setup contains 80 scrambled spheres

Another straightforward partitioning scheme is to partition the primitives into two equal-sized subsets such that the first half of the n of them are the n/2 with the smallest centroids coordinate values along the chosen axis and the second half are the ones with the largest centroid coordinate values. While this approach can sometimes work well, there are instances where this method also fares poorly. I rendered the scrambled and non-scrambled spheres with this partitioning scheme; it took 34.52 seconds for the scrambled version and non-scrambled version took 16.56 seconds.

The two partitioning approaches i have described so far can work well for some distributions of primitives, but they often choose partitions that perform poorly in practice, resulting in more nodes of the tree being visited by rays and hence unnecessarily inefficient ray-primitive intersection computations at rendering time. Most of the current algorithms for building acceleration structures for ray tracing are based on surface area heuristic, which provide a well-grounded cost model for answering questions like "which of a number of partitions of primitives will lead to a better BVH for ray - primitive intersection test" [2]. The SAH model estimates the computational test of performing the ray intersection tests, including the time spent for traversing the nodes of the tree and time spent on the ray.

I rendered the following 512 X 512 image with 3X3 samples per pixel using all of the above discussed splitting criteria.

Mid-point split - 80.37 seconds.
SAH split - 71.69 seconds.
Equal split - 101.67 seconds.

We can see that the SAH partition is clearly the winner here.

BVH with SAH splitting
1024X1024 image
100 samples
2040.39 seconds

References

[1] Lund University Graphics Group

[2] Physically Based Rendering, Matt Pharr, Greg Humphreys

[3] Realistic Ray Tracing, Peter Shirley.

[4] Real-time Collision Detection.

Monday, November 26, 2012

Whitted Ray Tracing

My rendering experience at computer took off with the implementation of the whitted ray tracer. I pulled out the skeleton frame-work from the Lund Graphics Group. The intention is to write maintainable code structures that generate photo-realistic images. The skeleton came along with all the basic and convenient utility classes like - Vector, Matrix, Point, Color, Image, etc. It follows the right handed coordinate system and it also gives the option to load/save image in both .png and .exr format.
The initial incarnation of the of the ray tracer is only capable of shooting eye rays and detecting if they hit any spheres in the scene or not.

I would like to give a small over view of some of the utility classes that came along with the skeleton.

Color - It represents the RGB color value. Each color component (r,g,b) is stored as a float and it is not limited to any specific range since we shall be working with the high-dynamic range images. The most basic arithmetic operations have been implemented, so that two colors can be added, multiplied, etc simply by: c1 - c2, c1 * c2, etc. All the arithmetic operations work element-wise.
Vector - It represents a vector direction in 3D space. Intuitive math operations are implemented.
Point - It represent a position in 3D space.
Ray - The basic utility class in any ray tracing operation.It contains some of the above mentioned class objects as members. The origin of the ray is represented by Point and direction of the ray is represented by Vector.

Ray structure
Image courtesy by:
Lund Graphics Group

Scene - All the objects in the 3D world is represented by this class. These objects include the following:

Scene Representation
Image courtesy by:
Lund Graphics Group

Primitives.
Light sources.
Cameras.

All of the above mentioned objects are added to the scene with the add() function and internally stored in a scene graph. Before any operation can be performed on the scene, the prepare() function should be called. This function prepares the scene hierarchy for rendering by computing all the transformation matrices, setting up the acceleration data structure, and caching necessary data. A ray can be intersected tested against the scene by calling the intersect() function.This intersection() function will, in turn call the intersection() function of the accelerator class which will soon be discussed. There are two over-loaded intersection function, one of which just return the boolean flag of either true/false and the other store the intersection information along with the boolean flags. The intersection information are used for shading, reflection, refraction. The latter version is slower than the former one.

Intersection interface
Image courtesy by:
Lund Graphics Group

Intersection - It represents the ray/object intersection point. All the necessary information about the intersection is computed by the intersected object and stored in this class.

The main function do the following typical setup for the ray tracer[2]:

Build a scene.

Create materials, objects, lights.

Create the image buffer represented by the Image class. It contains the size (width X height) of the image. Each pixel of the image is a Color object with red, green, and blue components. There are functions to load and save the image in OpenEXR (.exr) and for getting and setting pixel values. This class is used as frame buffer in the ray tracer, storing the output color of each pixel, and also used in the LightProbe class for representing the environment map.
Setup the perspective camera. The class Camera needs to know which Image object is used for the output image in order the extract the image dimensions. The camera can be set conveniently setup using the setLookAt() function, which takes a position, target and an up vector. The field of view parameter measures the FOV in degrees horizontally.
Prepare the scene as mentioned above.
Create the ray tracer object.
Perform the ray tracing.
Save the output image.

The following diagram will provide the reader a broader overview of this tiny renderer:

Skeleton Overview
Image courtesy by:
Lund Graphics Group

Whitted ray tracer launches up to three new rays at the intersection point between the ray and the intersection object; one for determining the shadow, one for perfect reflection, and one for perfect refraction. The shadow ray is always launched to determine if there is any objects located between the intersection point and the light source. Reflection and refraction rays are spawned based on the material properties of the intersected object.

The basic ray tracer embryo contains some spherical objects in the scene. It colors the pixel white if the ray hits anything, black otherwise.

Default output image from the
whitted ray tracer

As you can see from the above image that there is no light transport in the scene. It would be definitely more interesting if we would be able to apply the diffuse shading to the spheres. For now i only consider the direct illumination that originates from the point light source. The implementation will involve the calculation of the basic rendering equation as follows:

Rendering Equation

We can break the basic rendering equation as follows:

Breakdown of the rendering equation

The scene contain a number of point light sources. We do the following calculation to get the contribution from the light source.

The light vector is calculated from the intersection point to the point light and then normalize it and this is the incident light vector.
The incident angle - the dot product between the light vector and the normal at the intersection point is calculated.
The BRDF(Bidirectional Reflectance Distribution Function) is calculated at the intersection point for the light vector which have been calculated.

I got the following output of the diffuse reflection:

Diffuse Reflection

So far shadows are not considered. If there is an object between the hit point and the light source, there is no direct illumination, and thus the direct lighting factor will be zero. By sending a shadow ray from the hit point to the light source and checking for intersections, i can determine if the hit point is in shadow or not. The following image illustrate the scenario:

The point p is not in shadow while the point q is not in shadow
Image Courtesy of Peter Shirley in
Fundamentals of Computer Graphics

Note that the intersection test only has to return a boolean, true or false answer, since we are not interested in any intersection information. To get the algorithm for shading, i add an if statement to determine if the point is in shadow or not. In the naive implementation, the shadow ray will check if the instance of intersection is between 0 and INFINITY. But, because of the numerical imprecision, this can result in an intersection with the surface the point in question (p) lies. Instead, the usual adjustment to address the problem is to add a small positive constant so that we can check the intersection between the positive constant and INFINITY. The following figure elaborates better , i believe:

By testing the interval starting at that positive constant, the numerical imprecision is
avoided causing the ray to hit the surface point under consideration
Image Courtesy of Peter Shirley in
Fundamentals of Computer Graphics

I generated the following image with shadow calculation:

Diffuse Reflection with Shadow ray

So far i have generated the diffuse reflection with shadow ray. Now the goal is to get more realistic looking image by adding reflection. Real world materials are of course neither perfectly diffuse nor perfectly specular, the combination of the two components can give fairly convincing polished materials.

Similar to the shadow rays in last addition, the reflectance can be spawned at the point of intersection. In this way, the ray originating at the eye can be traced recursively to account for alterations in the ray path caused by reflections. Since we have reflection, the basic rendering equation is updated as follows:

The above equation contains a new parameter - specular reflectance , represented by r. For the time being it is enough to assume that specular reflectance affects all wavelengths equally ,and thus is a single coefficient between 0 and 1. If r = 0.0 it means that the surface is not reflective, and if r = 1.0, then the surface is a perfect mirror. Note that the resulting color is linearly interpolated using the specular reflectance. After adding this specular reflectance i get the following outpt:

Reflection with 5 recursive depth

Another important feature of the ray tracer is the ability to handle transparency and refractions. Many real materials are more or less transparent( glass, plastics, liquids, etc.). When lights enters a transparent material, it is usually bent or refracted. How much is determined by the index of refraction of the material. By the Snell's law we can compute the refraction vector T. Similar to the reflection term, i can add the refraction term to the transmittance model as follows:

Just like reflection, a refraction can be treated by recursively spawning a new ray at the hit point of a refractive surface where t > 0. Like before, i interpolate between the direct illumination, reflection and refraction components, so it should hold that r+t <= 1. Here goes the rendered image with reflection , refraction and shadow:

Reflection & Refraction with shadows

All of the images produced so far appear very jagged when examined at close up. This is because i am only tracing a single ray through each pixel. To get a smoother result, it is necessary to use some kind of super-sampling.

To actually compute an estimate of the average color of the pixel, an algorithm is needed for picking "uniform" points on the pixel. There are several methods of obtaining samples within a pixel. Usually, such algorithms pick samples within the unit square. Two simple possibilities of generating sample over a unit square is show below:

Sampling strategies
Image courtesy of
Realistic Ray tracing
by
Peter Shirley

One of the problems of random sampling is that the given set of samples may be bunched together within the unit square if we are unlucky. It will give an inaccurate estimate of the pixel's color average color because we are using many samples within a small area of the pixel to estimate the whole pixel. It would be rather better to have all the samples more uniformly distributed over the unit square. One method to avoid clumping is to choose our samples as the vertices of a uniform grid within the unit square. There is an additional benefit to this, since the computation is reduced as a result of the fact that the sample pattern only needs to be generated once for the entire image. This is because the same pattern is used for all pixels. Unfortunately, regular sampling suffers from a new flaw. Aliasing can occur as a result of the correlation of samples between pixels. There are many sampling techniques that attempt to have a good aspects of regular and random sampling. The following figure shows some of them[1].

Different Sampling Techniques
Image courtesy of
Realistic Ray tracing
By
Peter Shirley

I have used the multi-jittered sampling technique . Here goes the following image :

Multi-jittered sampling
with 100 primary rays per pixel

If you look at the image closely and compare it with the previous ones you will see the difference. At the same time i would like to stress that sampling for anti-aliasing is the weakest way of testing its strength. To really stress test them and expose their flaws, you need to use them for depth-of-field simulation, ambient occlusion, shading with area lights, particularly environment lights. Some of them will show up in my next post.

All of the images generated so far are way too perfect and far from being photo-realistic. The goal of photo-realism is addressed mainly by monte carlo path tracing techniques and their variations. I shall discuss the simplest one in my next post.

References

[1] Realistic Ray Tracing, Peter Shirley and R. Keith Morley
[2] Lund University Graphics Group
[3] Ray Tracing from Ground Up, Kevin Suffern
[4] Fundamentals of Computer Graphics, Peter Shirley