Triangle is the most basic primitive of rendering in graphics hardware. So it is natural to include ray-triangle intersection test even for the basic ray tracing application as well. All of the meshes used in this project made of triangular primitive.
In my previous post, all of the primitives were spheres. Once i have the ray triangle intersection implemented as suggested inside the book "Real-time Rendering" by Tomas Akenine Möller, i have it updated in the following rendered image:
![]() |
Spherical objects with a plane made of triangles |
One of the greatest myths concerning computers today is that we shall keep on having computers with increasing processing power. With the advancement of GPU processing power we are also striving for real-time performance from photo-realistic ray tracer. In real-time rendering, we have at least four performance goals: more frames per second, higher resolution and sampling rates, more realistic materials and lighting, and increased complexity. A speed of 60-85 frames per second is generally considered a fast enough frame rate.
As previous post have shown, describing the and evaluating an object's material can be computationally complex. Modelling the interplay of light and surface can absorb an arbitrary high amount of computing power. And it is true because am image should ultimately be formed by the contributions of light travelling down a limitless number of paths from illumination source from the eye.
In this post, Bounding Volume Hierarchy will be discussed rigorously as one of the most widely used acceleration algorithms in computer graphics rendering. Bounding Volume Hierarchies(BVH) are an approach for ray intersection acceleration based on primitive subdivision, where the primitives are partitioned into hierarchy of disjoint sets. The following figure shows a bounding volume hierarchy for a simple scene.
![]() |
Bounding Volume Hierarchy for a simple scene Image Courtesy of PBRT |
Primitives are stored in the leaves and each nodes stores the bounding box of the primitives in the nodes under it. Thus, as a ray traverses through the tree, any time it does not intersect the node's bound, the subtree beneath that node can be skipped. One property of primitive subdivision is that each primitive appears in the hierarchy only once. In contrast, a primitive may overlap many grid voxels and thus may tested for intersection multiple times as they passes through them. Another implication of this property is that, the amount of memory needed to represent the hierarchy is bounded.
BVHs are generally almost as efficient to build as grids are, while delivering much better performance to being able to better adapt to irregular distributions of primitives in the scene. The kd-trees generally deliver slightly faster ray intersection calculation than BVHs but take substantially longer to build. BVHs are generally more numerically robust and less prone to subtle round-off bugs than kd-trees are.
All the rendering i have done so far beginning from my previous post does not contain any acceleration data-structure and it did not affect that much in the rendering time since , all the primitives that are used are the simplest ones. Very soon we shall appreciate the use of the acceleration data-structure one we shall be rendering more complex shapes faster. Let review the following rendering architecture i had been using.:
![]() |
Renderer with List Accelerator Image Courtesy of Lund Graphics Group |
As you can see i really do not have any accelerator class implementing any acceleration algorithms. The so called ListAccelerator is nothing but a class containing the primitive list and each of them has to be tested for intersection explicitly by the primary. Now you see that it is worth to be mentioned as "Yuck".
My goal is to implement the BVH acceleration and replace the list accelerator so the the framework looks like the following:
![]() |
Renderer with BVH ccelerator Image Courtesy of Lund Graphics Group |
The BVH accelerator accepts two parameters in the constructors. The first is the maximum of primitives in each leaf node and the second takes a string that describes which of the three algorithms to use when partitioning primitives when building the tree. The default, "sah", indicates that an algorithm based on "surface area heuristics" should be used. The other two approaches take slightly less computation when building the tree but are typically less efficient when used for ray intersections.
There are several stages to BVH construction. They are as follows:
- Bounding information about each primitive is computed and stored in the STL container (vector/list) that will be used during tree construction.
- The tree is built via a procedure that splits the primitives into subsets and recursively builds BVHs for the subsets. The result is a binary tree where each interior node holds pointers to its children and each leaf node holds references to one or more primitives. Finally, the tree is compacted and pointer-less implementation for use during rendering. I built the linearized tree from the very beginning during the construction process.
For each primitive to be stored in the BVH, i store the centroid of its bounding box. As the tree is built, the build_recursive() will be recursively called , primitive list will be sorted and partitioned to place them into groups that are spatially close to each other. The following pseudo code gives an over-view of the bvh construction:
![]() |
The build function that initiates the BVH construction Image Courtesy of Lund Graphics Group |
![]() |
The recursive build function Image Courtesy of Lund Graphics Group |
The build_recursive(...) takes as parameters the range [left_index,right_index) . It is responsible for returning the number of primitives within the range and add linearly in the heap structure so that the pre-order traversal make sure that we do not have any wasted memory as follows:
When the tree is perfectly balanced, the linear representation saves memory space and pointer reads during the tree traversal. Unfortunately, for a non-balanced tree, space still has to be allocated as if the tree complemented with the extra nodes to make it fully balanced. What is worse, even a single extra node is added to fully balanced tree will add one full extra level to the tree, doubling its storage space.
As such, this representation is most useful when actual node data is small compared to the combined size of the child pointers, or when the tree is really fully balanced. For instance, when a hierarchy has been built using a median based splitting criterion that guarantees some level of near balance, this could be a useful representation.
Even when there is no guarantee about the balance of the tree hierarchy, it is possible to output the data in more effective representation. If the tree nodes are output in the pre-order traversal order, the left child when present will always immediately will follow its parent. In this manner, although a link is still needed to point at the right child only a single bit is needed to indicate whether there is a left child. The following figure illustrates the binary tree and its node output in the pre-order traversal:
I adopted the immediate above technique for linearizing the BVH tree. A typical tree implementation uses (32-bit) pointers to represent the node child links. However, for most trees a pointer representation is overkill. It is a better option to allocate tree nodes from within an array with a 16-bit index value from the start of the array. And this work for both the static and dynamic trees.
For interior nodes, the collection of primitives must be partitioned into two children subtrees. In practice when building the BVHs, one generally considers partitions along a coordinate axis, meaning that there are 6n candidate partitions, where n is the number of primitives. I chose one of the coordinate axes to use in partitioning the primitives. I select the axis with the greatest variation of the bounding box centroids for the current set of primitives. It gives good partitions in many reasonable scenes. The following figure illustrates the strategy:
![]() |
Choosing the axis along the partition primitive Image Courtesy of PBRT |
The general goal of the partition is to select a partition of primitives that does not have too much overlap of the bounding boxes of the two resulting primitive sets - if there is a substantial overlap then it will be more frequently be necessary to traverse both children subtrees when traversing the tree which in turn requires more computation than if it had been possible to more effectively prune away the collection of primitives.
One of the simplest partition techniques first compute the midpoint of the primitives' centroids along the splitting axis. The primitives are classified into the two sets, depending on whether their centroids are above or below the midpoint. This partition is easily done with the C++ standard library function std::partition(), which takes a range of elements in an array and a comparison function and sorts the elements in the array so that all of the elements that return true for the given predicate function appear in the range before those that return false for it. It returns a pointer to the first element that had a false value for the predicate, which is converted to the offset and sent to the recursive function. The following figure illustrates the method i used:
![]() |
Midpoint Splitting Technique[2] |
As shown in the pseudocode, we need to build the BVH from the list of intersectables. Starting with the entire set of intersectables we must intelligently, split that into two subsets as shown in the last figure. This simple process is repeated recursively until there is only few primitives per set. And each subset has its own bounding box, which is, at most, as big as the parent set. In the end, we shall have a binary tree with an approximate O(log n). I started with the diagnostic setup to check the implementation of the BVH. With mid-point splitting technique and 10 rays per pixel, i got the following 1024 X 1024 output withtin 16.81 seconds.
![]() |
Diagnostic setup contains 80 non-scrambled spheres |
I also rendered the scrambled spheres and it took 34.29 seconds with the very same number of samples. Here goes the following image:
![]() |
Diagnostic setup contains 80 scrambled spheres |
Another straightforward partitioning scheme is to partition the primitives into two equal-sized subsets such that the first half of the n of them are the n/2 with the smallest centroids coordinate values along the chosen axis and the second half are the ones with the largest centroid coordinate values. While this approach can sometimes work well, there are instances where this method also fares poorly. I rendered the scrambled and non-scrambled spheres with this partitioning scheme; it took 34.52 seconds for the scrambled version and non-scrambled version took 16.56 seconds.
The two partitioning approaches i have described so far can work well for some distributions of primitives, but they often choose partitions that perform poorly in practice, resulting in more nodes of the tree being visited by rays and hence unnecessarily inefficient ray-primitive intersection computations at rendering time. Most of the current algorithms for building acceleration structures for ray tracing are based on surface area heuristic, which provide a well-grounded cost model for answering questions like "which of a number of partitions of primitives will lead to a better BVH for ray - primitive intersection test" [2]. The SAH model estimates the computational test of performing the ray intersection tests, including the time spent for traversing the nodes of the tree and time spent on the ray.
I rendered the following 512 X 512 image with 3X3 samples per pixel using all of the above discussed splitting criteria.
- Mid-point split - 80.37 seconds.
- SAH split - 71.69 seconds.
- Equal split - 101.67 seconds.
We can see that the SAH partition is clearly the winner here.
![]() |
BVH with SAH splitting 1024X1024 image 100 samples 2040.39 seconds |
References
[1] Lund University Graphics Group
[2] Physically Based Rendering, Matt Pharr, Greg Humphreys
[3] Realistic Ray Tracing, Peter Shirley.
[4] Real-time Collision Detection.