University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 3
- Zimeng Yang
- Tested on: Windows 10, i7-4850 @ 2.3GHz 16GB, GT 750M (Personal Laptop)
- Basic
- BSDF evaluation : diffuse, perfect specular and imperfect specular surface
- path termination using stream compaction
- toggleable sorting of paths/intersections to be contiguous in memory by material type
- toggleable first-bounce intersection caching
- More on the way...
- Fresnel Refraction (Schlick's Approximation), Depth of Field and Stochastic AA
- Motion Blur
- Texture Mapping and Bump Mapping
- Constructive Solid Geometry (not fully implemented)
The rendering above demonstrates diffuse/reflective/refractive (Fresnel) materials, differently textured cubes/spheres with normal mapping, motion blur (the shaking red cube), and Constructive Solid Geometry (basic ideas only, not a full implementation). The textured refractive sphere sits inside a CSG object constructed as a red cube minus a green sphere, so the result is not a single geometry. The caustic projected by the textured sphere is also captured well.
For more renderings:
without DOF | with DOF
---|---
![]() | ![]()
texture mapping only | texture and normal mapping
---|---
![]() | ![]()
In the renderings above, the right image illustrates the effect of normal mapping. The left two spheres are textured only; the right two spheres are textured and normal mapped.
All features mentioned above can be configured in the input file. See below for details.
transmission test (with AA) |
---|
![]() |
- Iterations: ~3300
- Test render for:
- perfect transmission (right sphere): 1.0 refraction
- weighted material (left sphere): 0.8 refraction + 0.1 reflection + 0.1 diffuse
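One way to realize such a weighted material is to pick one lobe per bounce with probability equal to its weight. A minimal sketch, assuming glm/Thrust; `pickLobe` and its parameters are illustrative names, not the repo's actual code:

```cpp
#include <thrust/random.h>

// Choose which lobe to sample this bounce: refraction, reflection, or diffuse.
// The weights are the material coefficients from the scene file (e.g. 0.8/0.1/0.1).
__device__ int pickLobe(float wRefract, float wReflect,
                        thrust::default_random_engine &rng) {
    thrust::uniform_real_distribution<float> u01(0.f, 1.f);
    float xi = u01(rng);
    if (xi < wRefract) return 0;             // sample the refraction lobe
    if (xi < wRefract + wReflect) return 1;  // sample the reflection lobe
    return 2;                                // sample the diffuse lobe
}
```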
diffuse | non-perfect specular with different specular exponents
---|---
![]() | ![]()
In the rendering above, the right image demonstrates the influence of the specular exponent. Three spheres, from left to right: very high specular exponent (behaves like a perfect mirror), medium specular exponent, and low specular exponent (behaves more like a diffuse surface).
Reference link: http://http.developer.nvidia.com/GPUGems3/gpugems3_ch20.html
- Iterations: ~2000
- Applies GPU Gems 3, Chapter 20: non-perfect specular material approximation
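A minimal sketch of that approximation, under the usual glm/Thrust assumptions (function and parameter names are illustrative): the scattered direction is drawn from a Phong-style lobe around the perfect mirror reflection, with the spread controlled by the specular exponent `n`.

```cpp
#include <glm/glm.hpp>
#include <thrust/random.h>

// GPU Gems 3, ch. 20 style specular sampling: theta = acos(xi^(1/(n+1)))
// spreads samples around the mirror direction; larger n -> tighter lobe.
// `incident` is the incoming ray direction (pointing toward the surface).
__device__ glm::vec3 sampleImperfectSpecular(glm::vec3 incident, glm::vec3 normal,
                                             float n,
                                             thrust::default_random_engine &rng) {
    thrust::uniform_real_distribution<float> u01(0.f, 1.f);
    float theta = acosf(powf(u01(rng), 1.f / (n + 1.f)));
    float phi = 2.f * 3.14159265f * u01(rng);
    glm::vec3 refl = glm::reflect(incident, normal);  // perfect mirror direction
    // Orthonormal basis around the reflection direction.
    glm::vec3 t = glm::normalize(glm::cross(refl,
        fabsf(refl.x) > 0.1f ? glm::vec3(0, 1, 0) : glm::vec3(1, 0, 0)));
    glm::vec3 b = glm::cross(refl, t);
    return glm::normalize(sinf(theta) * (cosf(phi) * t + sinf(phi) * b)
                          + cosf(theta) * refl);
}
```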
Three options, `reshuffleByMaterialIDs`, `useFirstBounceIntersectionCache`, and `stochasticAntialiasing`, can be toggled in the `scene->state` variable.
Test scene: `scenes/test_recording_time.txt`.
For the following comparison, opt_id denotes `reshuffleByMaterialIDs`, opt_fbi denotes `useFirstBounceIntersectionCache`, and opt_aa denotes `stochasticAntialiasing`.
In the following chart, only one optimization was enabled per test case; "none" means all three options are false.
- sorting by material ID: extremely slow. Sorting up to 640,000 rays twice for every bounce's intersections is costly. In this framework, path segments and intersections are kept in two separate arrays, so they must either be sorted twice to stay in correspondence or be combined into a single array; either way destroys performance. The framework therefore does not work well with this option, and it is not practical here (see the sketch after this list).
- first bounce intersection cache: this option is not applicable when anti-aliasing is enabled, since AA jitters the first rays every iteration. AA improves rendering quality at only a small extra cost, as the chart above shows.
- stochastic anti-aliasing: trivial cost for improved rendering quality; worth doing.
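A sketch of the double sort that `reshuffleByMaterialIDs` implies, assuming Thrust and CIS 565 starter-code-style types (`ShadeableIntersection`, `PathSegment`, and the buffer names are assumptions about this repo):

```cpp
#include <thrust/sort.h>
#include <thrust/copy.h>
#include <thrust/execution_policy.h>

// Sorts both arrays by material ID so shading kernels touch one material at a
// time. matKeys holds each intersection's material ID (filled by a small kernel
// beforehand); the first sort scrambles its keys, so a scratch copy must key
// the second sort. This duplicated sort is why the option performs so poorly
// when paths and intersections live in two separate arrays.
void sortByMaterial(int *matKeys, int *matKeysScratch,
                    ShadeableIntersection *intersections,
                    PathSegment *paths, int numPaths) {
    thrust::copy(thrust::device, matKeys, matKeys + numPaths, matKeysScratch);
    thrust::sort_by_key(thrust::device, matKeys, matKeys + numPaths, intersections);
    thrust::sort_by_key(thrust::device, matKeysScratch, matKeysScratch + numPaths, paths);
}
```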
Implemented a Fresnel-effect refraction evaluation using Schlick's approximation. In the following rendering, the right sphere is rendered with the Fresnel refraction effect; the left sphere is rendered with 0.2 refraction + 0.1 reflection + 0.7 diffuse. The Fresnel effect better approximates the reflection contribution at the boundary between two media.
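A minimal sketch of Schlick's approximation (function and parameter names are illustrative):

```cpp
// Schlick's approximation of the Fresnel reflectance:
// R(theta) = R0 + (1 - R0) * (1 - cos(theta))^5,
// with R0 = ((etaI - etaT) / (etaI + etaT))^2.
__host__ __device__ float schlickFresnel(float cosTheta, float etaI, float etaT) {
    float r0 = (etaI - etaT) / (etaI + etaT);
    r0 *= r0;
    float c = 1.f - cosTheta;
    return r0 + (1.f - r0) * c * c * c * c * c;
}
```

At each refractive hit, comparing a uniform random number against this reflectance decides whether the bounce reflects or refracts.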
focal length = 10 | focal length = 11.5
---|---
![]() | ![]()
How to apply DOF in the input file: modify the camera properties like

```
...
DOF 1 10.5
```

The first value is the lens radius; the second is the focal length.
Reference: PBRT [6.2.3].
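A sketch of the thin-lens model from PBRT [6.2.3], assuming glm/Thrust; the `Ray` struct and camera basis parameters are illustrative, not the repo's types:

```cpp
#include <glm/glm.hpp>
#include <thrust/random.h>

struct Ray { glm::vec3 origin, direction; };  // illustrative stand-in

// Jitter the ray origin across the lens disk and re-aim it at the plane of
// focus; points on that plane stay sharp, everything else blurs.
__device__ Ray applyDepthOfField(Ray ray, glm::vec3 camRight, glm::vec3 camUp,
                                 float lensRadius, float focalDistance,
                                 thrust::default_random_engine &rng) {
    thrust::uniform_real_distribution<float> u01(0.f, 1.f);
    float r = lensRadius * sqrtf(u01(rng));       // uniform disk sample
    float phi = 2.f * 3.14159265f * u01(rng);
    glm::vec3 lensOffset = r * (cosf(phi) * camRight + sinf(phi) * camUp);
    glm::vec3 focalPoint = ray.origin + focalDistance * ray.direction;
    ray.origin += lensOffset;
    ray.direction = glm::normalize(focalPoint - ray.origin);
    return ray;
}
```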
with AA | without AA
---|---
![]() | ![]()
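Stochastic AA only requires jittering the sub-pixel sample position each iteration before building the camera ray; a minimal sketch (names are illustrative):

```cpp
#include <glm/glm.hpp>
#include <thrust/random.h>

// Returns a jittered sample position inside pixel (x, y); averaging many
// iterations of such samples anti-aliases the image at negligible cost.
__device__ glm::vec2 jitteredPixelSample(int x, int y,
                                         thrust::default_random_engine &rng) {
    thrust::uniform_real_distribution<float> u01(-0.5f, 0.5f);
    return glm::vec2((float)x + u01(rng), (float)y + u01(rng));
}
```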
For a more detailed comparison:
rotation + translation | scale + translation + non-motion-blurred object
---|---
![]() | ![]()
Motion blur was implemented by interpolating between two poses. Translation, rotation, and scale blurring can be applied separately or in combination. Input format ([] means optional input):

```
// cube
OBJECT 6
cube
material 1
TRANS 2 4 0
ROTAT 0 0 0
SCALE 1 2 1
[TRANS_DST x x x]
[SCALE_DST y y y]
[ROTAT_DST z z z]
```
If TRANS_DST/SCALE_DST/ROTAT_DST is not specified, no motion blur is applied to translation/scale/rotation. Motion blur is a per-object effect; objects without any optional input are rendered normally.
See `scenes/test_motion_blur.txt` for input details.
During interpolation, the destination pose is given a higher probability (an extra 10%) of being chosen. This makes the object look like it ends up somewhere instead of floating all around.
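A sketch of that sampling and the pose lerp; the exact bias scheme and names are assumptions, not the repo's code:

```cpp
#include <glm/glm.hpp>
#include <thrust/random.h>

// Pick the interpolation parameter for this iteration: the top 10% of draws
// snap to t = 1, giving the destination pose extra weight so the object
// appears to settle there; the rest are remapped to cover [0, 1).
__host__ __device__ float sampleMotionBlurT(thrust::default_random_engine &rng) {
    thrust::uniform_real_distribution<float> u01(0.f, 1.f);
    float u = u01(rng);
    return (u > 0.9f) ? 1.f : u / 0.9f;
}

// Translation/rotation/scale are then lerped independently, e.g.
//   glm::vec3 trans = glm::mix(translation, translationDst, t);
// and the object's transformation matrix is rebuilt from the blended parts.
```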
Implementation of UV coordinate mapping and normal mapping for cubes and spheres.
texture mapping only | texture mapping and normal mapping
---|---
![]() | ![]()
texture 1 | texture 2 | normal map 1 | normal map 2
---|---|---|---
![]() | ![]() | ![]() | ![]()
Rendering above: 5000 iterations; texture mapping works for both spheres and cubes and for diffuse/specular materials. Normal mapping enhances the realism of the renderings.
For now, the following works:
- Loading multiple texture files onto the GPU and computing texture color during path tracing.
- Textures can be combined with different materials (reflective, refractive, and diffuse).
- Texture files are specified in the input file; "NULL" means no texture (or no normal-map texture).
- Uses stb_image, the same as the image class in the framework.
Reference: PBRT [10.4 & 10.5.2] and https://en.wikipedia.org/wiki/Bump_mapping.
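For reference, a minimal sketch of the spherical UV lookup in the spirit of PBRT [10.4], assuming a unit sphere and an object-space intersection point (the function name is illustrative):

```cpp
#include <glm/glm.hpp>

// Map an object-space point on the unit sphere to (u, v) in [0, 1]^2 using
// spherical coordinates: u from the azimuthal angle, v from the polar angle.
__host__ __device__ glm::vec2 sphereUV(glm::vec3 p) {
    const float PI = 3.14159265f;
    float phi = atan2f(p.z, p.x);
    if (phi < 0.f) phi += 2.f * PI;
    float theta = acosf(glm::clamp(p.y / glm::length(p), -1.f, 1.f));
    return glm::vec2(phi / (2.f * PI), theta / PI);
}
```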
```
....
REFRIOR 0
EMITTANCE 0
TEXTURE texture_sphere.png
NORMAL_MAP 184.JPG
....
```
For procedural textures, the texture color is calculated instead of being read from a texture file. I did not implement a complex procedural texture function; following PBRT [10.5.2], I implemented a checkerboard texture for the sphere and then compared the average iteration times of the two approaches:
checkerboard texture = 165.34 ms | file-loaded texture = 165.10 ms
---|---
![]() | ![]()
In this simple comparison, the on-the-fly calculation costs slightly more than the texture file lookup; a more complex procedural texture might reduce performance further.
Procedural textures offer benefits such as infinite resolution and tiny storage cost, while file-loaded textures can represent far more complex imagery.
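The checkerboard itself is only a few lines; a sketch in the spirit of PBRT [10.5.2] (frequency and colors are illustrative choices):

```cpp
#include <glm/glm.hpp>

// Procedural checkerboard: partition UV space into a grid and color each cell
// by the parity of its integer grid coordinates.
__host__ __device__ glm::vec3 checkerboard(glm::vec2 uv) {
    const float freq = 10.f;  // checks per UV unit
    int check = (int)floorf(uv.x * freq) + (int)floorf(uv.y * freq);
    return (check % 2 == 0) ? glm::vec3(1.f)   // white cell
                            : glm::vec3(0.f);  // black cell
}
```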
Reference: https://en.wikipedia.org/wiki/Procedural_texture.
References (under the `references` folder):
- Slides from CIS 560 (Computer Graphics).
- Blister: GPU-based rendering of Boolean combinations of free-form triangulated shapes
A - B | B - A
---|---
![]() | ![]()
A Union B | A Intersect B
---|---
![]() | ![]()
Testing basic operations for correctness; low iteration counts only.
- Basic operation tests: A is a red cube, B is a green sphere. The operations are hardcoded; to-do: build a CSG tree.
- Test renderings: ~200 iterations.
- Building the entire structure from the reference paper would take much longer; the paper needs more study.
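A sketch of the interval logic behind the hardcoded A - B case; the helper name and interval convention are assumptions, not the repo's code:

```cpp
// Given the ray's entry/exit parameters for A and B (an empty span, i.e.
// exit < enter, means the ray misses that primitive), return the first t
// along the ray that is inside A but outside B, or -1 on a miss.
__host__ __device__ float csgDifferenceHit(float aEnter, float aExit,
                                           float bEnter, float bExit) {
    if (aExit < aEnter) return -1.f;                        // ray misses A
    if (bEnter > aEnter || bExit < aEnter) return aEnter;   // A's surface exposed
    if (bExit < aExit) return bExit;                        // exit B inside A
    return -1.f;                                            // B covers rest of A
}
```

Union and intersection follow the same interval bookkeeping with different boundary rules.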
Refer to section Basic Features.
The above data was collected using `test_stream_compation.txt` (under the `scenes` folder) as the input file.
With stream compaction, iteration execution time is lower than without it (in the same test scene). Because many terminated rays are removed and no longer launch kernel work, the average execution time drops.
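The compaction step can be a single `thrust::partition` over the path pool; the `PathSegment` type and its `remainingBounces` field follow CIS 565 starter-code naming and are assumptions about this repo:

```cpp
#include <thrust/partition.h>
#include <thrust/execution_policy.h>

// Predicate: a path is still alive if it has bounces left to trace.
struct IsAlive {
    __host__ __device__ bool operator()(const PathSegment &p) const {
        return p.remainingBounces > 0;
    }
};

// Move live paths to the front and shrink the launch size so terminated rays
// stop consuming kernel work on later bounces.
int compactPaths(PathSegment *dev_paths, int num_paths) {
    PathSegment *mid = thrust::partition(thrust::device, dev_paths,
                                         dev_paths + num_paths, IsAlive());
    return (int)(mid - dev_paths);  // new number of live paths
}
```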
The above data was collected using `text_closed_scene.txt` (under the `scenes` folder) as the input file.
The closed scene was constructed by adding an extra front wall. Rays cannot easily escape compared with the open-scene setting, so even with stream compaction most rays remain active as ray depth increases, and compaction does not contribute much to overall performance.
In conclusion, stream compaction is useful when many rays terminate as ray depth increases. But if rays remain active across many bounces, stream compaction does not help much.
Using Nsight for performance analysis, we get the following timeline:
and also a summary of the total time spent in each launched kernel function:
This analysis shows that the kernel `pathTraceOneBounce` took more than half of the total execution time. This function computes the intersections of the path rays, and its cost varies a lot with geometry type: a sphere intersection is much cheaper to compute than a CSG-object intersection (even with only the basic operations). So the threads of this kernel very likely do not finish at the same time, leaving plenty of room for future optimization.
For the same reason, `shadingAndEvaluatingBSDF`, which computes the scattered path ray according to the material type, also consumed a large share of the computation.