Simple GPU Path Tracer

Path tracing is getting more popular in recent years. And because it is easy to get the code run in parallel, so making the path tracer to run on GPU can greatly reduce the rendering time. This post is just my personal notes about learning the basic of Path Tracing and to make me familiar with the D3D12 API. The source code can be downloaded here. And for those who don't want to compile from the source, the executable can be downloaded here.

Rendering Equation
Like other rendering algorithm, path tracing is solving the rendering equation:

To solve this integral, Monte Carlo Integration can be used, so we will shoot many rays within a single pixel from the camera position.

During path tracing, when a ray hits a surface, we can accumulate its light emission as well as the reflected light of that surface, i.e. computing the rendering equation. But we only take one sample in the Monte Carlo Integration so that only 1 random ray is generated according to the surface normal, which simplify the equation to:

Since we shoot many rays within a single pixel, we can still get an un-biased result. To expand the recursive path tracing rendering equation, we can derive the following equation:

GPU random number
To compute the Monte Carlo Integration, we need to generate random number on the GPU. The wang_hash is used due to its simple implementation.
  1. uint wang_hash(uint seed)
  2. {
  3.     seed = (seed ^ 61) ^ (seed >> 16);
  4.     seed *= 9;
  5.     seed = seed ^ (seed >> 4);
  6.     seed *= 0x27d4eb2d;
  7.     seed = seed ^ (seed >> 15);
  8.     return seed;
  9. }
We use the pixel index as the input for the wang_hash function.
seed = px_pos.y * viewportSize.x + px_pos.x
However, there are some visible pattern for the random noise texture using this method (although not affecting the final render result much...):

Luckily, to fix this, we can simply multiple a random number for the pixel index which eliminate the visible pattern in the random texture.
seed = (px_pos.y * viewportSize.x + px_pos.x) * 100 

To generate multiple random numbers within the same pixel, we can add the random seed by a constant number after each call to the wang_hash function. Any constant larger than 0, (e.g. 10) will be good enough for this simple path tracer.
  1. float rand(inout uint seed)
  2. {
  3.     float r= wang_hash(seed) * (1.0 / 4294967296.0);
  4.     seed+= 10;
  5.     return r;
  6. }
Scene Storage
To trace ray on the GPU, I upload all the scene data(e.g. triangles, material, light...) into several structure buffers and constant buffer. Due to my laziness and the announcement of DirectX Raytracing, I did not implement any ray tracing acceleration structure like BVH. I just store the triangles in a big buffer.

Tracing Rays
By using the rendering equation derived above, we can start writing code to shoot rays from the camera. During each frame, for each pixel, we trace one ray and reflect it multiple times to compute the rendering equation. And then we can additive blend the path traced result over multiple frames to get a progressive path tracer using the following blend factor:

To generate the random reflected direction of any ray hit surface, we simply uniformly sample a direction on the hemi-sphere around surface normal:

Here is the result of the path tracer when using the uniform random direction and using an emissive light material. The result is quite noisy:

Uniform implicit light sampling, 64 sample per pixel

To reduce noise, we can weight the randomly reflected ray with a cosine factor similar to the Lambert diffuse surface:

Cos weighted implicit light sampling, 64 sample per pixel
The result is still a bit noisy. Because in our scene, the light source is not very large, the probability of a randomly reflected ray to hit the light source is quite low. So to improve this, we can explicit sample the light source for every ray that hit a surface.

To sample a rectangular light source, we can randomly choose a point over its surface area, and the corresponding probability will be:
1/area of light
Since our light sampling is over the area domain instead of the direction domain as state in the above equation. The rendering equation need to multiply by the Jacobian that relates solid angle to area. i.e.

With the same number of sample per pixel, the result is much less noisy:

Uniform explicit light sampling, 64 sample per pixel
Cos weighted explicit light sampling, 64 sample per pixel

Simple de-noise

As we have seen above, the result of path tracing is a bit noise even with 64 samples per pixel. The result will be even worse for the first frame:

first frame path traced result
There are some very bright dots and looks not good during camera motion. So I added a simple de-noise pass, which is just blurring lots of pixels where they are located on the same surface (which really need a lot of pixel to make the result looks good, which cost some performance...).

Blurred first frame path traced result
To identify the pixel correspond to which surface, we store this data in the alpha channel of the path tracing texture with the following formula:
dot(surface_normal, float3(1, 10, 100)) + (mesh_idx + 1) * 1000
This works because we only contains small number of mesh and the mesh normal are the same for each surface in this simple scene.

Random Notes...
During the implementation, I encounter various bugs/artifacts which I think is interesting.

First, is about the simple de-noise pass. It may bleed the light source color to neighbor pixel far away even we have per pixel mesh index data.

This is because we only store a single mesh index per pixel, but we jitter the ray shot from camera within a single pixel per frame, some of the light color will be blend to the light geometry edge. It get very noticeable because the light source have a very high radiance compared to the reflect light of ceiling geometry.

To fix this, I just simply do not jitter the ray for tracing a direct hit of light geometry from camera, so this fix can only apply to explicit light sampling.

The second one is about quantization when using 16bit floating point texture. The path tracing texture sometimes may get quantized result after several hundred frames of additive blend when the single sample per pixel path trace result is very noise.

Quantized implicit light sampling
Path traced result in first frame
simple de-noised first frame result
To work around this, 32bit floating point texture need to be used, but this may have a performance impact (explicitly for my simple de-noise pass...).

The last one is the bright flyflies artifact when using a very large light source (as big as ceiling). This may sound counter intuitive. And the implicit light path traced result(i.e. not sampling the light source directly) does not have those flyflies...

Explicit light sample result
Implicit light sample result
But it turns out this artifact is not related to the size of the light source, but is related to the light too close to the reflected geometry. To visualize it, we may look at how the light get bounced:

path trace depth = 1
path trace depth = 2

The flyflies start to appear in first bound, located at the position near the light source. And then those flyflies get propagated with the reflected light rays. Those large values are generated by explicit light sampling Jacobian transform, the denominator part, which is the distance square between the light and surface.

After a brief search on the internet, to fix this, either need to implement radiance clamping or bi-directional path tracing, or greatly increase the sampling number. Here is the result with over 75000 number of samples per pixel, but it still contains some flyflies...

In this post, we discuss the steps to implement a simple GPU path tracer. The most basic path tracer is simply shooting large number of rays per pixel, and reflect the ray multiple times until it hits a light source. With explicit light sampling, we can greatly reduce noise.

This path tracer is just my personal toy project, which only have Lambert diffuse reflection with a single light. It is my first time to use the D3D12 API, the code is not well optimized, so the source code are for reference only and if you find any bugs, please let me know. Thank you.

[1] Physically Based Rendering

Render Passes in "Seal Guardian"

"Seal Guardian" uses a forward renderer to render the scene. Because we need to support mobile platform, we don't have too many effect in it. But still it consists of a few render passes to compose an image.

Shadow Map Pass
To calculate dynamic shadow of the scene, we need to render the depth of the meshes from the light point of view. We render them into a 1024x1024 shadow map.
Standard shadow map

Then we use the Exponential Shadow Map method to blur the shadow map into a 512x512 shadow map.
ESM blurred shadow map

(Note that this pass may be skipped according to current performance setting.)

Opaque Geometry Pass
In this pass, we render the scene meshes into a RGBA8 render target. We compute all the lighting including direct lighting, indirect lighting(lightmap or SH probe), tone mapping in this single pass. This is because on iOS, reducing render pass may have a better performance, so we choose to combine all the calculation into a single pass.
Tonemapped opaque scene color
Opaque geometry depth bufer

To reduce the impact of overdraw, we pre-compute a visibility set to avoid drawing occluded mesh (may talk about it in future post). Also we want to add a bloom pass to enhance the effect of bright pixels, we compute a bloom value in this pass according to the pre-tone mapped value and store it in the alpha channel of this pass.

Transparent Geometry Pass
In this pass, we render transparent mesh and particle. We blend the post-tonemapped color with the opaque geometry due to performance reason. Also, because we store the bloom intensity in the alpha channel and we want the alpha geometry to affect the bloom result. We solve this by 2 different methods depending on the game runs on which platform.

On iOS, we render the mesh directly to the render target of the opaque geometry pass with a shader similar to the opaque pass by outputting tonemapped  scene color in RGB and bloom intensity in A. To blend those 4 values over the opaque value, we use the EXT_shader_framebuffer_fetch OpenGL extension. So the blending happens at the end of the transparent geometry shader and we choose the simple blending formula below by using the opacity of the mesh(because we want to make it consistent with other platform):
RGB= mesh color * mesh alpha + dest color * (1 - mesh alpha)
A = mesh bloom intensity
* mesh alpha + dest bloom intensity * (1 - mesh alpha)
On Windows and Mac, the EXT_shader_framebuffer_fetch does not exist. We render all the transparent meshes into a separate RGBA8 render target. We compute the scene color and bloom intensity similar to opaque pass, but before writing to the render target, we decompose the RGB scene color into luma and chroma and store the chroma value in checkerboard pattern similar to this paper(slide 104). So we can store luma+chroma in RG channel, bloom intensity in B channel and opacity of mesh in the A channel of the render target.
Transparent render target on Windows platform

Finally, we can blend this transparent texture over the opaque geometry pass render target.
Composed opaque and transform geometry

Post Process Pass
After those geometry passes, we can blend in the bloom filter. We make several blur passes for those bright pixels and additive blend over the previous render pass output to enhance the bright effect.
Blurred bright pixels
Additive blended bloom texture with scene color

Then we compute a simplified(but not very accurate, due to the lack of a velocity buffer) temporal anti-aliasing using the color and depth buffer of current frame and previous 2 frames. One thing we didn't mention is that, during rendering the opaque and transparent meshes, we jitter the camera projection by half a pixel, alternating between odd and even frame, similar to the figure below, so that we can have sub-pixel information for anti-aliasing.
Temporal AA jitter pattern
Temporal anti-aliased image

In this post, we break down the render passes in "Seal Guardian", which compose of mainly 4 parts: shadow map, opaque geometry, transparent geometry and post process passes. By making less render pass, we can achieve a constant 60FPS in most cases (if target framerate is not met, we may skip some render pass such as temporal AA and shadow).

Lastly, "Seal Guardian" has already been released on Steam / Mac App Store / iOS App Store. If you want to support us to develop games with custom tech, then buying a copy of the game on any platform will help. Thank you very much.

[1] The Art and Technology behind Crysis 3

Shadow in "Seal Guardian"

"Seal Guardian" uses a mix of static and dynamic shadow systems to support long range shadow to cover the whole level. "Seal Guardian" only use a single directional for the whole level, so part of the shadow information can be pre-computed. It mainly consists of 3 parts: baked static shadow on static meshes stored along with the light map, baked static shadow for dynamic objects stored along with the irradiance volume and dynamic shadow with optional ESM soft shadow.

Static shadow for static objects
During the baking process of the light map, we also compute static shadow information. We first render a shadow map for the whole level in a big render target (e.g. 8192x8192), then for each texel of light map, we can compare against its world position to the shadow map to check whether that texel is in shadow. But we are using a 1024x1024 light map for the whole scene, storing the shadow term directly will not have enough resolution. So we use distance field representation[1] to reduce storage size similar to the UDK[2]. To bake the distance field representing of the shadow term, instead of comparing a single depth value at texel world position as before, we compare several values within a 0.5m x 0.5m grid, oriented along the normal at position similar to the figure below:
Blue dots indicate the positions for sampling shadow map
to compute distance field value for the texel at red dot position.
(The gird is perpendicular to the red vertex normal of the texel.)

By doing this, we can get the shadow information around the baking texel to compute the distance field. We choose this method instead of computing the distance field from a large baked shadow texture because we want to have the shadow distance filed consistently computed in world space no matter how the mesh UV is and this can also avoid UV seam too. But this method may cause potential problem for concave mesh, but so far, for all levels in "Seal Guardian", it is not a big problem.
Static shadow only

Static shadow for dynamic objects
For dynamic objects to receive baked shadow, we baked shadow information and store it along with the irradiance volume. For each irradiance probe location, we compare it to the whole scene shadow map and get a binary shadow value. During runtime, we interpolate this binary shadow value by using the position of dynamic object and the probe location to get a smooth transition of shadow value, just like interpolating the SH coefficients of irradiance volume.

Circled objects does not have light map UV, so they are treated the same as dynamic objects and shadowed with the shadow value stored along with irradiance volume
Each small sphere is a sampling location for storing the SH coefficients and shadow value of the irradiance for dynamic objects.

Dynamic Shadow
We use standard shadow mapping algorithm with exponential shadow map(ESM)[3] to support dynamic shadow in "Seal Guardian".  However due to we need to support a variety of hardware(from iOS, Mac to PC) and minimise code complexity, we choose not to use any cascade shadow map. Instead we use a single shadow map to support dynamic shadow for a very short distance (e.g. 30m-60m) and rely on baked shadow to cover the remaining part of the scene.
Dynamic shadow mixed with static shadow
Dynamic shadow only

Shadow Quality Settings
With the above systems, we can make a few shadow quality settings:
  1. mix of static shadow with dynamic ESM shadow
  2. mix of static shadow with dynamic hard shadow
  3. static shadow only
On iOS platform, we choose the shadow quality depends on the device capability. Besides, as we are using a forward renderer, when we are drawing objects that outside the dynamic shadow distance, those objects can use the static shadow only shader to save a bit of performance.
Soft Shadow
Hard Shadow
No Shadow

We have briefly describe the shadow system in "Seal Guardian", which uses distance field shadow map for static mesh shadow, interpolated static shadow value for dynamic objects and ESM dynamic shadow for a short distance. Also a few shadow quality settings can be generated with very few coding effort.

Lastly, if you are interested in "Seal Guardian", feel free to check it out and its Steam store page is live now. It will be released on 8th Dec, 2017 on iOS/Mac/PC. Thank you.