DXR AO

Introduction
(Edit 3/5/2020: An updated version of the demo can be downloaded here, which supports high-DPI monitors and includes some bug fixes.)
It has been 2 months since my last post. For the past few months, the situation here in Hong Kong has been very bad. Our basic human rights are deteriorating. Absurd things happen, such as suspected cooperation between the police and triads, as well as police brutality (including shooting directly at journalists). I really don't know what can be done... Maybe you could spare a few minutes to sign some of these petitions? Although such petitions may not be very useful, at least after signing some of them, the US Congress is now discussing the Hong Kong Human Rights and Democracy Act. I would sincerely appreciate your help. Thank you very much!

Back to today's topic: after setting up my D3D12 rendering framework, I started to learn DirectX Raytracing (DXR). I decided to write an ambient occlusion demo first because it is easier than a full path tracer, since there is no need to handle material and lighting data. The demo can be downloaded from here (it requires a DXR-compatible graphics card and driver, with Windows 10 build version 1809 or newer).



Rendering pipeline
The demo first renders a G-buffer with normal and depth data. A velocity buffer is then generated from the current and previous frame camera transforms, stored in RG16Snorm format. Rays are then traced from world positions reconstructed from the depth buffer, with a cosine-weighted distribution. To avoid ray-geometry self-intersection, the ray origin is shifted slightly towards the camera. After that, a temporal and a spatial filter are applied to smooth out the noisy AO image, and an optional bilateral blur pass can be applied for a final clean-up.
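Below is a minimal sketch of what such a ray generation shader could look like. This is not the demo's actual code: the resource names, the constant buffer layout and the decodeNormal / cosineSampleHemisphere helpers are all assumptions.

// Minimal DXR ray generation sketch (assumed resource names and helpers).
RaytracingAccelerationStructure gScene   : register(t0);
Texture2D<float>                gDepth   : register(t1);
Texture2D<float2>               gNormal  : register(t2);
RWTexture2D<float2>             gAoOut   : register(u0);

cbuffer CameraCB : register(b0)
{
    float4x4 gInvViewProj;  // clip space -> world space
    float3   gCameraPos;
    float    gAoRadius;     // maximum AO ray distance
};

struct AoPayload { float visibility; };

[shader("raygeneration")]
void AoRayGen()
{
    uint2  px = DispatchRaysIndex().xy;
    float2 uv = (px + 0.5) / DispatchRaysDimensions().xy;

    // Reconstruct the world position from the depth buffer.
    float4 clipPos  = float4(uv.x * 2.0 - 1.0, 1.0 - uv.y * 2.0, gDepth[px], 1.0);
    float4 worldPos = mul(clipPos, gInvViewProj);
    worldPos /= worldPos.w;

    // Cosine-weighted direction around the G-buffer normal (assumed helpers).
    float3 normal = decodeNormal(gNormal[px]);
    float3 dir    = cosineSampleHemisphere(normal, px);

    RayDesc ray;
    // Shift the origin slightly towards the camera to avoid self-intersection.
    ray.Origin    = worldPos.xyz + normalize(gCameraPos - worldPos.xyz) * 0.01;
    ray.Direction = dir;
    ray.TMin      = 0.0;
    ray.TMax      = gAoRadius;

    AoPayload payload = { 1.0 };  // assume visible; the hit/miss shaders update it
    TraceRay(gScene, RAY_FLAG_ACCEPT_FIRST_HIT_AND_END_SEARCH, 0xFF,
             0, 1, 0, ray, payload);

    gAoOut[px] = float2(payload.visibility, 1.0);  // AO value and sample count
}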



Temporal Filter
With the noisy image generated from the ray tracing pass, we can reuse the previous frame's ray-traced data to smooth out the image. In the demo, the velocity buffer is used to get the pixel location in the previous frame (with an additional depth check between the current frame depth value and the re-projected previous frame depth value). We are calculating ambient occlusion using Monte Carlo integration:

AO ≈ (1/N) · Σ(i=1..N) V(ωᵢ),  where the ωᵢ are cosine-weighted sample directions and V(ωᵢ) is the ray visibility (1 if unoccluded within the AO radius, 0 otherwise)
We can split the Monte Carlo integration across multiple frames and store the AO result in an RG16Unorm texture, where the red channel stores the accumulated AO result and the green channel stores the total sample count N (the sample count is clamped to 255 to avoid overflow). So after a new frame is rendered, we can accumulate the AO Monte Carlo integration with the following equation:

AO_accum = (AO_history · N_history + AO_frame · N_frame) / (N_history + N_frame)
N_accum = min(N_history + N_frame, 255)
We also reduce the sample count based on the depth difference between the current and previous frame depth buffer values (i.e. when the camera zooms in/out) to "fade out" the accumulated history faster and reduce ghosting artifacts.
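A sketch of how this accumulation step could look in shader code is given below; the names (gAoHistory, gAoCurrent, gDepthFadeScale, prevUV, etc.) and the depth-based fade factor are assumptions, not the demo's exact code.

// Temporal accumulation sketch (assumed names; prevUV comes from the velocity buffer).
float2 history = gAoHistory.SampleLevel(gPointSampler, prevUV, 0);   // r = AO, g = N
float2 current = gAoCurrent[px];                                     // this frame's AO and sample count

// Fade out the history faster when the depth difference is large
// (e.g. the camera zoomed in/out) to reduce ghosting.
float depthDelta = abs(linearDepth - prevLinearDepth);
float historyN   = history.g * saturate(1.0 - depthDelta * gDepthFadeScale);

// AO_accum = (AO_history * N_history + AO_frame * N_frame) / (N_history + N_frame)
float totalN = historyN + current.g;
float ao     = (history.r * historyN + current.r * current.g) / max(totalN, 1.0);

gAoAccum[px] = float2(ao, min(totalN, 255.0));   // clamp N to avoid overflow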

AO image traced at 1 ray per pixel
AO image with accumulated samples over multiple frames

But this re-projection temporal filter has a shortcoming: the re-projection often fails at geometry edges (especially when done at half resolution). So in the demo, when re-projection fails, the lookup is shifted by 1 pixel and the re-projection is performed again to accumulate more samples.

Many edge pixels failed the re-projection test
With 1 pixel shifted, many edge pixels can be re-projected

As this result is biased, I also reduce the accumulated sample count by a factor of 0.75 to make the correct ray-traced result "blend in" faster.
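A rough sketch of this fallback is shown below, assuming a depthMatches() helper that compares the re-projected depth against the current one; the demo's actual shift direction and validation may differ.

// Re-projection with a 1-pixel shifted fallback (assumed helper and names).
float2 prevUV = uv - gVelocity[px];                 // velocity-buffer re-projection
bool   valid  = depthMatches(prevUV, linearDepth);  // assumed depth comparison helper

if (!valid)
{
    // Retry with a 1-pixel shift; since this biases the result, scale the
    // accumulated sample count down so new samples blend in faster.
    float2 shiftedUV = prevUV + gInvResolution;     // shift by one pixel
    if (depthMatches(shiftedUV, linearDepth))
    {
        prevUV    = shiftedUV;
        historyN *= 0.75;
        valid     = true;
    }
}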

Spatial Filter
To increase the sample count for the Monte Carlo integration, we can also reuse the ray-traced data of neighboring pixels. We search a 5x5 grid and reuse a neighbor's data if it lies on the same surface, by comparing delta depth values (i.e. ddx and ddy reconstructed from the depth buffer). As the delta depth is reconstructed from the depth buffer, some artifacts may be seen at triangle edges.

Noisy AO image with the spatial filter applied
Artifacts shown at triangle edges due to the reconstructed delta depth
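The sketch below shows one possible form of this 5x5 same-surface check; the resource names, the plane-extrapolation test and the tolerance are assumptions.

// 5x5 spatial filter sketch with a same-surface depth check (assumed names).
float2 ddxy    = gDepthDdxDdy[px];    // per-pixel depth derivatives from the depth buffer
float  centerZ = gLinearDepth[px];
float2 center  = gAoCurrent[px];      // r = AO, g = sample count N
float2 sum     = float2(center.r * center.g, center.g);

for (int y = -2; y <= 2; ++y)
{
    for (int x = -2; x <= 2; ++x)
    {
        if (x == 0 && y == 0) continue;
        int2  q         = int2(px) + int2(x, y);
        float neighborZ = gLinearDepth[q];

        // Accept the neighbor only if its depth matches the depth extrapolated
        // from the center pixel's ddx/ddy, i.e. both lie on the same plane.
        float expectedZ = centerZ + dot(ddxy, float2(x, y));
        if (abs(neighborZ - expectedZ) < 0.01 * centerZ)
        {
            float2 s = gAoCurrent[q];
            sum += float2(s.r * s.g, s.g);
        }
    }
}

gAoFiltered[px] = float2(sum.x / max(sum.y, 1.0), sum.y);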
To save some performance, besides using half-resolution rendering, we can also choose to interleave the ray casting across every 4 pixels and ray cast the remaining pixels over the next few frames.

Rays are traced only at the red pixels
to save performance
For those pixels without any ray-traced data during interleaved rendering, we use the spatial filter to fill in the missing data. The same-surface depth check in the spatial filter can be bypassed when the sample count (stored in the green channel during the temporal filter) is low, because it is better to have some "wrong" neighbor data than no data at all for the pixel. This also helps to remove the edge artifacts shown before.

Rays are traced at interleaved pattern,
leaving many 'holes' in the image
Spatial filter will fill in those 'holes'
during interleaved rendering

Also, when ray casting is interleaved between pixels, we need to pay attention to the temporal filter too. There is a chance that we re-project to a previous frame pixel which has no sample data. In this case, we snap the re-projected UV to the pixel that did cast an interleaved ray in the previous frame.
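Assuming a 2x2 interleave pattern where the traced pixel offset within each block changes every frame, the snapping could look roughly like this (the offset bookkeeping is an assumption):

// Snap the re-projected UV to the previous frame's traced pixel (2x2 interleave assumed).
uint2  prevPx    = uint2(prevUV * gResolution);
uint2  blockBase = prevPx & ~1u;                            // top-left of the 2x2 block
uint2  snappedPx = blockBase + gPrevFrameInterleaveOffset;  // offset traced last frame
float2 snappedUV = (snappedPx + 0.5) / gResolution;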

Bilateral Blur
To clean up the remaining noise from the temporal and spatial filters, a bilateral blur is applied. We can get a wider blur by using the edge-aware À-Trous algorithm. The blur radius is adjusted according to the sample count (stored in the green channel by the temporal filter), so when many ray samples have already been cast, we can reduce the blur radius to get a sharper image.

Applying an additional bilateral blur to smooth out remaining noise
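For reference, one À-Trous iteration with a simple depth-based edge-stopping weight might look like the sketch below. The B3-spline kernel is standard, but the weight function, the tolerance and the resource names are assumptions; the demo may weight its samples differently.

// One À-Trous blur iteration sketch; stepSize doubles each pass (1, 2, 4, ...).
static const float kernel[3] = { 3.0 / 8.0, 1.0 / 4.0, 1.0 / 16.0 };  // B3-spline weights

float AtrousBlur(uint2 px, int stepSize)
{
    float centerZ = gLinearDepth[px];
    float sum     = 0.0;
    float wSum    = 0.0;

    for (int y = -2; y <= 2; ++y)
    {
        for (int x = -2; x <= 2; ++x)
        {
            int2  q = int2(px) + int2(x, y) * stepSize;   // "holes" of the À-Trous kernel
            float w = kernel[abs(x)] * kernel[abs(y)];

            // Depth-based edge-stopping weight (a normal-based weight could be added too).
            w *= exp(-abs(gLinearDepth[q] - centerZ) / (centerZ * 0.05 + 1e-4));

            sum  += gAoFiltered[q] * w;
            wSum += w;
        }
    }
    return sum / max(wSum, 1e-4);
}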

Random Ray Direction
When choosing the random ray cast directions, we want the chosen directions to have a more significant effect. Since we have a spatial filter that reuses neighbor pixel data, we can try to cast rays such that the angle between ray directions in neighboring pixels is as large as possible, while also covering as much of the hemisphere as possible.



It looks like we can use some kind of blue noise texture so that the ray directions are well distributed. Let's take a look at how a cosine-weighted random ray direction is generated from two random variables ξ₁, ξ₂ in [0, 1):

ϕ = 2π·ξ₂,   θ = cos⁻¹(√ξ₁)
direction (in tangent space) = (sin θ · cos ϕ, sin θ · sin ϕ, cos θ)
From the above equation, the random variable ξ₂ directly corresponds to the random ray direction on the tangent plane, with a linear relationship between the angle ϕ and ξ₂. We generate random numbers using a Wang hash, which produces white noise. Maybe we can stratify the random range and use blue noise to pick the desired stratum, turning the result into something closer to a blue noise pattern. For example, originally we have a random number in [0, 1); we can stratify it into 4 ranges: [0, 0.25), [0.25, 0.5), [0.5, 0.75), [0.75, 1). Then, using the screen space pixel coordinates, we sample a tileable blue noise texture, and according to the value of the blue noise, we scale the white noise random number into 1 of the 4 stratified ranges. Below is some sample code of how the stratification is done:
int BLUE_NOISE_TEX_SIZE = 64;
int STRATIFIED_SIZE     = 16;

// Pick a blue noise value for this pixel and quantize it to a stratum index.
float4 noise           = blueNoiseTex[pxPos % BLUE_NOISE_TEX_SIZE];
uint2  noise_quantized = noise.xy * (255.0 * STRATIFIED_SIZE / 256.0);

// White noise in [0, 1), rescaled into the stratum selected by the blue noise.
float2 r = wang_hash(pxPos);
r = mad(r, 1.0 / STRATIFIED_SIZE, noise_quantized * (1.0 / STRATIFIED_SIZE));
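For completeness, the stratified pair r could then be mapped to a cosine-weighted direction as sketched below; buildTangentBasis is an assumed helper that builds a tangent frame around the G-buffer normal.

// Map the stratified random pair to a cosine-weighted hemisphere direction.
float  phi      = 2.0 * 3.14159265 * r.y;   // phi is linear in the second random variable
float  cosTheta = sqrt(r.x);
float  sinTheta = sqrt(1.0 - r.x);
float3 localDir = float3(cos(phi) * sinTheta, sin(phi) * sinTheta, cosTheta);
float3 rayDir   = mul(localDir, buildTangentBasis(normal));   // assumed helper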
With the blue-noise-adjusted ray directions, the ray-traced AO image looks less noisy visually:

Rays are traced using white noise
Rays are traced using blue noise
Blurred white noise AO image
Blurred blue noise AO image

Ray Binning
In the demo, ray binning is also implemented, but the performance improvement is not significant. Ray binning only shows a large performance gain when the ray tracing distance is large (e.g. > 10m) and both half-resolution and interleaved rendering are turned off. I have only run the demo on my GTX 1060; maybe the situation will be different on RTX graphics cards (so this is something I need to investigate in the future). Also, the demo output may differ slightly when toggling ray binning on/off due to the precision difference of storing ray directions in RGBA16Float format (the difference vanishes after accumulating more samples over multiple frames with the temporal filter).

Conclusion
In this post, I have described how DXR is used to compute ray-traced AO in real time using a combination of temporal and spatial filters. Those filters are important to increase the total sample count of the Monte Carlo integration and to get a noise-free, stable image. The demo can be downloaded from here. There is still plenty to improve, such as a better filter: currently, when the AO distance is large and both half-resolution and interleaved rendering are turned on (i.e. 1 ray per 16 pixels), the image is too noisy and not temporally stable during camera movement. Maybe I will improve these when writing a path tracer in the future.

References
[1] DirectX Raytracing (DXR) Functional Spec https://microsoft.github.io/DirectX-Specs/d3d/Raytracing.html
[2] Edge-Avoiding À-Trous Wavelet Transform for Fast Global Illumination Filtering https://jo.dreggn.org/home/2010_atrous.pdf
[3] Free blue noise textures http://momentsingraphics.de/BlueNoise.html
[4] Quick And Easy GPU Random Numbers In D3D11 http://www.reedbeta.com/blog/quick-and-easy-gpu-random-numbers-in-d3d11/
[5] Leveraging Real-Time Ray Tracing to build a Hybrid Game Engine http://advances.realtimerendering.com/s2019/Benyoub-DXR%20Ray%20tracing-%20SIGGRAPH2019-final.pdf
[6] "It Just Works": Ray-Traced Reflections in 'Battlefield V' https://developer.download.nvidia.com/video/gputechconf/gtc/2019/presentation/s91023-it-just-works-ray-traced-reflections-in-battlefield-v.pdf
