顯示包含「D3D12」標籤的文章。顯示所有文章
顯示包含「D3D12」標籤的文章。顯示所有文章

HDR Display

Introduction
Continue with the DXR Path Tracer in the last post, I updated the demo to support HDR display. It also has various small features updated such as adding per monitor high DPI support, path trace resolution scale (e.g. path trace at 1080, and bilinear upscale to 4K.) and added dithering to tone mapped output to reduce color banding (Integer back buffer format only). The updated demo can be downloaded here.
A photo of my HDR TV to show the difference between SDR and HDR

HDR Color Spaces
There are 2 swapchain formats/color spaces can be chosen to output HDR images on HDR capable monitor/TV:
1. Rec 2020 color space (DXGI_FORMAT_R10G10B10A2_UNORM + DXGI_COLOR_SPACE_RGB_FULL_G2084_NONE_P2020)
2. scRGB color space (DXGI_FORMAT_R16G16B16A16_FLOAT + DXGI_COLOR_SPACE_RGB_FULL_G10_NONE_P709)
Rec2020 color space is the common HDR10 format with PQ EOTF. But it is recommended to use scRGB color space on Windows. scRGB use the same color primaries at Rec709, which support wide color using negative values. I was confused about using negative values to represent a color (as well as intensity) at first. Because color gamut is usually displayed as CIE xy chromaticity diagram, I used to think of a given RGB value as a color interpolated inside the Red/Green/Blue gamut triangle using the barycentric coordinates
A debug chromaticity diagram showing Rec2020 color gamut.
Using a HDR backbuffer will show less clipped color
Although, it makes sense to represent a wide color using negative numbers using barycentric interpolation, I was confused how it to represent the intensity at the same time. It is because the chromaticity diagram skipped the luminance information. So, instead of thinking color inside the horseshoe-shaped diagram, it is easier for me to think about the color in 3D XYZ color space. The Red/Green/Blue color primaries of a gamut is 3 basis vectors in the XYZ color space. A linear combination of RGB values with the color primaries basis vectors can represent a color and intensity. Thinking in this way make me feel more comfortable. So the path traced lighting values are fed into ACES HDR tone mapper, transformed into Rec709 color space, and then divided by 80 when using scRGB color space(scRGB requires value of 1 to represent 80 nit).

HDR10 metadata
I have also played around with HDR10 metadata to see how it affects the image. But most of the data does not affect the image on my Samsung UA49MU7300JXKZ TV. The only data have effects on the image is "Max Mastering Luminance" which can affect the image brightness. Setting it to a small value will make image darker despite outputting a very bright 1000 nit image. Also, the HDR10 metadata only works in Full Screen Exclusive mode, or borderless windowed mode with HWND_TOPMOST Z order (I guess is the full screen optimization get enabled), using borderless window with HWND_TOP Z order won't work (but this mode is easier to alt-tab on single monitor set up...). Besides, entering Full Screen Exclusive mode may fail when calling SetFullscreenState() if the display is not connected to the adapter that used for rendering. I didn't notice this until I started to work on laptop which use the RTX graphics card for ray tracing and the laptop monitor is connected to the Intel graphics card. Looks like some hard work need be done in order to support Full Screen Exclusive mode properly (e.g. create a D3D Device/Command Queue/Swapchain and for the Intel graphics card and copy the ray traced image from the RTX graphics to Intel graphics card for full screen swapchain). But unfortunately, my demo does not support multi-adapter, so the HDR10 metadata may not work on such setup (I am outputting to the external HDR TV using the RTX graphics card, so that doesn't create a much problem for me...)...

Capability of my HDR TV
UI showing all the adjustable HDR10 meta in the demo

UI
Blending SDR UI with HDR image is handled in 2 steps: blend color and then brightness. All the UI is rendered into an off screen buffer (in 8 bit Rec709 color space). And then later blended with the ACES tone mapped image. Take a look at the ACES tone mapping function snippet below, the lighting value will be mapped to a normalized range in AP1 color space as an intermediate step (red part in the below code snippet). So in the demo, the UI in off screen buffer will be converted to AP1 color space and then blend with the tone mapped image at this step. (I have also tried blending in XYZ space and the result is similar in the demo.)

ACES tone map function snippet

Then the UI blended image can be transformed into target color space (e.g. scRGB or Rec2020, purple part in the above code snippet). When converting the normalized color data to HDR data, the ACES tone mapper interpolates the RGB values between Y_MIN and Y_MAX (i.e. blue part in the above code snippet). During this brightness interpolation, the demo adjust Y_MAX (e.g. 1000 to 4000 nits) to user defined UI brightness (e.g. 80 to 300 nits) depending on the UI alpha, using the following formula:



In the demo, BlendPow is set to 5 as default value. Although the result is not perfect (e.g. the UI alpha value may not looks blending linearly, depending on background luminance), it works well enough to avoid bleeding though from a bright background with UI alpha > 0.5:

Background Luminance: 0-10 nit
Background Luminance: 2 - 250 nit
Background Luminance: 1000 nit
Photos showing UI blending with HDR background, from dark to bright.

However, the above blending formula have artifact when Y_MAX is smaller than the UI brightness (But this may not happens and only happens in some debug view mode). In this case, the background may looks too bright after blending with the UI. To fix this, inverting the BlendPow may helps to minimize the artifact:



The below is an extreme example showing the artifact with Y_MAX set to 1 nit and UI set to 80 nit:
Without the fix, the artifact is very noticeable at the upper right corner
With the fix, the background is darkened properly
Looking back to the blue part of the ACES tone mapping function in the above code snippet, I was wondering will the image looks different as the Y_MIN and Y_MAX values are interpolated in the display color space (e.g. will the image looks different when interpolating in scRGB / Rec2020). To compare whether these 2 color are the same, we need to have both color in the same space. So let's convert the interpolated RGB value back to XYZ space:



So, interpolating in different display color space do have some differences as long as Y_MIN != 0. But The ACES tone mapper which defaults to use the STRECH_BLACK option which effectively setting Y_MIN= 0, so there should be no difference when interpolating values in difference color space. Out of curiosity, I have tried to disable the STRECH_BLACK option to see whether the image will look different when switching between the scRGB and Rec2020 back buffer with Y_MIN > 0, but the images still looks the same with a large Y_MIN... I am not sure why this happens, may be the difference is too small to be noticeable... In the demo, I take this into account and treat the Y_MIN as a value in the XYZ color space and interpolate the value like this:



Debug View Mode
To help debugging, several debug view modes are added. A "Luminance Range" view mode is used to display the luminance value of a pixel before tone mapping (ie. the physical light luminance entering the virtual camera) and after tone mapping (i.e. the output light luminance the monitor should be displayed):
Showing pixel luminance before tone mapping.
Showing pixel luminance after tone mapping.

A "Gamut Clip Test" mode to high light pixel that fall outside Rec709 gamut, i.e. those color can only be viewed with a HDR display or wide color monitor (e.g. AdobeRGB / P3 monitor).
Highlight clipped pixel with cyan color
A "SDR HDR Split" mode to compare the SDR/HDR images. But this mode can only be a rough preview of how SDR will look like on HDR back buffer. It is because SDR images are usually displayed brighter than 80 nit, the SDR part need to be brighten up (say to 100-200 nit) to make it look similar to using a real SDR back buffer. Also, in this view mode, because the HDR back buffer expect a pixel value in nit (either linear or PQ encoded), I don't apply any gamma curve to the SDR portion of the image, which may also result in differences to a real SDR version too.
A photo showing both SDR and HDR at the same time.
Bloom is exaggerated to show the clipped color in SDR
Conclusion
In this post, I have talked about the color spaces used for outputting to HDR display. No matter which color space format is used (e.g. scRGB/ Rec2020), the displayed image should be identical if transformed correctly (except some precision difference). Also, I have tried to play around with the HDR10 metadata, but most of the metadata does not change the image on my TV... I guess how the metadata is interpreted is device dependent. Lastly, SDR UI is composited with the HDR image by first blending with the color and then brightness. A simple blending formula is enough for the demo. A complicated algorithm can be explored in the future, say brighten up the UI depending on the brightness of a blurred HDR background (e.g. may be storing the background luminance in the alpha channel and blur it together with bloom pass?). A demo can be downloaded here to test on your HDR display (Some of the options are hid if not connecting to HDR display).

References
[1] https://channel9.msdn.com/Events/Build/2017/P4061
[2]  https://www.pyromuffin.com/2018/07/how-to-render-to-hdr-displays-on.html
[3] https://www.gdcvault.com/play/1024803/Advances-in-the-HDR-Ecosystem
[4] https://www.gdcvault.com/play/1026443/Not-So-Little-Light-Bringing
[5] https://developer.nvidia.com/implementing-hdr-rise-tomb-raider
[6] https://www.asawicki.info/news_1703_programming_hdr_monitor_support_in_direct3d
[7] https://onedrive.live.com/?authkey=%21AFU3moSbUzyUgaE&cid=A4B88088C01D9E9A&id=A4B88088C01D9E9A%21170&parId=A4B88088C01D9E9A%21106&o=OneUp
[8] https://www.shadertoy.com/view/4tV3WW

DXR Path Tracer

Introduction
Can't believe it has been half a year since my last DXR AO post. It was a hard time in Hong Kong last year, but thanks to the social unrest and medical worker's strike, the Wuhan Coronavirus does not spread widely in the local community (but still have new cases everyday...). Due to the virus, it is better to stay at home, so I continue to code my path tracer. This new path tracer is unbiased by terminating rays with Russian Roulette. During path tracing, physical light unit are used. Also, rendering can be done in wide color space. Finally, the path traced result is tone mapped and output to sRGB/wide color gamut depends on the display device. A demo can be downloaded here (Please use latest graphics driver to run as I have encountered device remove hang on my laptop RTX2060 with old driver, but not on my desktop GTX1060... If the crash/hang still happens, please let me know. Thank you.).

Path Traced Sponza scene.


Render Loop
At the start of the demo (or after any camera movement/lighting changes), a structured buffer, Ray Buffer, is initialized with 1 ray per pixel using the camera transform.
The struct stored in Ray Buffer, not tightly packed for easier understanding.
Then a ray generation shader is dispatched to read the Ray Buffer and trace rays into the scene. Lighting is calculated and generate new Ray Buffer elements if the rays are not terminated and continue the path tracing in the next frame. Below is a simplified flow of the rendering operation executed every frame (with render passes on the left and resources on the right):

A simplified path tracing flow executed every frame
Let's start with the usage of resources in the above flow chart first:
  • Ray Buffers are structured buffers storing the RayData struct. A ray will be traced for each element and if the ray is not terminated by Russian Roulette, it will be stored back to the Ray Buffer for the next frame.
  • Lighting Path Texture is used for accumulating the lighting result when a ray is traversing along the path from the camera. It can be think of an intermediate result because the path is not fully traversed within a single frame, but across several frames.
  • Progress Buffer is a 8 bytes buffer, with 4 bytes storing the current path depth and other 4 bytes storing the total accumulated sample count.
  • Lighting Sample Texture is used for accumulating the lighting result of all the terminated rays (i.e. accumulating all the terminated ray result from Lighting Path Texture).
About operation done in each render pass:
  1. Ray Tracing Pass dispatch a ray generation shader, sampling the Ray Buffer according to the DispatchRaysIndex() and then calling TraceRay() to calculate the lighting result inside the closest hit shader by randomly choosing diffuse or specular lighting (another shadow ray is traced towards light source during lighting calculation). The lighting result is added to Lighting Path Texture and non-terminated ray will be stored back into Ray Buffer for next frame.
  2. Checking whether all rays are terminated by using D3D12 predicate on the counter buffer of Ray Buffer (i.e. all rays terminated when counter == 0). Then different shader/operation will be executed depends on whether the Ray Buffer is empty.
  3. When there are still rays not terminated, increase the path depth in Progress Buffer.
  4. When all rays are terminated, increase the sample count and set path depth to 0 in Progress Buffer.
  5. Accumulate the current path lighting result in Lighting Path Texture to Lighting Sample Texture. Clear the Lighting Path Texture to 0 (it is cleared via compute shader instead of command list clear as the predicate does not work on the clear API, despite the spec say it would...)
  6. Regenerate the rays in Ray Buffer with 1 ray per pixel using the camera transform for path tracing new lighting samples in next few frames. 
  7. Display the current lighting result to the back buffer.
Path traced images

With the core operations described above, at most 2 rays per pixel can be launched to maintain an interactive frame rate on my GTX1060. On more powerful machines (i.e. RTX cards with hardware accelerated ray tracing), step 1 don't need to be terminated with the first closest hit, but bounce a few more times before storing back to the Ray Buffer (A "#Bounce/Frame" option is added to increase the number of bounce per frame for RTX cards).
Number of bounce per frame option to adjust performance
The current approach described above has 2 drawbacks: First, we don't know how many rays are still left in Ray Buffer on CPU, DispatchRay() is called with the maximum number of rays (i.e. viewport width * height), and terminate early within the ray generation shader. This can be fixed in DXR Tier 1.1 using ExecuteIndirect() in the future. The second drawback is the performance is not constant across several frames, because the number of rays need to be traced decrease every frame and then reset back, so the frame rate fluctuate.

ACES tone mapping
After calculating the HDR lighting value, we need to perform a tone map pass to map the lighting value to a displayable range. ACES tone mapping is chosen due to its popularity in recent years. ACES has a few tone mapping curves (they call it RRT + ODT) for different display with different color gamut and viewing condition. Some common display types are sRGB_100nit and Rec709_100nit. The input of RRT+ODT function expect RGB values in ACES2065-1 (AP0) gamut with a white point around(but not exact) D60. So we need to convert our lighting value (L_sRGB) to AP0 gamut by multiplying a few transformation matrices:
operation to convert RGB value from sRGB to AP0
The above steps means first transforming sRGB values to XYZ color space with D65 white point (gamut transformation matrices can be calculated using the formula from here), then apply a Chromatic Adaption Transform(CAT) due to different white point between sRGB and AP0 (the matrix can be calculated using the formula from here). Finally, the XYZ value can be transformed to AP0 gamut. All these matrices can be combined to perform 1 matrix-vector multiplication as an optimization. Then this value can be feed into ACES RRT+ODT to compute the back buffer value for display.

So we just only need to select the appropriate ODT for the target display device. But unfortunately, not all common display gamut is provided, like my recently bought RTX laptop which comes with a 100% AdobeRGB color gamut monitor. ACES does not provide a suitable ODT to display the image in AdobeRGB color space. If using the common sRGB ODT, image will look too saturated. So I added a "Remap display color gamut" option in the demo:

Remapping option to display the path traced result according to display color primaies
The Remap display color gamut option performs the following steps on the output of RRT+ODT:
  1. Apply EOTF function to the ODT output to get linear lighting value.
  2. Transform the resulting RGB value in step 1 to the target display color gamut RGB value (e.g. AdobeRGB gamut on my laptop display), with Chromatic Adaption Transformation applied.
  3. Apply OETF function to the output of step 2 for display.
By doing the above remapping, I can get similar result between my AdobeRGB laptop monitor and sRGB desktop monitor. But 1 drawback is that, although we can query the color primaries of the display, but it is not always accurate. For example, on my laptop, I can switch to regular sRGB view mode, but the IDXGIOutput6::GetDesc1() is still returning the AdobeRGB color primaries. I have also tried on some other monitors, they have color primaries greater than sRGB, but not exactly AdobeRGB or P3 primaries, and they also have different view mode such as AdobeRGB or sRGB. So I just leave the gamut remapping function optional in the demo and the user can choose their remap color primaries.

Also digging deeper in the ACES ODT source code, the 3 ODT used in the demo share many common code and only have different color space transform function / OETF at the end of ODT. In the future, I may refactor the RRT+ODT code and remove the remap display gamut function and directly transforn the ACES ODT output in XYZ space to the display gamut queried by IDXGIOutput6::GetDesc1() (or user selected gamut).
An ODT from ACES, the blue part is the same for all the 3 ODT used in the demo.
The orange part is different depends on display, which can be replaced by display
primaries returned from IDXGIOutput6::GetDesc1(), so the "Remap display color
gamut" in the demo can be removed in the future. 

WCG rendering
Equipped with the knowledge of transforming between color spaces, I decide to try rendering in Wide Color Gamut instead. Games like GT Sport rendered in wide color already(Rec2020). Performing lighting calculation in wide color gamut can result in more accurate lighting than rendering in sRGB color space (despite displaying on sRGB monitor).

Path Traced result rendered in different color space. Left:sRGB, Center:ACEScg, Right:Rec2020

In the demo, it can path trace in sRGB, ACEScg, Rec2020 color space. Inside the closest hit shader, albedo texture is read and transformed into the chosen rendering color space from sRGB. Also the light color is converted to chosen rendering color space and then multiply with the intensity. Finally inside the tone mapping pass, the result of lighting calculation is transformed to AP0 color space and feed into ACES RRT+ODT for display. You may notice some difference between rendering in sRGB and wide color gamut (e.g. ACEScg and Rec2020). If you have a wide color gamut monitor (e.g. AdobeRGB or DCI-P3), you can try to use the Rec2020 ODT with the "Remap display color gamut" option on (described in last section). This can produce fewer color clamping and display more saturated color. But under normal lighting condition, the difference is not that much, we need to set up specific lighting such as using a local sphere light with saturated color, then those wide color can be displayed. I guess this is due to both albedo texture and light color are in sRGB space, content may need to be adjusted in order to take advantage of wide color display.

Wide Color path traced image, saved with different profile. Left:saved with sRGB profile Right: saved with AdobeRGB profile. 
The right image shows more saturated color when viewed on a color managed browser with a wide color display (e.g. iPhone monitor), 
otherwise 2 images may look the same.

Also, please note that this kind of wide color support is different from Windows 10 HDR/WCG settings. On my laptop, Windows report No for both HDR and WCG, but it do have an AdobeRGB monitor and capable of displaying wide color, we just need to correctly transform the images using the monitor color gamut.
My laptop has an AdobeRGB monitor, but Windows 10 Display capabilities report No for WCG.

ACES ODT blue light artifact
So far everything looks good when rendering in wide color space. Color get desaturated when they are over exposed. But it still has some issue when using a strong blue light...

Using a strong sRGB blue light will introduce hue shift...

It is because pure blue (0, 0, 255) in sRGB space is not saturated enough when transformed to wide color gamut (e.g. ACEScg/Rec2020). Looking inside the ACES dev repo, it has a blue light artifact fix LMT to fix this issue. It works by de-saturating the blue color a bit to lessen the hue shift. So in the demo, I provided a "Blue Correction" parameter to adjust the blue de-saturation strength (As a side note, UE4 also use ACES tone mapper and comes with a blue correction parameter in post process setting).
desaturating blue color to fix the hue shift

But I do like the saturated blue color, using the blue light artifact fix LMT will de-saturate the blue color made me sad. Below is the comparison between with/without the blue light LMT:
Left: without blue light fix LMT, Right: with blue light fix LMT

So may be we can work around the problem in the other way. Instead of making the blue color less saturated, we can make the light color more saturated. So I added a light "Color Picker Space" combo box to specify the color space of the picked RGB light value, so that more saturated blue light color can be chosen. By choosing an extremely saturated blue color (0, 0, 255) RGB value in ACEScg color space. We can get away with the purple color:
Using a saturated blue light in ACEScg space, without the blue light fix LMT

Bloom
Lastly, a bloom pass is added before tone mapping. Bloom pixels are extracted based on a threshold that exceeded the maximum luminance with the current exposure values. The maximum luminance is calculated with:
max luminance calculated using EV100
But simply subtracting the lighting value with the threshold will introduce some hue shift to the bloom color. So the RGB lighting value is transformed to HSV space, subtract the threshold from V, and then transform back to RGB space (We keep all the RGB values in the rendering space without transforming the lighting value to sRGB from ACEScg/Rec2020 during HSV conversion, as there are not much difference between the bloom results). Given an image with HDR values:
Input image for the bloom pass
The differences between using threshold in HSV space and RGB space:
Left column: bloom in HSV space. Right column: bloom in RGB space.
Upper row: Lighted scene combined with bloom.
Lower row: Debug images showing only the bloom component.

The bloom calculated using HSV space introduce less saturated color. The situation will be exaggerated when the image is over-exposed:
Left:Bloom input image.
Center:Bloom in HSV space.
Right: Bloom in RGB space.
Conclusion
In this post, the core algorithm of my DXR path tracer is described, together with some color space conversion. There are much more stuff to be done in the future like, support dynamic geometry during ray tracing, adding a denoiser for path traced output, implement hybrid rasterization/ray tracing rendering, spectral rendering to compute a ground truth reference. Also, this is my first time to write code about color space management. Currently, in the demo, the 3D lighting can be displayed correctly using the monitor gamut, but the UI is not managed properly. Also, 4K and HDR need to be supported too.

References
[1] https://seblagarde.files.wordpress.com/2015/07/course_notes_moving_frostbite_to_pbr_v32.pdf
[2] https://microsoft.github.io/DirectX-Specs/d3d/Raytracing.html#addressing-calculations-within-shader-tables
[2] https://github.com/ampas/aces-dev
[3] http://www.brucelindbloom.com/index.html?Eqn_RGB_XYZ_Matrix.html
[4] http://www.brucelindbloom.com/index.html?Eqn_ChromAdapt.html
[5] http://www.polyphony.co.jp/publications/sa2018/









DXR AO

Introduction
(Edit 3/5/2020: An updated version of the demo can be downloaded here, which support high DPI monitor and some bug fixes)
It has been 2 months since my last post. For the past few months, the situation here in Hong Kong was very bad. Our basic human rights are deteriorating. Absurd things happens such as suspected cooperation between police and triad, as well as the police brutality (including shooting directly at the journalists). I really don't know what can be done... May be, could you spare me a few minutes to sign some of these petitions? Although such petitions may not be very useful, at least after signing some of them, the US Congress is discussing the Hong Kong Human Rights and Democracy Act now. I would sincerely appreciate your help. Thank you very much!

Back to today's topic, after setting up my D3D12 rendering framework, I started to learn DirectX ray-tracing (DXR). So I decided to start writing an ambient occlusion demo first because it is easier than writing a full path tracer since I do not need to handle material information as well as the lighting data. The demo can be downloaded from here (required a DXR compatible graphics card and driver with Windows 10 build version 1809 or newer).



Rendering pipeline
In this demo, it renders a G-buffer with normal and depth data. Then a velocity buffer will be generated using current and previous frame camera transform, stored in RG16Snorm format. Then rays are traced from world position reconstructed from depth buffer with cosine weight distribution. To avoid ray-geometry self intersection, ray origin is shifted towards the camera a bit. After that, a temporal and spatial filter is applied to smooth out the noisy AO image and then an optional bilateral blur pass can be applied for a final clean up.



Temporal Filter
With the noisy image generated from the ray tracing pass, we can reuse previous frame ray-traced data to smooth out the image. In the demo, the velocity buffer is used to get the pixel location in previous frame (with an additional depth check between current frame depth value and the re-projected previous frame depth value). As we are calculating ambient occlusion using Monte Carlo Integration:


We can split the Monte Carlo integration into multiple frames and store the AO result into a RG16Unorm texture, where red channel stores the accumulated AO result, green channel stores the total sample count N (The sample count is clamped to 255 to avoid overflow). So after a new frame is rendered, we can accumulated the AO Monte Carlo Integration with the following equation:



We also reduce the sample count by the delta depth difference between current and previous frame depth buffer value (i.e. when the camera zoom out/in) to "fade out" the accumulated history faster to reduce ghosting artifact.

AO image traced at 1 ray per pixel
AO image with accumulated samples over multiple frames

But this re-projection temporal filter have a short coming that the geometry edge would failed very often (especially when done in half resolution). So in the demo, when re-projection failed, it will shift 1 pixel to perform the re-projection again to accumulate more samples.

Many edge pixels failed the re-projection test
With 1 pixel shifted, many edge pixels can be re-projected

As the result is biased, I have also reduced the sample count by a factor of 0.75 to make the correct ray-traced result "blend in" faster.

Spatial Filter
To increase the sample count for Monte Carlo Integration, we can reuse the ray-traced data in the neighbor pixels. We search in 5x5 grid and reuse the neighbor data if they are on the same surface by comparing their delta depth value (i.e. ddx and ddy generated from depth buffer). As the delta depth value is re-generated from depth buffer, some artifact may been seen on the triangle edge.

noisy AO image applied with a spatial filter
artifact shown at the triangle edge by re-constructed delta depth
To save some performance, beside using half resolution rendering, we can also choose to interleave the ray cast every 4 pixels and ray cast the remaining pixels in next few frames.

Rays are traced only at the red pixels
to save performance
For those pixels without any ray traced data during interleaved rendering, we use the spatial filter to fill in the missing data. The same surface depth check in spatial filter can be by-passed when the sample count(stored in green channel during temporal filter) is low, because it is better to have some "wrong" neighbor data than have no data for the pixel. This also helps to remove the edge artifact shown before.

Rays are traced at interleaved pattern,
leaving many 'holes' in the image
Spatial filter will fill in those 'holes'
during interleaved rendering

Also, when ray casting are interleaved between pixels, we need to pay attention to the temporal filter too. We may have a chance that we re-project to previous frame pixel which have no sample data. In this case, we snap the re-projected UV to the pixel that have cast interleaved ray in previous frame.

Bilateral Blur
To clean up the remaining noise from the temporal and spatial filter. A bilateral blur is applied, we can have a wider blur by using the edge aware A-Trous algorithm. The blur radius is adjusted according to the sample count (stored in green channel in temporal filter). So when we have already cast many ray samples, we can reduce the blur radius to have a sharper image.

Applying an additional bilateral blur to smooth out remaining noise

Random Ray Direction
When choosing the random ray cast direction, we want those chosen direction can have a more significance effect. Since we have a spatial filter to reuse neighbor pixels data, so we can try to cast rays in directions such that the angle between the ray direction in neighbor pixels should be as large as possible and also cover as much hemisphere area as possible.



It looks like we can use some kind of blue noise texture so that the ray direction is well distributed. Let's take a look at how the cosine weighted random ray direction is generated:



From the above equation, the random variable ϕ is directly corresponding to the random ray direction on the tangent plane, which have a linear relationship between the angle ϕ and random variable ξ2. Since we generate random numbers using wang hash, which is a white noise. May be we can stratified the random range and using the blue noise to pick a desired range to turn it like a blue noise pattern. For example, originally we have a random number between [0, 1), we may stratified it into 4 ranges: [0, 0.25), [0.25, 0.5), [0.5, 0.75), [0.75, 1). Then using the screen space pixel coordinates to sample a tileable blue noise texture. And according to the value of the blue noise, we scale the white noise random number into 1 of the 4 stratified range. Below is some sample code of how the stratification is done:
int BLUE_NOISE_TEX_SIZE= 64;
int STRATIFIED_SIZE= 16;
float4 noise= blueNoiseTex[pxPos % BLUE_NOISE_TEX_SIZE];
uint2 noise_quantized= noise.xy * (255.0 * STRATIFIED_SIZE / 256.0);
float2 r= wang_hash(pxPos);  // random white noise in range [0, 1)
r = mad(r, 1.0/STRATIFIED_SIZE, noise_quantized * (1.0/STRATIFIED_SIZE));
With the blue noise adjusted ray direction, the ray traced AO image looks less noisy visually:

Rays are traced using white noise
Rays are traced using blue noise
Blurred white noise AO image
Blurred blue noise AO image

Ray Binning
In the demo, ray binning is also implemented, but the performance improvement is not significant. The ray binning only show a large performance gain when the ray tracing distance is large (e.g. > 10m) and turning off both half resolution and interleaved rendering. I have only ran the demo on my GTX 1060, may be the situation will be different on RTX graphcis card (so, this is something I need to investigate in the future). Also the demo may have a slight difference when toggling on/off ray binning due to the precision difference using RGBA16Float format to store ray direction (the difference will be vanished after accumulating more samples over multiple frames with temporal filter).

Conclusion
In this post, I have described how DXR is used to compute ray-traced AO in real-time using a combination of temporal and spatial filter. Those filters are important to increase the total sample count for Monte Carlo Integration and getting a noise free and stable image. The demo can be downloaded from here. There are still plenty of stuff to improve, such as having a better filter, currently, when the AO distance is large and both half resolution and interleaved rendering is turned on (i.e. 1 ray per 16 pixels), the image is too noisy and not temporally stable during camera movement. May be I need to improve those stuff when writing a path tracer in the future.

References
[1] DirectX Raytracing (DXR) Functional Spec https://microsoft.github.io/DirectX-Specs/d3d/Raytracing.html
[2] Edge-Avoiding À-Trous Wavelet Transform for fast GlobalIllumination Filtering https://jo.dreggn.org/home/2010_atrous.pdf
[3] Free blue noise textures http://momentsingraphics.de/BlueNoise.html
[4] Quick And Easy GPU Random Numbers In D3D11 http://www.reedbeta.com/blog/quick-and-easy-gpu-random-numbers-in-d3d11/
[5] Leveraging Real-Time Ray Tracing to build a Hybrid Game Engine http://advances.realtimerendering.com/s2019/Benyoub-DXR%20Ray%20tracing-%20SIGGRAPH2019-final.pdf
[6] ”It Just Works”: Ray-Traced Reflections in 'Battlefield V' https://developer.download.nvidia.com/video/gputechconf/gtc/2019/presentation/s91023-it-just-works-ray-traced-reflections-in-battlefield-v.pdf