HDR Display

Introduction
Continue with the DXR Path Tracer in the last post, I updated the demo to support HDR display. It also has various small features updated such as adding per monitor high DPI support, path trace resolution scale (e.g. path trace at 1080, and bilinear upscale to 4K.) and added dithering to tone mapped output to reduce color banding (Integer back buffer format only). The updated demo can be downloaded here.
A photo of my HDR TV to show the difference between SDR and HDR

HDR Color Spaces
There are 2 swapchain formats/color spaces can be chosen to output HDR images on HDR capable monitor/TV:
1. Rec 2020 color space (DXGI_FORMAT_R10G10B10A2_UNORM + DXGI_COLOR_SPACE_RGB_FULL_G2084_NONE_P2020)
2. scRGB color space (DXGI_FORMAT_R16G16B16A16_FLOAT + DXGI_COLOR_SPACE_RGB_FULL_G10_NONE_P709)
Rec2020 color space is the common HDR10 format with PQ EOTF. But it is recommended to use scRGB color space on Windows. scRGB use the same color primaries at Rec709, which support wide color using negative values. I was confused about using negative values to represent a color (as well as intensity) at first. Because color gamut is usually displayed as CIE xy chromaticity diagram, I used to think of a given RGB value as a color interpolated inside the Red/Green/Blue gamut triangle using the barycentric coordinates
A debug chromaticity diagram showing Rec2020 color gamut.
Using a HDR backbuffer will show less clipped color
Although, it makes sense to represent a wide color using negative numbers using barycentric interpolation, I was confused how it to represent the intensity at the same time. It is because the chromaticity diagram skipped the luminance information. So, instead of thinking color inside the horseshoe-shaped diagram, it is easier for me to think about the color in 3D XYZ color space. The Red/Green/Blue color primaries of a gamut is 3 basis vectors in the XYZ color space. A linear combination of RGB values with the color primaries basis vectors can represent a color and intensity. Thinking in this way make me feel more comfortable. So the path traced lighting values are fed into ACES HDR tone mapper, transformed into Rec709 color space, and then divided by 80 when using scRGB color space(scRGB requires value of 1 to represent 80 nit).

HDR10 metadata
I have also played around with HDR10 metadata to see how it affects the image. But most of the data does not affect the image on my Samsung UA49MU7300JXKZ TV. The only data have effects on the image is "Max Mastering Luminance" which can affect the image brightness. Setting it to a small value will make image darker despite outputting a very bright 1000 nit image. Also, the HDR10 metadata only works in Full Screen Exclusive mode, or borderless windowed mode with HWND_TOPMOST Z order (I guess is the full screen optimization get enabled), using borderless window with HWND_TOP Z order won't work (but this mode is easier to alt-tab on single monitor set up...). Besides, entering Full Screen Exclusive mode may fail when calling SetFullscreenState() if the display is not connected to the adapter that used for rendering. I didn't notice this until I started to work on laptop which use the RTX graphics card for ray tracing and the laptop monitor is connected to the Intel graphics card. Looks like some hard work need be done in order to support Full Screen Exclusive mode properly (e.g. create a D3D Device/Command Queue/Swapchain and for the Intel graphics card and copy the ray traced image from the RTX graphics to Intel graphics card for full screen swapchain). But unfortunately, my demo does not support multi-adapter, so the HDR10 metadata may not work on such setup (I am outputting to the external HDR TV using the RTX graphics card, so that doesn't create a much problem for me...)...

Capability of my HDR TV
UI showing all the adjustable HDR10 meta in the demo

UI
Blending SDR UI with HDR image is handled in 2 steps: blend color and then brightness. All the UI is rendered into an off screen buffer (in 8 bit Rec709 color space). And then later blended with the ACES tone mapped image. Take a look at the ACES tone mapping function snippet below, the lighting value will be mapped to a normalized range in AP1 color space as an intermediate step (red part in the below code snippet). So in the demo, the UI in off screen buffer will be converted to AP1 color space and then blend with the tone mapped image at this step. (I have also tried blending in XYZ space and the result is similar in the demo.)

ACES tone map function snippet

Then the UI blended image can be transformed into target color space (e.g. scRGB or Rec2020, purple part in the above code snippet). When converting the normalized color data to HDR data, the ACES tone mapper interpolates the RGB values between Y_MIN and Y_MAX (i.e. blue part in the above code snippet). During this brightness interpolation, the demo adjust Y_MAX (e.g. 1000 to 4000 nits) to user defined UI brightness (e.g. 80 to 300 nits) depending on the UI alpha, using the following formula:



In the demo, BlendPow is set to 5 as default value. Although the result is not perfect (e.g. the UI alpha value may not looks blending linearly, depending on background luminance), it works well enough to avoid bleeding though from a bright background with UI alpha > 0.5:

Background Luminance: 0-10 nit
Background Luminance: 2 - 250 nit
Background Luminance: 1000 nit
Photos showing UI blending with HDR background, from dark to bright.

However, the above blending formula have artifact when Y_MAX is smaller than the UI brightness (But this may not happens and only happens in some debug view mode). In this case, the background may looks too bright after blending with the UI. To fix this, inverting the BlendPow may helps to minimize the artifact:



The below is an extreme example showing the artifact with Y_MAX set to 1 nit and UI set to 80 nit:
Without the fix, the artifact is very noticeable at the upper right corner
With the fix, the background is darkened properly
Looking back to the blue part of the ACES tone mapping function in the above code snippet, I was wondering will the image looks different as the Y_MIN and Y_MAX values are interpolated in the display color space (e.g. will the image looks different when interpolating in scRGB / Rec2020). To compare whether these 2 color are the same, we need to have both color in the same space. So let's convert the interpolated RGB value back to XYZ space:



So, interpolating in different display color space do have some differences as long as Y_MIN != 0. But The ACES tone mapper which defaults to use the STRECH_BLACK option which effectively setting Y_MIN= 0, so there should be no difference when interpolating values in difference color space. Out of curiosity, I have tried to disable the STRECH_BLACK option to see whether the image will look different when switching between the scRGB and Rec2020 back buffer with Y_MIN > 0, but the images still looks the same with a large Y_MIN... I am not sure why this happens, may be the difference is too small to be noticeable... In the demo, I take this into account and treat the Y_MIN as a value in the XYZ color space and interpolate the value like this:



Debug View Mode
To help debugging, several debug view modes are added. A "Luminance Range" view mode is used to display the luminance value of a pixel before tone mapping (ie. the physical light luminance entering the virtual camera) and after tone mapping (i.e. the output light luminance the monitor should be displayed):
Showing pixel luminance before tone mapping.
Showing pixel luminance after tone mapping.

A "Gamut Clip Test" mode to high light pixel that fall outside Rec709 gamut, i.e. those color can only be viewed with a HDR display or wide color monitor (e.g. AdobeRGB / P3 monitor).
Highlight clipped pixel with cyan color
A "SDR HDR Split" mode to compare the SDR/HDR images. But this mode can only be a rough preview of how SDR will look like on HDR back buffer. It is because SDR images are usually displayed brighter than 80 nit, the SDR part need to be brighten up (say to 100-200 nit) to make it look similar to using a real SDR back buffer. Also, in this view mode, because the HDR back buffer expect a pixel value in nit (either linear or PQ encoded), I don't apply any gamma curve to the SDR portion of the image, which may also result in differences to a real SDR version too.
A photo showing both SDR and HDR at the same time.
Bloom is exaggerated to show the clipped color in SDR
Conclusion
In this post, I have talked about the color spaces used for outputting to HDR display. No matter which color space format is used (e.g. scRGB/ Rec2020), the displayed image should be identical if transformed correctly (except some precision difference). Also, I have tried to play around with the HDR10 metadata, but most of the metadata does not change the image on my TV... I guess how the metadata is interpreted is device dependent. Lastly, SDR UI is composited with the HDR image by first blending with the color and then brightness. A simple blending formula is enough for the demo. A complicated algorithm can be explored in the future, say brighten up the UI depending on the brightness of a blurred HDR background (e.g. may be storing the background luminance in the alpha channel and blur it together with bloom pass?). A demo can be downloaded here to test on your HDR display (Some of the options are hid if not connecting to HDR display).

References
[1] https://channel9.msdn.com/Events/Build/2017/P4061
[2]  https://www.pyromuffin.com/2018/07/how-to-render-to-hdr-displays-on.html
[3] https://www.gdcvault.com/play/1024803/Advances-in-the-HDR-Ecosystem
[4] https://www.gdcvault.com/play/1026443/Not-So-Little-Light-Bringing
[5] https://developer.nvidia.com/implementing-hdr-rise-tomb-raider
[6] https://www.asawicki.info/news_1703_programming_hdr_monitor_support_in_direct3d
[7] https://onedrive.live.com/?authkey=%21AFU3moSbUzyUgaE&cid=A4B88088C01D9E9A&id=A4B88088C01D9E9A%21170&parId=A4B88088C01D9E9A%21106&o=OneUp
[8] https://www.shadertoy.com/view/4tV3WW

DXR Path Tracer

Introduction
Can't believe it has been half a year since my last DXR AO post. It was a hard time in Hong Kong last year, but thanks to the social unrest and medical worker's strike, the Wuhan Coronavirus does not spread widely in the local community (but still have new cases everyday...). Due to the virus, it is better to stay at home, so I continue to code my path tracer. This new path tracer is unbiased by terminating rays with Russian Roulette. During path tracing, physical light unit are used. Also, rendering can be done in wide color space. Finally, the path traced result is tone mapped and output to sRGB/wide color gamut depends on the display device. A demo can be downloaded here (Please use latest graphics driver to run as I have encountered device remove hang on my laptop RTX2060 with old driver, but not on my desktop GTX1060... If the crash/hang still happens, please let me know. Thank you.).

Path Traced Sponza scene.


Render Loop
At the start of the demo (or after any camera movement/lighting changes), a structured buffer, Ray Buffer, is initialized with 1 ray per pixel using the camera transform.
The struct stored in Ray Buffer, not tightly packed for easier understanding.
Then a ray generation shader is dispatched to read the Ray Buffer and trace rays into the scene. Lighting is calculated and generate new Ray Buffer elements if the rays are not terminated and continue the path tracing in the next frame. Below is a simplified flow of the rendering operation executed every frame (with render passes on the left and resources on the right):

A simplified path tracing flow executed every frame
Let's start with the usage of resources in the above flow chart first:
  • Ray Buffers are structured buffers storing the RayData struct. A ray will be traced for each element and if the ray is not terminated by Russian Roulette, it will be stored back to the Ray Buffer for the next frame.
  • Lighting Path Texture is used for accumulating the lighting result when a ray is traversing along the path from the camera. It can be think of an intermediate result because the path is not fully traversed within a single frame, but across several frames.
  • Progress Buffer is a 8 bytes buffer, with 4 bytes storing the current path depth and other 4 bytes storing the total accumulated sample count.
  • Lighting Sample Texture is used for accumulating the lighting result of all the terminated rays (i.e. accumulating all the terminated ray result from Lighting Path Texture).
About operation done in each render pass:
  1. Ray Tracing Pass dispatch a ray generation shader, sampling the Ray Buffer according to the DispatchRaysIndex() and then calling TraceRay() to calculate the lighting result inside the closest hit shader by randomly choosing diffuse or specular lighting (another shadow ray is traced towards light source during lighting calculation). The lighting result is added to Lighting Path Texture and non-terminated ray will be stored back into Ray Buffer for next frame.
  2. Checking whether all rays are terminated by using D3D12 predicate on the counter buffer of Ray Buffer (i.e. all rays terminated when counter == 0). Then different shader/operation will be executed depends on whether the Ray Buffer is empty.
  3. When there are still rays not terminated, increase the path depth in Progress Buffer.
  4. When all rays are terminated, increase the sample count and set path depth to 0 in Progress Buffer.
  5. Accumulate the current path lighting result in Lighting Path Texture to Lighting Sample Texture. Clear the Lighting Path Texture to 0 (it is cleared via compute shader instead of command list clear as the predicate does not work on the clear API, despite the spec say it would...)
  6. Regenerate the rays in Ray Buffer with 1 ray per pixel using the camera transform for path tracing new lighting samples in next few frames. 
  7. Display the current lighting result to the back buffer.
Path traced images

With the core operations described above, at most 2 rays per pixel can be launched to maintain an interactive frame rate on my GTX1060. On more powerful machines (i.e. RTX cards with hardware accelerated ray tracing), step 1 don't need to be terminated with the first closest hit, but bounce a few more times before storing back to the Ray Buffer (A "#Bounce/Frame" option is added to increase the number of bounce per frame for RTX cards).
Number of bounce per frame option to adjust performance
The current approach described above has 2 drawbacks: First, we don't know how many rays are still left in Ray Buffer on CPU, DispatchRay() is called with the maximum number of rays (i.e. viewport width * height), and terminate early within the ray generation shader. This can be fixed in DXR Tier 1.1 using ExecuteIndirect() in the future. The second drawback is the performance is not constant across several frames, because the number of rays need to be traced decrease every frame and then reset back, so the frame rate fluctuate.

ACES tone mapping
After calculating the HDR lighting value, we need to perform a tone map pass to map the lighting value to a displayable range. ACES tone mapping is chosen due to its popularity in recent years. ACES has a few tone mapping curves (they call it RRT + ODT) for different display with different color gamut and viewing condition. Some common display types are sRGB_100nit and Rec709_100nit. The input of RRT+ODT function expect RGB values in ACES2065-1 (AP0) gamut with a white point around(but not exact) D60. So we need to convert our lighting value (L_sRGB) to AP0 gamut by multiplying a few transformation matrices:
operation to convert RGB value from sRGB to AP0
The above steps means first transforming sRGB values to XYZ color space with D65 white point (gamut transformation matrices can be calculated using the formula from here), then apply a Chromatic Adaption Transform(CAT) due to different white point between sRGB and AP0 (the matrix can be calculated using the formula from here). Finally, the XYZ value can be transformed to AP0 gamut. All these matrices can be combined to perform 1 matrix-vector multiplication as an optimization. Then this value can be feed into ACES RRT+ODT to compute the back buffer value for display.

So we just only need to select the appropriate ODT for the target display device. But unfortunately, not all common display gamut is provided, like my recently bought RTX laptop which comes with a 100% AdobeRGB color gamut monitor. ACES does not provide a suitable ODT to display the image in AdobeRGB color space. If using the common sRGB ODT, image will look too saturated. So I added a "Remap display color gamut" option in the demo:

Remapping option to display the path traced result according to display color primaies
The Remap display color gamut option performs the following steps on the output of RRT+ODT:
  1. Apply EOTF function to the ODT output to get linear lighting value.
  2. Transform the resulting RGB value in step 1 to the target display color gamut RGB value (e.g. AdobeRGB gamut on my laptop display), with Chromatic Adaption Transformation applied.
  3. Apply OETF function to the output of step 2 for display.
By doing the above remapping, I can get similar result between my AdobeRGB laptop monitor and sRGB desktop monitor. But 1 drawback is that, although we can query the color primaries of the display, but it is not always accurate. For example, on my laptop, I can switch to regular sRGB view mode, but the IDXGIOutput6::GetDesc1() is still returning the AdobeRGB color primaries. I have also tried on some other monitors, they have color primaries greater than sRGB, but not exactly AdobeRGB or P3 primaries, and they also have different view mode such as AdobeRGB or sRGB. So I just leave the gamut remapping function optional in the demo and the user can choose their remap color primaries.

Also digging deeper in the ACES ODT source code, the 3 ODT used in the demo share many common code and only have different color space transform function / OETF at the end of ODT. In the future, I may refactor the RRT+ODT code and remove the remap display gamut function and directly transforn the ACES ODT output in XYZ space to the display gamut queried by IDXGIOutput6::GetDesc1() (or user selected gamut).
An ODT from ACES, the blue part is the same for all the 3 ODT used in the demo.
The orange part is different depends on display, which can be replaced by display
primaries returned from IDXGIOutput6::GetDesc1(), so the "Remap display color
gamut" in the demo can be removed in the future. 

WCG rendering
Equipped with the knowledge of transforming between color spaces, I decide to try rendering in Wide Color Gamut instead. Games like GT Sport rendered in wide color already(Rec2020). Performing lighting calculation in wide color gamut can result in more accurate lighting than rendering in sRGB color space (despite displaying on sRGB monitor).

Path Traced result rendered in different color space. Left:sRGB, Center:ACEScg, Right:Rec2020

In the demo, it can path trace in sRGB, ACEScg, Rec2020 color space. Inside the closest hit shader, albedo texture is read and transformed into the chosen rendering color space from sRGB. Also the light color is converted to chosen rendering color space and then multiply with the intensity. Finally inside the tone mapping pass, the result of lighting calculation is transformed to AP0 color space and feed into ACES RRT+ODT for display. You may notice some difference between rendering in sRGB and wide color gamut (e.g. ACEScg and Rec2020). If you have a wide color gamut monitor (e.g. AdobeRGB or DCI-P3), you can try to use the Rec2020 ODT with the "Remap display color gamut" option on (described in last section). This can produce fewer color clamping and display more saturated color. But under normal lighting condition, the difference is not that much, we need to set up specific lighting such as using a local sphere light with saturated color, then those wide color can be displayed. I guess this is due to both albedo texture and light color are in sRGB space, content may need to be adjusted in order to take advantage of wide color display.

Wide Color path traced image, saved with different profile. Left:saved with sRGB profile Right: saved with AdobeRGB profile. 
The right image shows more saturated color when viewed on a color managed browser with a wide color display (e.g. iPhone monitor), 
otherwise 2 images may look the same.

Also, please note that this kind of wide color support is different from Windows 10 HDR/WCG settings. On my laptop, Windows report No for both HDR and WCG, but it do have an AdobeRGB monitor and capable of displaying wide color, we just need to correctly transform the images using the monitor color gamut.
My laptop has an AdobeRGB monitor, but Windows 10 Display capabilities report No for WCG.

ACES ODT blue light artifact
So far everything looks good when rendering in wide color space. Color get desaturated when they are over exposed. But it still has some issue when using a strong blue light...

Using a strong sRGB blue light will introduce hue shift...

It is because pure blue (0, 0, 255) in sRGB space is not saturated enough when transformed to wide color gamut (e.g. ACEScg/Rec2020). Looking inside the ACES dev repo, it has a blue light artifact fix LMT to fix this issue. It works by de-saturating the blue color a bit to lessen the hue shift. So in the demo, I provided a "Blue Correction" parameter to adjust the blue de-saturation strength (As a side note, UE4 also use ACES tone mapper and comes with a blue correction parameter in post process setting).
desaturating blue color to fix the hue shift

But I do like the saturated blue color, using the blue light artifact fix LMT will de-saturate the blue color made me sad. Below is the comparison between with/without the blue light LMT:
Left: without blue light fix LMT, Right: with blue light fix LMT

So may be we can work around the problem in the other way. Instead of making the blue color less saturated, we can make the light color more saturated. So I added a light "Color Picker Space" combo box to specify the color space of the picked RGB light value, so that more saturated blue light color can be chosen. By choosing an extremely saturated blue color (0, 0, 255) RGB value in ACEScg color space. We can get away with the purple color:
Using a saturated blue light in ACEScg space, without the blue light fix LMT

Bloom
Lastly, a bloom pass is added before tone mapping. Bloom pixels are extracted based on a threshold that exceeded the maximum luminance with the current exposure values. The maximum luminance is calculated with:
max luminance calculated using EV100
But simply subtracting the lighting value with the threshold will introduce some hue shift to the bloom color. So the RGB lighting value is transformed to HSV space, subtract the threshold from V, and then transform back to RGB space (We keep all the RGB values in the rendering space without transforming the lighting value to sRGB from ACEScg/Rec2020 during HSV conversion, as there are not much difference between the bloom results). Given an image with HDR values:
Input image for the bloom pass
The differences between using threshold in HSV space and RGB space:
Left column: bloom in HSV space. Right column: bloom in RGB space.
Upper row: Lighted scene combined with bloom.
Lower row: Debug images showing only the bloom component.

The bloom calculated using HSV space introduce less saturated color. The situation will be exaggerated when the image is over-exposed:
Left:Bloom input image.
Center:Bloom in HSV space.
Right: Bloom in RGB space.
Conclusion
In this post, the core algorithm of my DXR path tracer is described, together with some color space conversion. There are much more stuff to be done in the future like, support dynamic geometry during ray tracing, adding a denoiser for path traced output, implement hybrid rasterization/ray tracing rendering, spectral rendering to compute a ground truth reference. Also, this is my first time to write code about color space management. Currently, in the demo, the 3D lighting can be displayed correctly using the monitor gamut, but the UI is not managed properly. Also, 4K and HDR need to be supported too.

References
[1] https://seblagarde.files.wordpress.com/2015/07/course_notes_moving_frostbite_to_pbr_v32.pdf
[2] https://microsoft.github.io/DirectX-Specs/d3d/Raytracing.html#addressing-calculations-within-shader-tables
[2] https://github.com/ampas/aces-dev
[3] http://www.brucelindbloom.com/index.html?Eqn_RGB_XYZ_Matrix.html
[4] http://www.brucelindbloom.com/index.html?Eqn_ChromAdapt.html
[5] http://www.polyphony.co.jp/publications/sa2018/









Note on sampling GGX Distribution of Visible Normals

Introduction
After writing an AO demo in last post, I started to write a progressive path tracer, but my progress was very slow due to the social unrest in past few months (here are some related news about what has happened). In the past weeks, the situation has claimed down a bit, and I continue to write my path tracer and started adding specular lighting. While implementing Eric Heitz's "Sampling the GGX Distribution of Visible Normals" technique, I was confused by why taking a random sample on a disk and then project it on the hemisphere equals to the GGX distribution of visible normals (VNDF). And I can't find a prove in the paper, so in this post, I will try to verify their PDF are equal. (Originally, I planned to write this post after finishing my path tracer demo. But I worry that the situation here in Hong Kong will get worse again and won't be able to write, so I decided to write it down first, hope it won't get too boring with only math equations.)
My work in progress path tracer, using GGX material only

Quick summary of sampling the GGX VNDF technique
For those who are not familiar with the GGX VNDF technique, I will briefly talk about it. It is an important sampling technique to sample a random normal vector from GGX distribution. That normal vector is then used for generating a reflection vector, usually for the next reflected ray during path tracing.

Traditional importance sampling scheme use D(N) to sample a normal vector
VNDF technique use the visible normal to importance sample a vector, taking the view direction into account
Given a view vector to a GGX surface with arbitrary roughness, the steps to sample a normal vector are:
  1. Transform the view vector to GGX hemisphere configuration space (i.e. from arbitrary roughness to roughness = 1 config) using GGX stretch-invariant property.
  2. Sample a random point on the projected disk along the transformed view direction.
  3. Re-project the sampled point onto the hemisphere along view direction. And this will be our desired normal vector.
  4. Transform the normal vector back to original GGX roughness space from the hemisphere configuration.
VNDF sampling technique illustration from  Eric Heitz's paper
My confusion mainly comes from step 2 and 3, in the hemisphere configuration: why this method of generating normal vector equals to GGX VNDF exactly...

GGX NDF definition
Before digging deep into the problem, let's start with the definition of GGX NDF. In the paper, it states that: The GGX distribution uses only the upper part of the ellipsoid, and when alpha/roughness equals to 1, the GGX distribution is a uniform hemisphere. According to the definition (with alpha = 1):



So its PDF will be:

So, sampling a normal vector from GGX distribution (with alpha = 1) equals to sampling a vector using a cos-weighted distribution.

GGX VNDF definition
The definition of VNDF depends on the shadowing function. And we are using the Smith shadowing function (with alpha =1):


Therefore the VNDF equals to:



GGX VNDF specific case
With both GGX NDF and VNDF definition, we can start investigating the problem. I decided to start with something simple first, with a specific case: view direction equals to surface normal (i.e. V=Z).



After simplification in this V=Z case, the PDF of Dz(N) is also cos-weighted, which equals to the traditional sampling GGX NDF method.

Now take a look at the sampling scheme by Eric Heitz's method. The method start with uniform sampling from a unit disc, which has a PDF = 1/π, then the point is projected to the hemisphere along the view direction, which add a cos term to the PDF (i.e. Z.N/π ) according to Malley's method (where the cos term comes from the Jacobian transform). Therefore, both the VNDF and Eric Heitz's method are the same at this specific case, which has a cos weighted PDF.

GGX VNDF general case
To verify Eric Heitz's sampling scheme equals to the PDF of GGX VNDF in all possible viewing direction, we need to calculate the PDF of his method and take care of how the PDF changes according to each transformation. From the paper we have this vertical mapping:
Transformation of randomly sampled point from Eric Heitz's paper
We know the PDF of sampling an unit disk is 1/π, (i.e. P(t1, t2)= 1/π), we need to calculate P(t1, t2'):



The next step of the algorithm is to re-project the disc to the hemisphere along the view direction, which produce our target importance sampled normal, so by Malley's method again (but this time along the view direction instead of surface normal), we can add a V.N Jacobian term to the above PDF P(t1,t2'):



The resulting PDF equals to the GGX VNDF definition exactly. So this solved my question of why Eric Heitz's sampling scheme is an exact sampling routine for the GGX VNDF.

Conclusion
This post describe my learning process of the paper "Sampling the GGX Distribution of Visible Normals" and solved my most confusing part of why "taking a random sample on a disk and then project it on the hemisphere equals to the GGX VNDF". If anybody knows a simpler proof of how these 2 equations are equal, or if you discover any mistake, please let me know in the comment. Thank you.

References
[1] http://www.jcgt.org/published/0007/04/01/paper.pdf
[2] https://hal.archives-ouvertes.fr/hal-01509746/document
[3] https://agraphicsguy.wordpress.com/2015/11/01/sampling-microfacet-brdf/
[4] https://schuttejoe.github.io/post/ggximportancesamplingpart1/
[5] https://schuttejoe.github.io/post/ggximportancesamplingpart2/
[6] http://www.pbr-book.org/3ed-2018/Monte_Carlo_Integration/2D_Sampling_with_Multidimensional_Transformations.html