
Importance sampling visible wavelength

Introduction

It has been half a year since my last post. Due to the pandemic and the political environment in Hong Kong, I haven't had much time/mood to work on my hobby path tracer... Until recently, when I tried to get back to this hobby, I figured it might be better to work on some small tasks first. One thing I was not satisfied with in the previous spectral path tracing post is using 3 different cosh curves (with peaks at the XYZ color matching functions) to importance sample the visible wavelength instead of 1. So I decided to revise it and find another PDF to draw random wavelength samples. A demo with the updated importance-sampled wavelength can be downloaded here, and the python code used for generating the PDF can be viewed here (inspired by Bart Wronski's tweet to use Colab).

First Failed Try

The basic idea is to create a function with peak values at the same locations as the color matching functions, so I decided to use the sum of the XYZ color matching curves as the PDF.

Black line is the Sum of XYZ curves

To simplify the calculation, an analytical approximation of the XYZ curves can be used. A common approximation can be found here, but it seems hard to integrate (due to the squared λ term) to create the CDF. So a rational function is used instead:

The X curve is approximated with 1 peak, dropping the small peak at around 450nm; since we want to compute the "sum of XYZ" curve, that missing peak can be compensated by scaling the Z curve. The approximated curves look like:

Light colored curves are the approximated function
Grey color curve is the approximated PDF
The approximated PDF is roughly similar to the sum of the XYZ color matching curves. But I have made a mistake: although the rational function can be integrated to create the CDF, I don't know how to compute the inverse of the CDF (which is needed by the inversion method to draw random samples from uniform random numbers). So I need to find another way...

Second Try

From the previous section, although I don't know how to find the inverse of the approximated CDF, out of curiosity I still wanted to see what the approximated CDF looks like, so I plotted the graph:

Black line is the original CDF
Grey line is the approximated CDF

It looks like several smoothstep functions added together: 1 base smoothstep curve over the range [380, 780] with 2 smaller smoothstep curves (in ranges around [400, 500] and [500, 650]) added on top of the base curve. Maybe I can approximate this CDF with some kind of polynomial function. After some trial and error, an approximated CDF was found (details of the CDF and PDF can be found in the python code):

The above function divides the visible wavelength spectrum into 4 intervals to form a piecewise function. Since smoothstep is a cubic function, a sum of smoothstep functions is still a cubic function, which can be inverted and differentiated. The approximated "smoothstep CDF/PDF" curves look like:

Blue line is the approximated "smoothstep CDF"
Black line is the original CDF
Blue line is the approximated "smoothstep PDF"
Black line is the original PDF
Although the "smoothstep CDF" looks smooth, its PDF is not (not C1 continuous at around 500nm). But it has 2 peaks value at around 450nm  and 600nm, may be let's try to render some images to see how it behaves.

Result

The same Sponza scene is rendered with 3 different wavelength sampling functions: uniform, cosh and smoothstep (with stratification of the random numbers disabled; the color noise is more noticeable when zoomed in):

uniform weighted sampling
cosh weighted sampling
smoothstep weighted sampling

Both the cosh and smoothstep wavelength sampling methods show less color noise than the uniform sampling method, with the smoothstep PDF slightly better than the cosh function. It seems the C1 discontinuity of the PDF does not affect the rendering very much. A demo can be downloaded here to see how it looks in real-time.

Conclusion

This post described an approximate function to importance sample the visible wavelength based on the sum of the color matching functions, which reduces the color noise slightly. The approximated CDF is composed of piecewise cubic functions. The python code used for generating the polynomial coefficients can be found here (with some unused testing code too, e.g. I tried using 1 linear base function with 2 smaller smoothstep functions added on top, but the result was not much better...). Although the approximated PDF is not C1 continuous, it does not affect the rendering very much. If someone knows more about how the C1 continuity of the PDF affects rendering, please leave a comment below. Thank you.

References

[1] https://en.wikipedia.org/wiki/CIE_1931_color_space#Color_matching_functions

[2] Simple Analytic Approximations to the CIE XYZ Color Matching Functions

[3] https://stackoverflow.com/questions/13328676/c-solving-cubic-equations

Note on sampling GGX Distribution of Visible Normals

Introduction
After writing an AO demo in the last post, I started to write a progressive path tracer, but my progress was very slow due to the social unrest in the past few months (here are some related news about what has happened). In the past weeks, the situation has calmed down a bit, so I continued writing my path tracer and started adding specular lighting. While implementing Eric Heitz's "Sampling the GGX Distribution of Visible Normals" technique, I was confused about why taking a random sample on a disk and then projecting it onto the hemisphere equals the GGX distribution of visible normals (VNDF). I couldn't find a proof in the paper, so in this post I will try to verify that their PDFs are equal. (Originally, I planned to write this post after finishing my path tracer demo. But I worry that the situation here in Hong Kong will get worse again and I won't be able to write, so I decided to write it down first; hope it won't get too boring with only math equations.)
My work in progress path tracer, using GGX material only

Quick summary of sampling the GGX VNDF technique
For those who are not familiar with the GGX VNDF technique, I will briefly describe it. It is an importance sampling technique to sample a random normal vector from the GGX distribution. That normal vector is then used for generating a reflection vector, usually for the next reflected ray during path tracing.

The traditional importance sampling scheme uses D(N) to sample a normal vector
The VNDF technique importance samples a vector from the visible normal distribution, taking the view direction into account
Given a view vector to a GGX surface with arbitrary roughness, the steps to sample a normal vector are (a code sketch of these steps follows the illustration below):
  1. Transform the view vector to GGX hemisphere configuration space (i.e. from arbitrary roughness to roughness = 1 config) using GGX stretch-invariant property.
  2. Sample a random point on the projected disk along the transformed view direction.
  3. Re-project the sampled point onto the hemisphere along view direction. And this will be our desired normal vector.
  4. Transform the normal vector back to original GGX roughness space from the hemisphere configuration.
VNDF sampling technique illustration from  Eric Heitz's paper
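For reference, here is a Python transcription of these four steps, following the listing in Heitz's paper [1]; the view vector v is assumed to already be in the local shading frame (z is the surface normal), u1 and u2 are uniform random numbers in [0, 1), and the variable names are mine:

```python
import numpy as np

def sample_ggx_vndf(v, alpha_x, alpha_y, u1, u2):
    # Step 1: stretch the view vector to the hemisphere configuration (roughness = 1).
    vh = np.array([alpha_x * v[0], alpha_y * v[1], v[2]])
    vh /= np.linalg.norm(vh)
    # Build an orthonormal basis (t1, t2, vh) around the stretched view vector.
    lensq = vh[0] * vh[0] + vh[1] * vh[1]
    t1 = np.array([-vh[1], vh[0], 0.0]) / np.sqrt(lensq) if lensq > 0.0 else np.array([1.0, 0.0, 0.0])
    t2 = np.cross(vh, t1)
    # Step 2: sample a point on the projected disk (with the vertical mapping).
    r = np.sqrt(u1)
    phi = 2.0 * np.pi * u2
    p1 = r * np.cos(phi)
    p2 = r * np.sin(phi)
    s = 0.5 * (1.0 + vh[2])
    p2 = (1.0 - s) * np.sqrt(max(0.0, 1.0 - p1 * p1)) + s * p2
    # Step 3: re-project onto the hemisphere -> sampled normal in the hemisphere config.
    nh = p1 * t1 + p2 * t2 + np.sqrt(max(0.0, 1.0 - p1 * p1 - p2 * p2)) * vh
    # Step 4: unstretch back to the original roughness and normalize.
    n = np.array([alpha_x * nh[0], alpha_y * nh[1], max(0.0, nh[2])])
    return n / np.linalg.norm(n)
```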
My confusion mainly comes from steps 2 and 3, in the hemisphere configuration: why does this method of generating a normal vector equal the GGX VNDF exactly?

GGX NDF definition
Before digging deep into the problem, let's start with the definition of the GGX NDF. The paper states that the GGX distribution uses only the upper part of the ellipsoid, and when alpha/roughness equals 1, the GGX distribution is a uniform hemisphere. According to the definition (with alpha = 1):

D(N) = alpha^2 / (π * cos^4θN * (alpha^2 + tan^2θN)^2), which with alpha = 1 simplifies to D(N) = 1/π

So its PDF will be:

PDF(N) = D(N) * (N·Z) = cosθN / π
So, sampling a normal vector from the GGX distribution (with alpha = 1) is equivalent to sampling a vector from a cos-weighted distribution.

GGX VNDF definition
The definition of the VNDF depends on the shadowing function, and we are using the Smith shadowing function (with alpha = 1):

G1(V) = 2 / (1 + sqrt(1 + alpha^2 * tan^2θV)) = 2cosθV / (1 + cosθV)
Therefore the VNDF equals:

Dv(N) = G1(V) * max(0, V·N) * D(N) / cosθV = 2(V·N) / (π(1 + cosθV)),   where cosθV = V·Z

GGX VNDF specific case
With both the GGX NDF and VNDF definitions, we can start investigating the problem. I decided to start with something simple first, a specific case: the view direction equals the surface normal (i.e. V = Z).

Dz(N) = 2(Z·N) / (π(1 + Z·Z)) = (Z·N) / π = cosθN / π

After simplification in this V=Z case, the PDF Dz(N) is also cos-weighted, which equals the PDF of the traditional GGX NDF sampling method.

Now take a look at the sampling scheme of Eric Heitz's method. The method starts with uniform sampling of a unit disk, which has PDF = 1/π; then the point is projected onto the hemisphere along the view direction, which adds a cos term to the PDF (i.e. Z·N/π) according to Malley's method (where the cos term comes from the Jacobian of the transform). Therefore, both the VNDF and Eric Heitz's method are the same in this specific case, having a cos-weighted PDF.
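This V=Z case is exactly the familiar Malley construction; a minimal sketch (with the surface normal assumed to be +z in the local frame):

```python
import numpy as np

def cosine_sample_hemisphere(u1, u2):
    # Malley's method: uniformly sample a unit disk, then project the point
    # up onto the hemisphere along the normal (+z here).
    r = np.sqrt(u1)
    phi = 2.0 * np.pi * u2
    x, y = r * np.cos(phi), r * np.sin(phi)
    z = np.sqrt(max(0.0, 1.0 - x * x - y * y))
    return np.array([x, y, z])   # pdf(N) = z / pi = cos(theta) / pi
```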

GGX VNDF general case
To verify that Eric Heitz's sampling scheme equals the PDF of the GGX VNDF for all possible viewing directions, we need to calculate the PDF of his method and take care of how the PDF changes under each transformation. From the paper we have this vertical mapping:
Transformation of randomly sampled point from Eric Heitz's paper
We know the PDF of uniformly sampling a unit disk is 1/π (i.e. P(t1, t2) = 1/π); we need to calculate P(t1, t2'):

With the vertical mapping t2' = (1 - s) * sqrt(1 - t1^2) + s * t2, where s = (1 + V·Z) / 2, the dt2'/dt1 term appears only in the off-diagonal entry of the Jacobian matrix of (t1, t2) -> (t1, t2'), so the Jacobian determinant is simply dt2'/dt2 = s, and

P(t1, t2') = P(t1, t2) / s = (1/π) * 2 / (1 + V·Z) = 2 / (π(1 + V·Z))
(Edited on 28/11/2022: Thanks to Brian Collins for pointing out that dt2'/dt1 was calculated incorrectly before)
The next step of the algorithm is to re-project the disk onto the hemisphere along the view direction, which produces our target importance-sampled normal. So by Malley's method again (but this time along the view direction instead of the surface normal), we can add a V·N Jacobian term to the above PDF P(t1, t2'):

P(N) = P(t1, t2') * (V·N) = 2(V·N) / (π(1 + V·Z)) = Dv(N)

The resulting PDF equals the GGX VNDF definition exactly. This answers my question of why Eric Heitz's sampling scheme is an exact sampling routine for the GGX VNDF.

Conclusion
This post described my learning process for the paper "Sampling the GGX Distribution of Visible Normals" and resolved my most confusing part: why "taking a random sample on a disk and then projecting it onto the hemisphere equals the GGX VNDF". If anybody knows a simpler proof of how these 2 equations are equal, or if you discover any mistake, please let me know in the comments. Thank you.

References
[1] http://www.jcgt.org/published/0007/04/01/paper.pdf
[2] https://hal.archives-ouvertes.fr/hal-01509746/document
[3] https://agraphicsguy.wordpress.com/2015/11/01/sampling-microfacet-brdf/
[4] https://schuttejoe.github.io/post/ggximportancesamplingpart1/
[5] https://schuttejoe.github.io/post/ggximportancesamplingpart2/
[6] http://www.pbr-book.org/3ed-2018/Monte_Carlo_Integration/2D_Sampling_with_Multidimensional_Transformations.html





Transforming points in unit square with p-norm

Introduction
Continuing from the previous post about some little tricks I used when implementing the animation editor, this time I will talk about another maths problem encountered when implementing the directional blend node in the animation tree editor. This node blends 5 different animations (i.e. forward, backward, left, right and center) according to the current direction:

blending 5 different animation according to the current direction

And the UI of this node has a square input region like this for artists to preview how those animations are blended together:


So, to compute a blended output, we need to transform the points in the square region to a rhombus, and then we can use barycentric coordinates to determine the blend weights for blending:

Transform the current direction for computing the blend weight

p-norm
After searching Google for a while, something called Lp space showed up, with the following figures which look similar to what I need:

Showing "unit circle" with different p-norm

All of the above figures show a "unit circle" (i.e. the vectors are of unit length), but measured with different p-norms. The definition of the p-norm is:

||v||p = (|x1|^p + |x2|^p + ... + |xn|^p)^(1/p)
This equation describes the length of a vector; when p = 2, it is the Euclidean norm that we normally use. Our goal is to transform points inside the unit square to the rhombus; in other words, we want the transformed point to have the same length as the original point, but measured with a different p-norm: p = 1 after the transformation and p = infinity before the transformation. That is, we have (assuming we only deal with the 2D case and only consider points in the first quadrant):

||v'||1 = ||v||∞,   i.e. |x'| + |y'| = max(|x|, |y|)
, where v=(x, y) is the original point and v'=(x', y') is the transformed point.

Mapping the points
As our target is to find a function to transform the points, having only the above equation is not enough, as there are infinitely many ways to transform the point that satisfy it. So we need another equation to find a unique solution. The idea is to have the transformed point, the original point and the origin lie on the same straight line (i.e. having the same slope), which gives another equation: y'/x' = y/x.
Solving the above equations gives the transform functions:
Taking into account points in the other quadrants and the special case at the origin gives the final mapping functions:


However, taking the limit of p to infinity and computing it with Wolfram Alpha does not give any useful result... So I just take p=10 (taking too large a value will result in floating point precision problems...). Here is a plot of how the points are transformed/squeezed into the rhombus:

Graph showing how points are transformed from square to rhombus 

Although the function cannot transform the points into a perfect rhombus, it does not produce much artifact when used for animation blending.
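Here is a small Python sketch of the mapping derived from the two conditions above (same p-norm length and same slope); the function name and structure are mine, and p = 10 is used as in the plot:

```python
def square_to_rhombus(x, y, p=10.0):
    # Map a point in the unit square (inf-norm ball) towards the rhombus
    # (1-norm ball): keep the slope, and make the 1-norm of the output equal
    # to the p-norm of the input (p = 10 approximates the infinity norm).
    denom = abs(x) + abs(y)
    if denom == 0.0:
        return 0.0, 0.0  # the origin maps to itself
    length_p = (abs(x) ** p + abs(y) ** p) ** (1.0 / p)
    scale = length_p / denom
    return x * scale, y * scale

# Example: a corner of the square maps close to the middle of a rhombus edge.
print(square_to_rhombus(1.0, 1.0))  # ~ (0.536, 0.536); with true p = infinity it would be (0.5, 0.5)
```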

Conclusion
In this post, I have described how to map points inside a unit square to a "rhombus", which is used for animation blending. The problem is not completely solved, as I am not taking the limit of p to infinity, and there are some power functions in the transformation which are not fast... But this is enough for calculating the blend weights of the animations.

Reference
[1] Lp space: http://en.wikipedia.org/wiki/Lp_space

Checking whether a point lies on a Bezier curve

Introduction
It has been a long time since my last post. I was busy in the past months, and finally have time to finish dual-quaternion skinning and add some basic animation to my code base.

The animation editor
In the animation tree editor, tree nodes can be connected/disconnected, and each connection is displayed as a cubic Bezier curve. I want the user to be able to right click the curve to disconnect the tree nodes, so checking whether a point lies on a Bezier curve is needed.

Sub-dividing the curve
A cubic Bezier is defined as:

B(t) = (1-t)^3 * P0 + 3(1-t)^2 * t * P1 + 3(1-t) * t^2 * P2 + t^3 * P3,   where t is within [0, 1]

Finding an analytical solution for whether a point lies on this curve is quite complicated. So I just sub-divide the curve into several line segments and stop the sub-division when the points are close to a straight line. After that, we can check whether the point lies on the line segments instead. First we start at the 2 end points of the Bezier curve and check whether the mid-point (the red cross) needs to be sub-divided.
check whether the red point need to be sub-divided
We can't simply check whether these 3 points are close to a straight line, because there may be an inflection point on the curve, so we look ahead 1 step further to see whether the 2 line segments (the blue and green lines below) form a straight line; if not, the red point needs to be sub-divided.
look ahead one more point to avoid inflection point problem
Repeat these steps for each newly generated sub-divided curve segment until all the points are close to a straight line (within a threshold). The sub-divided curve may look like this:

Before sub-division
After sub-division
With this approach, when we zoom into the graph, more points will get sub-divided:

Zooming in will sub-divide more points
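Putting the idea together, here is a minimal Python sketch; note that it uses a simple chord-distance flatness test sampled at several interior parameters (instead of the exact look-ahead test described above), and the tolerance values are arbitrary placeholders:

```python
import numpy as np

def bezier_point(cp, t):
    # Evaluate a cubic Bezier curve with control points cp = (p0, p1, p2, p3).
    p0, p1, p2, p3 = cp
    s = 1.0 - t
    return s*s*s*p0 + 3.0*s*s*t*p1 + 3.0*s*t*t*p2 + t*t*t*p3

def chord_distance(a, b, q):
    # Perpendicular distance from point q to the line through a and b (2D).
    d = b - a
    return abs(d[0]*(q - a)[1] - d[1]*(q - a)[0]) / (np.linalg.norm(d) + 1e-12)

def flatten_bezier(cp, t0=0.0, t1=1.0, tol=1e-2, depth=0):
    # Recursively subdivide [t0, t1] until the segment is close to a straight line.
    # Testing interior points at 1/4, 1/2 and 3/4 (not only the midpoint) avoids
    # wrongly accepting a span that straddles an inflection point.
    a, b = bezier_point(cp, t0), bezier_point(cp, t1)
    ts = [t0 + (t1 - t0)*f for f in (0.25, 0.5, 0.75)]
    if depth > 16 or all(chord_distance(a, b, bezier_point(cp, t)) < tol for t in ts):
        return [a, b]
    tm = 0.5*(t0 + t1)
    left = flatten_bezier(cp, t0, tm, tol, depth + 1)
    return left[:-1] + flatten_bezier(cp, tm, t1, tol, depth + 1)

def point_near_curve(cp, q, threshold, tol=1e-2):
    # Hit test: is q within `threshold` of any of the flattened line segments?
    pts = flatten_bezier(cp, tol=tol)
    for a, b in zip(pts[:-1], pts[1:]):
        ab, aq = b - a, q - a
        t = np.clip(np.dot(aq, ab) / (np.dot(ab, ab) + 1e-12), 0.0, 1.0)
        if np.linalg.norm(aq - t*ab) < threshold:
            return True
    return False

cp = tuple(np.array(p) for p in [(0.0, 0.0), (1.0, 2.0), (3.0, -2.0), (4.0, 0.0)])
print(point_near_curve(cp, np.array([2.0, 0.0]), threshold=0.05))  # True: lies on the curve
print(point_near_curve(cp, np.array([2.0, 1.0]), threshold=0.05))  # False: far from the curve
```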

Conclusion
This short post describes a way to check whether a point lies on a cubic Bezier curve by using line sub-division. In the next post, I will talk about another fun thing encountered when writing the animation editor.

Software Rasterizer Part 2

Introduction
Continuing from the previous post: after filling the triangle with the scan line or half-space algorithm, we also need to interpolate the vertex attributes across the triangle so that we have texture coordinates and depth for every pixel. However, we cannot directly interpolate those attributes in screen space, because the projection transform followed by the perspective division is not an affine transformation (i.e. after the transformation, the mid-point of a line segment is no longer the mid-point). This results in some distortion, and the artifact is even more noticeable when the triangle is large:

interpolate in screen space

perspective correct interpolation
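A quick numeric illustration of why the perspective divide is not affine (the numbers and the z = 1 projection plane are arbitrary choices):

```python
import numpy as np

# Two view-space points at different depths, and the midpoint between them.
a = np.array([0.0, 0.0, 1.0])
b = np.array([4.0, 0.0, 5.0])
mid_view = 0.5 * (a + b)                 # (2, 0, 3)

# Simple perspective projection onto the z = 1 plane: (x, y, z) -> (x/z, y/z).
project = lambda p: p[:2] / p[2]

print(project(mid_view))                 # [0.6667, 0]  projection of the midpoint
print(0.5 * (project(a) + project(b)))   # [0.4, 0]     midpoint of the projections
# The two differ, so attributes cannot be interpolated directly in screen space.
```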
Condition for linear interpolation
When interpolating the attributes in a linear way, we are saying that, given a set of vertices vi (where i is any integer >= 0) with a set of attributes ai (such as texture coordinates),
we have a function mapping a vertex to its corresponding attributes, i.e.

f(vi)= ai

To interpolate a vertex attribute inside a triangle in a linear way, the function f needs to have the following property:

f(t0*v0 + t1*v1 + t2*v2) = t0*f(v0) + t1*f(v1) + t2*f(v2)
, for any t0, t1, t2 where t0 + t1 + t2 = 1

which means that we can calculate the interpolated attributes using the same weights ti used for interpolating the vertex position. Functions having the above property are affine functions of the following form:

f(x)= Ax + b
, where A is a matrix, and x and b are vectors

Depth interpolation
When a vertex is projected from view space to normalized device coordinates (NDC), we have the following relation (by the ratio of similar triangles) between view space and NDC space:


Substituting equations 1 and 2 into the equation of the plane that the triangle lies on:


So, 1/z_view is an affine function of x_ndc and y_ndc, which can be interpolated linearly across screen space (the transform from NDC space to screen space is a linear transform).
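Spelling out the substitution with my own symbols, where d is the distance to the projection plane and a*x_view + b*y_view + c*z_view = e is the plane containing the triangle (assuming the plane does not pass through the eye, i.e. e is not zero):

```latex
\begin{aligned}
x_{ndc} &= \frac{d\,x_{view}}{z_{view}}, \qquad y_{ndc} = \frac{d\,y_{view}}{z_{view}}
  && \text{(equations 1 and 2)} \\
a\,x_{view} + b\,y_{view} + c\,z_{view} &= e
  && \text{(plane of the triangle)} \\
\frac{a}{d}\,x_{ndc}\,z_{view} + \frac{b}{d}\,y_{ndc}\,z_{view} + c\,z_{view} &= e
  && \text{(substitute 1 and 2)} \\
\frac{1}{z_{view}} &= \frac{a}{ed}\,x_{ndc} + \frac{b}{ed}\,y_{ndc} + \frac{c}{e}
  && \text{(divide both sides by } e\,z_{view}\text{)}
\end{aligned}
```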


Attributes interpolation
In the last section, we saw how to interpolate the depth of a pixel linearly in screen space; the next problem is to interpolate the vertex attributes (e.g. texture coordinates). In view space, we know those attributes can be interpolated linearly, so they can be calculated by an affine function with the vertex position as parameter, e.g.



Similar to the depth interpolation, substituting equations 1 and 2 into the above equation:


Therefore, u/z_view is another affine function of x_ndc and y_ndc which can be interpolated linearly across screen space. Hence we can interpolate u linearly by first interpolating 1/z_view and u/z_view across screen space, and then dividing them per pixel.
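In code, the per-pixel work then looks roughly like this (a sketch, assuming the screen-space barycentric weights of the pixel are already computed):

```python
def perspective_correct_uv(weights, z_view, uvs):
    # weights: screen-space barycentric weights (w0, w1, w2) of the pixel
    # z_view:  view-space depths (z0, z1, z2) of the triangle vertices
    # uvs:     per-vertex texture coordinates [(u0, v0), (u1, v1), (u2, v2)]
    # Interpolate 1/z and uv/z linearly in screen space, then divide per pixel.
    inv_z = sum(w / z for w, z in zip(weights, z_view))
    u_over_z = sum(w * u / z for w, z, (u, v) in zip(weights, z_view, uvs))
    v_over_z = sum(w * v / z for w, z, (u, v) in zip(weights, z_view, uvs))
    return u_over_z / inv_z, v_over_z / inv_z

# Example: the pixel halfway (in screen space) between a vertex at depth 1 and
# one at depth 3 gets uv = (0.25, 0.25), i.e. the nearer vertex dominates.
print(perspective_correct_uv((0.5, 0.5, 0.0), (1.0, 3.0, 1.0),
                             [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]))
```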

The last problem...
Now we know that we can interpolate the view space depth and vertex attributes linearly across screen space. But during the rasterization stage, we only have vertices in homogenous coordinates (the vertices are already transformed by the projection matrix), so how can we get z_view to do the perspective correct interpolation?
Consider the projection matrix (I use D3D one, but the same for openGL):

After transforming the vertex position, the w-coordinate will be the view space depth!

i.e. w_homogenous = z_view

Now look at the matrix again and consider the transformed z-coordinate; it will be of the form:

z_homogenous = A * z_view + B,   where A and B are constants taken from the projection matrix
After transforming to NDC (dividing by w_homogenous = z_view),

z_NDC = z_homogenous / w_homogenous = A + B / z_view
Since 1/z_view is affine in x_ndc and y_ndc, so is z_NDC, and the depth value can be interpolated directly using z_NDC in screen space for the depth test.

Demo
A javascript demo to rasterize the triangles is provided(although not optimized...). And the source code can be downloaded here.

Conclusion
In this post, the steps to linearly interpolate vertex attributes in screen space are described. And for rasterizing a depth buffer only (e.g. for occlusion), the depth value can be directly interpolated using the z coordinate in NDC space, which is even simpler.

References
[1] http://www.lysator.liu.se/~mikaelk/doc/perspectivetexture/
[2] http://www.gamedev.net/topic/581732-perspective-correct-depth-interpolation/
[3] http://chrishecker.com/Miscellaneous_Technical_Articles
[4] http://en.wikipedia.org/wiki/Affine_transformation


Software Rasterizer Part 1

Introduction
A software rasterizer can be used for occlusion culling; some games, such as Killzone 3, use this to cull objects. So I decided to write one myself. The steps are: first transform vertices to homogenous coordinates, clip the triangles to the viewport, and then fill the triangles with interpolated parameters. Note that the clipping process should be done in homogenous coordinates before the perspective division, otherwise lots of extra work is needed to clip the triangles properly, and this post will explain why clipping should be done before the perspective division.

Points in Homogenous coordinates
In our usual Cartesian coordinate system, we can represent any point in 3D space in the form (X, Y, Z). In homogenous coordinates, a redundant component w is added, resulting in the form (x, y, z, w). Multiplying that 4-component vector by any constant (except zero) still represents the same point in homogenous coordinates. To convert a homogenous point back to our usual Cartesian coordinates, we scale the point so that the w component equals one:

(x, y, z, w) -> (x/w, y/w, z/w, 1) -> (X, Y, Z)

In the following figure, we consider the x-w plane: a point (x, y, z, w) is transformed back to the usual Cartesian coordinates (X, Y, Z) by projecting onto the w=1 plane:
figure 1. projecting point to w=1 plane

The interesting case comes when the w component equals zero. Imagine w getting smaller and smaller, approaching zero: the coordinates of the point (x/w, y/w, z/w, 1) get larger and larger. When w equals zero, we can represent a point at infinity.

Line Segments in Homogenous coordinates
In homogenous coordinates, we can still represent a line segment between two points P0 = (x0, y0, z0, w0) and P1 = (x1, y1, z1, w1) in parametric form:

L= P0 + t * (P1-P0),   where t is within [0, 1]

Then we can get a line having the shape:
figure 2. internal line segment
The projected line on the w=1 plane is called an internal line segment in the above case.
But what if P0 and P1 have coordinates where w0 < 0 and w1 > 0?
figure 3. external line segment
In this case, it will result in the above figure, forming an external line segment. This is because the homogenous line segment has the form L = P0 + t * (P1 - P0); when moving the parameter from t=0 to t=1, since w0 < 0 and w1 > 0, there exists a point on the homogenous line where w=0. This point is at infinity when projected onto the w=1 plane, so the projected line segment joining P0 and P1 passes through the point at infinity, forming an external line segment.

The figure below shows how points are transformed before and after perspective projection and divided by w:
figure 4. region mapping
The blue line shows the viewing frustum; nothing unusual happens for the region in front of the eye. The unusual part is the points behind the eye: after the perspective transformation and projection to the w=1 plane, those points end up in front of the eye too. So a line segment with one point in front of the eye and the other behind the eye becomes an external line segment after the perspective division.

Triangles in Homogenous coordinates
In the last section, we saw that there are internal and external line segments after the perspective division; likewise, we also have internal and external triangles. The internal triangles are the ones that we usually see. An external triangle must be formed by 1 internal line segment and 2 external line segments:

figure 5. external triangle
In the above figure, the shaded area represents the external triangle formed by the points P0, P1 and P2. This kind of external triangle may appear after the perspective projection transform. And this happens in our real world too:

an external triangle in real world

the full triangle of the left photo
The left photo shows an external triangle with one of the triangle vertices far behind the camera, while the right photo shows the full view of the triangle; the cross marks the position of the camera where the left photo was taken.

Triangles clipping
To avoid the case of external triangles, lines/triangles should be clipped in homogenous coordinates before dividing by the w-component. The homogenous point (x, y, z, w) is tested with the following inequalities:

(-w <= x <= w) &&   ------ inequality 1
(-w <= y <= w) &&   ------ inequality 2
(-w <= z <= w) &&   ------ inequality 3
(w > 0)             ------ inequality 4

(The z clipping inequality is 0 <= z <= w in the case of D3D; it depends on how the normalized device coordinates are defined.) Clipping by inequalities 1, 2, 3 will effectively clip all points with w < 0, because if w < 0, say w = -3:

3 <= x <= -3     =>     3 <= -3

which is impossible. But the point (0, 0, 0, 0) still satisfies the first 3 inequalities and can still form external cases, so inequality 4 is added. Consider a homogenous line with one end at (0, 0, 0, 0); it equals:

L = (0, 0, 0, 0) + t * [ (x, y, z, w) - (0, 0, 0, 0) ] = t * (x, y, z, w)

which represents only a single point in homogenous coordinates. So a triangle (after being clipped by inequalities 1, 2, 3) that has one or two vertices with w=0 degenerates into either a line or a point, which can be discarded. Hence, after clipping, no external triangles will be produced when dividing by the w-component. To clip a triangle against a plane, the result may be either 1 or 2 triangles, depending on whether there are 1 or 2 vertices outside the clipping plane:
figure 6. clipping internal triangles
Then the clipped triangles can be passed to the next stage to be rasterized either by a scan line algorithm or by a half-space algorithm.
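A minimal Sutherland-Hodgman style sketch of this clipping in homogenous coordinates (not the demo's actual code; the plane list and the epsilon used for the w > 0 test are my own choices):

```python
def clip_polygon(vertices, inside_dist):
    # Sutherland-Hodgman clipping of a convex polygon (list of (x, y, z, w)
    # tuples) against one clip plane, given as a signed distance function
    # that is >= 0 for points inside, e.g. lambda v: v[3] - v[0] for x <= w.
    out = []
    for i, cur in enumerate(vertices):
        prev = vertices[i - 1]
        d_cur, d_prev = inside_dist(cur), inside_dist(prev)
        if d_prev >= 0.0 and d_cur >= 0.0:       # both inside: keep current vertex
            out.append(cur)
        elif d_prev >= 0.0 or d_cur >= 0.0:      # edge crosses the clip plane
            t = d_prev / (d_prev - d_cur)        # intersection parameter along the edge
            out.append(tuple(p + t * (c - p) for p, c in zip(prev, cur)))
            if d_cur >= 0.0:                     # entering: also keep current vertex
                out.append(cur)
    return out

# Clip a triangle against the view-volume planes before the divide by w.
planes = [
    lambda v: v[3] - v[0],   # x <=  w
    lambda v: v[3] + v[0],   # x >= -w
    lambda v: v[3] - v[1],   # y <=  w
    lambda v: v[3] + v[1],   # y >= -w
    lambda v: v[3] - v[2],   # z <=  w   (use v[2] alone for the D3D 0 <= z case)
    lambda v: v[3] + v[2],   # z >= -w
    lambda v: v[3] - 1e-5,   # w > 0 (small epsilon to discard degenerate points)
]
poly = [(0.0, 0.0, 0.5, 1.0), (8.0, 1.0, 0.5, 2.0), (-1.0, -4.0, 0.2, 0.5)]
for plane in planes:
    poly = clip_polygon(poly, plane)
print(poly)  # clipped polygon, ready for the divide by w and triangulation
```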

Below is the clipping result of an external triangle with 1 vertex behind the camera.
clipping external triangle in software rasterizer
Below is another rasterized result:

rasterized duck model

reference of the duck model
Conclusion
In this post, the maths behind the clipping of triangles is explained. Clipping should be done before projecting the homogenous points onto the w=1 plane to avoid taking special care to clip the external triangles. In the next post, I will talk about perspective-correct interpolation, and the source code will be given there (written in javascript, drawing to an html canvas).

And lastly special thanks to Fabian Giesen for giving feedback during the draft of this post.

References
[1] http://research.microsoft.com/pubs/73937/p245-blinn.pdf
[2] http://medialab.di.unipi.it/web/IUM/Waterloo/node51.html
[3] http://kriscg.blogspot.com/2010/09/software-occlusion-culling.html
[4] http://www.slideshare.net/guerrillagames/practical-occlusion-culling-in-killzone-3
[5] http://www.slideshare.net/repii/parallel-graphics-in-frostbite-current-future-siggraph-2009-1860503
[6] http://fgiesen.wordpress.com/2011/07/05/a-trip-through-the-graphics-pipeline-2011-part-5/
[7] http://devmaster.net/forums/topic/1145-advanced-rasterization/