Photon Mapping Part 1

Introduction

In the current generation of computer graphics, global illumination (GI) is an important class of techniques for calculating indirect lighting within a scene. Photon mapping is one of the GI techniques, using particle tracing to compute images in offline rendering. Photon mapping is easy to implement, so I chose to learn it, and my goal is to bake light maps that store indirect diffuse lighting information using the photon map. Photon mapping consists of 2 passes: the photon map pass and the render pass, which are described below.


Photon Map Pass

In this pass, photons are cast into the scene from the position of the light source. Each photon stores a packet of energy. When a photon hits a surface in the scene, it will either be reflected (either diffusely or specularly), transmitted or absorbed, which is determined by Russian roulette.

Photons are traced in the scene to simulate the light transport

This hit event represents the incoming energy at that surface and will be stored in a k-d tree (known as the photon map) for look-up in the render pass. Each hit event stores the photon energy, the incoming direction and the hit position.
However, it is more convenient to store radiance than energy in each photon, because with a punctual light source (e.g. a point light) it is hard to compute the emitted energy given the light source radiance. So I use the method described in Physically Based Rendering, where a radiance weight is stored in each photon:



When a photon hits a surface, the probability of it being reflected in a new random direction, used in Russian roulette, is:



This probability equation is chosen so that a brighter photon has a higher chance of being reflected. If the photon is reflected, its radiance will be updated to:


And the photon will continue to be traced in the newly reflected direction.
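
To make this concrete, below is a minimal sketch of the photon tracing loop in JavaScript (the language of the demos in my other posts). The scene interface (scene.intersect, sampleHemisphere) is hypothetical, and the luminance-based survival probability is just one possible choice for the Russian roulette step:

```javascript
// Small RGB color helpers.
const mulColor   = (a, b) => ({ r: a.r * b.r, g: a.g * b.g, b: a.b * b.b });
const scaleColor = (a, s) => ({ r: a.r * s, g: a.g * s, b: a.b * s });
const luminance  = (c) => 0.3 * c.r + 0.59 * c.g + 0.11 * c.b;

// Trace one photon through the scene. scene.intersect(origin, dir) is assumed
// to return {position, normal, diffuse} or null, and sampleHemisphere(normal)
// to return a random direction above the surface (both hypothetical helpers).
function tracePhoton(scene, photonMap, origin, dir, radiance, maxBounces) {
  for (let bounce = 0; bounce < maxBounces; ++bounce) {
    const hit = scene.intersect(origin, dir);
    if (hit === null) return;

    // Store the hit event: radiance weight, incoming direction, hit position.
    photonMap.store({ position: hit.position, incoming: dir, radiance });

    // Russian roulette: the brighter the reflected photon would be relative
    // to the incoming one, the more likely it is reflected.
    const reflected = mulColor(radiance, hit.diffuse);
    const p = Math.min(1, luminance(reflected) / luminance(radiance));
    if (Math.random() >= p) return; // absorbed

    // Divide by the survival probability to keep the estimate unbiased,
    // then continue tracing in a new random direction (diffuse bounce only).
    radiance = scaleColor(reflected, 1 / p);
    origin = hit.position;
    dir = sampleHemisphere(hit.normal);
  }
}
```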

Render Pass

In the render pass, the direct and indirect lighting are computed separately. The direct lighting is computed using ray tracing.

Direct light only

The indirect lighting is computed by sampling from the photon map. When calculating the indirect lighting at a given position (in this case, the shading pixel), we can locate the N nearest photons in the photon map to estimate the incoming radiance using kernel density estimation. A kernel function needs to satisfy the conditions:

k(x) >= 0,  and  ∫ k(x) dx = 1

I use Simpson's kernel (also known as Silverman's second order kernel) suggested in the book Physically Based Rendering:

k(x) = 3/π * (1 - x^2)^2 ,  for |x| <= 1

Then the density can be computed using the kernel estimator for N samples within a distance d (i.e. the distance to the farthest photon among the N samples):

p(x) = 1/(N * d^2) * Σ(i=1..N) k( |x - xi| / d )

Then the reflected radiance at the shading position can be computed by weighting each photon's radiance with the kernel and multiplying by the surface BRDF.
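
As a sketch (reusing the color helpers from the photon tracing snippet above), the radiance estimate might look like the following, assuming photonMap.nearest(p, N) is a hypothetical k-d tree query returning the N nearest photon hits sorted by distance, and that the surface is Lambertian with albedo diffuse:

```javascript
// Vector helpers.
const sub  = (a, b) => ({ x: a.x - b.x, y: a.y - b.y, z: a.z - b.z });
const dot  = (a, b) => a.x * b.x + a.y * b.y + a.z * b.z;
const dist = (a, b) => Math.sqrt(dot(sub(a, b), sub(a, b)));
const addColor = (a, b) => ({ r: a.r + b.r, g: a.g + b.g, b: a.b + b.b });

// Simpson's kernel, which integrates to 1 over the unit disk.
const simpsonKernel = (x) => (3 / Math.PI) * (1 - x * x) * (1 - x * x);

// Estimate the reflected radiance at `position` from the N nearest photons.
// The normalization assumes the stored radiance weights were already scaled
// appropriately at emission time (e.g. by the number of emitted photons).
function estimateRadiance(photonMap, position, normal, diffuse, N) {
  const photons = photonMap.nearest(position, N);
  if (photons.length === 0) return { r: 0, g: 0, b: 0 };

  // d = distance to the farthest photon among the N samples.
  const d = dist(position, photons[photons.length - 1].position);

  let sum = { r: 0, g: 0, b: 0 };
  for (const p of photons) {
    if (dot(p.incoming, normal) >= 0) continue; // photon hit the back side
    const w = simpsonKernel(dist(position, p.position) / d);
    sum = addColor(sum, scaleColor(p.radiance, w));
  }
  // Kernel estimator over the disc of radius d, times the Lambertian brdf.
  const brdf = scaleColor(diffuse, 1 / Math.PI);
  return mulColor(brdf, scaleColor(sum, 1 / (N * d * d)));
}
```
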
However, the result shows circular artifacts:

Using the photon map directly for indirect diffuse light shows artifacts
To tackle this problem, we can either increase the number of photons to a very high value, or perform a final gather step. In the final gather step, we shoot a number of final gather rays from the pixel we are shading, in random directions over the hemisphere of the shading point.

Final gather rays are cast from every shading position

When a final gather ray hits another surface, the photon map is queried just as before, and the reflected radiance from that surface becomes the incoming radiance of the shading pixel. Using Monte Carlo integration, the reflected radiance at the shading pixel can then be calculated by averaging the final gather samples.
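
A sketch of this step, assuming sampleHemisphere does cosine-weighted sampling (so the pdf cancels both the cosine term and the 1/π of the Lambertian brdf):

```javascript
// Final gather: average the photon map estimates over the hemisphere.
function finalGather(scene, photonMap, position, normal, diffuse, numRays, N) {
  let sum = { r: 0, g: 0, b: 0 };
  for (let i = 0; i < numRays; ++i) {
    const dir = sampleHemisphere(normal);       // cosine-weighted (assumed)
    const hit = scene.intersect(position, dir); // final gather ray
    if (hit === null) continue;
    // The radiance reflected from the hit surface, estimated as before,
    // becomes the incoming radiance of the shading pixel.
    const Li = estimateRadiance(photonMap, hit.position, hit.normal, hit.diffuse, N);
    sum = addColor(sum, Li);
  }
  // With cosine-weighted samples and a Lambertian surface, the Monte Carlo
  // estimate reduces to the albedo times the average of the samples.
  return mulColor(diffuse, scaleColor(sum, 1 / numRays));
}
```

Here is the final result: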

Direct light + Indirect light, with final gather
Indirect light only, with final gather
Conclusion

In this post, the steps to implement photon mapping are briefly described. It is a two-pass approach: the photon map pass builds a photon map as a k-d tree representing the indirect lighting data, and the render pass uses the photon map to compute the final image. In the next part, I will describe how to make use of the photon map to bake light maps for real-time applications.



References
A Practical Guide to Global Illumination using Photon Maps: http://nameless.cis.udel.edu/class_data/cg/jensen_photon_mapping_tutorial.pdf
Physically Based Rendering: http://www.pbrt.org/

Software Rasterizer Part 2

Introduction
Continuing from the previous post: after filling the triangle with the scan line or half-space algorithm, we also need to interpolate the vertex attributes across the triangle so that we have texture coordinates or depth at every pixel. However, we cannot directly interpolate those attributes in screen space, because the projection transform followed by the perspective division is not an affine transformation (i.e. after the transformation, the mid-point of a line segment is no longer the mid-point). This results in distortion, and the artifact is even more noticeable when the triangle is large:

interpolate in screen space

perspective correct interpolation
Condition for linear interpolation
When interpolating the attributes in a linear way, we are saying that given a set of vertices vi (where i is any integer >= 0) with a set of attributes ai (such as texture coordinates),
we have a function mapping a vertex to the corresponding attributes, i.e.

f(vi) = ai

To interpolate a vertex inside a triangle in a linear way, the function f needs to have the following property:

f(t0*v0 + t1*v1 + t2*v2) = t0*f(v0) + t1*f(v1) + t2*f(v2)
, for any t0, t1, t2 where t0 + t1 + t2 = 1

which means that we can calculate the interpolated attributes using the same weights ti used for interpolating the vertex position. Functions having the above property are exactly the affine functions of the form:

f(x) = Ax + b
, where A is a matrix, x and b are vectors

(conversely, any such f satisfies the property, since f(Σ ti*vi) = A(Σ ti*vi) + b = Σ ti*(A*vi + b) when Σ ti = 1).

Depth interpolation
When a vertex is projected from view space to normalized device coordinates (NDC), we have the following relations (ratios of similar triangles, where n is the distance from the eye to the projection plane) between view space and NDC space:

xndc = n * xview / zview ------ equation 1
yndc = n * yview / zview ------ equation 2

Substitute equations 1 and 2 into the equation of the plane that the triangle lies on (a*xview + b*yview + c*zview = d):

a*(xndc*zview/n) + b*(yndc*zview/n) + c*zview = d
=> 1/zview = a/(d*n) * xndc + b/(d*n) * yndc + c/d

So 1/zview is an affine function of xndc and yndc, which can be interpolated linearly across screen space (the transform from NDC space to screen space is an affine transform).


Attributes interpolation
In the last section, we saw how to interpolate the depth of a pixel linearly in screen space; the next problem is to interpolate the vertex attributes (e.g. texture coordinates). In view space, we know that those attributes can be interpolated linearly, so an attribute can be calculated by an affine function of the vertex position, e.g. for a texture coordinate u:

u = A*xview + B*yview + C*zview + D

Similar to depth interpolation, substitute equations 1 and 2 into the above equation and divide by zview:

u/zview = A/n * xndc + B/n * yndc + C + D * (1/zview)

Since 1/zview is itself an affine function of xndc and yndc (from the last section), u/zview is another affine function of xndc and yndc, which can be interpolated linearly across screen space. Hence we can interpolate u by first interpolating 1/zview and u/zview linearly across screen space, then dividing them per pixel.
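
As a minimal sketch, the per-pixel work looks like this (t0, t1, t2 are the screen space barycentric weights of the pixel; the zview of each vertex comes from the w component after the projection transform, as explained in the next section):

```javascript
// Perspective-correct interpolation of one attribute u across a triangle.
function interpolateAttribute(t0, t1, t2, u0, u1, u2, z0, z1, z2) {
  // Interpolate 1/zview and u/zview linearly in screen space...
  const invZ   = t0 / z0 + t1 / z1 + t2 / z2;
  const uOverZ = t0 * u0 / z0 + t1 * u1 / z1 + t2 * u2 / z2;
  // ...then divide per pixel to recover u.
  return uOverZ / invZ;
}
```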

The last problem...
Now we know that we can interpolate the view space depth and vertex attributes linearly across screen space. But during the rasterization stage, we only have vertices in homogeneous coordinates (the vertices have already been transformed by the projection matrix), so how can we get zview to do the perspective correct interpolation?
Consider the projection matrix (I use the D3D one, but the same applies to OpenGL):

| xScale   0        0                0 |
| 0        yScale   0                0 |
| 0        0        zf/(zf-zn)       1 |
| 0        0        -zn*zf/(zf-zn)   0 |

After transforming the vertex position (as a row vector), the w coordinate will be the view space depth!

i.e. whomogeneous = zview

Looking at the matrix again and considering the transformed z coordinate, it will be in the form:

zhomogeneous = zview * zf/(zf-zn) - zn*zf/(zf-zn) = a * zview + b

After transforming to NDC:

zndc = zhomogeneous / whomogeneous = a + b * (1/zview)

So zndc is an affine function of 1/zview, which we already know interpolates linearly across screen space; hence the depth value can be directly interpolated using zndc for the depth test.
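
A minimal sketch of this transform, assuming a D3D-style projection built from the near/far distances zn, zf and x/y scale factors (the parameter names are mine):

```javascript
// Transform a view space point by a D3D-style projection matrix
// (row vector convention). Note that the resulting w equals zview.
function project(p, xScale, yScale, zn, zf) {
  const a = zf / (zf - zn);
  const b = -zn * a;
  // After the division, zndc = z / w = a + b / zview, which is why zndc
  // interpolates linearly in screen space.
  return {
    x: p.x * xScale,
    y: p.y * yScale,
    z: a * p.z + b, // zhomogeneous = a * zview + b
    w: p.z,         // whomogeneous = zview
  };
}
```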

Demo
A JavaScript demo that rasterizes the triangles is provided (although it is not optimized...), and the source code can be downloaded here.

Conclusion
In this post, the steps to interpolate vertex attributes linearly in screen space are described. And for rasterizing a depth buffer only (e.g. for occlusion), the depth value can be linearly interpolated directly using the z coordinate in NDC space, which is even simpler.

References
[1] http://www.lysator.liu.se/~mikaelk/doc/perspectivetexture/
[2] http://www.gamedev.net/topic/581732-perspective-correct-depth-interpolation/
[3] http://chrishecker.com/Miscellaneous_Technical_Articles
[4] http://en.wikipedia.org/wiki/Affine_transformation


Software Rasterizer Part 1

Introduction
A software rasterizer can be used for occlusion culling; some games such as Killzone 3 use this to cull objects. So I decided to write one myself. The steps are: first transform vertices to homogeneous coordinates, clip the triangles to the viewport, and then fill the triangles with interpolated parameters. Note that the clipping process should be done in homogeneous coordinates before the perspective division; otherwise lots of extra work is needed to clip the triangles properly, and this post will explain why clipping should be done before the perspective division.

Points in Homogeneous coordinates
In our usual Cartesian coordinate system, we can represent any point in 3D space in the form (X, Y, Z). In homogeneous coordinates, a redundant component w is added, resulting in the form (x, y, z, w). Multiplying that 4-component vector by any constant (except zero) still represents the same point in homogeneous coordinates. To convert a homogeneous point back to our usual Cartesian coordinates, we scale the point so that the w component equals one:

(x, y, z, w) -> (x/w, y/w, z/w, 1) -> (X, Y, Z)

In the following figure, we consider the x-w plane; a point (x, y, z, w) is transformed back to the usual Cartesian coordinates (X, Y, Z) by projecting onto the w=1 plane:
figure 1. projecting point to w=1 plane

The interesting case comes when the w component equals zero. Imagine w getting smaller and smaller, approaching zero: the coordinates of the point (x/w, y/w, z/w, 1) get larger and larger. When w equals zero, we can represent a point at infinity.

Line Segments in Homogeneous coordinates
In homogeneous coordinates, we can still represent a line segment between two points P0 = (x0, y0, z0, w0) and P1 = (x1, y1, z1, w1) in parametric form:

L= P0 + t * (P1-P0),   where t is within [0, 1]

Then we get a line with the following shape:
figure 2. internal line segment
The projected line on the w=1 plane is called the internal line segment in the above case.
But what if P0 and P1 have coordinates where w0 < 0 and w1 > 0?
figure 3. external line segment
In this case, it results in the above figure, forming an external line segment. This is because the homogeneous line segment has the form L = P0 + t * (P1 - P0); when moving the parameter from t = 0 to t = 1, since w0 < 0 and w1 > 0, there exists a point on the homogeneous line where w = 0. This point is at infinity when projected onto the w=1 plane, so the projected line segment joining P0 and P1 passes through the point at infinity, forming an external line segment.

The figure below shows how points are mapped before and after the perspective projection and the division by w:
figure 4. region mapping
The blue lines show the viewing frustum; nothing unusual happens for the region in front of the eye. The unusual part is the points behind the eye: after the perspective transformation and projection onto the w=1 plane, those points are transformed in front of the eye too. So a line segment with one point in front of the eye and the other behind the eye is transformed into an external line segment after the perspective division.

Triangles in Homogeneous coordinates
From the last section, we know that there are internal and external line segments after the perspective division; similarly, we also have internal and external triangles. The internal triangles are the ones that we usually see. An external triangle is formed by 1 internal line segment and 2 external line segments:

figure 5. external triangle
In the above figure, the shaded area represents the external triangle formed by the points P0, P1 and P2. This kind of external triangle may appear after the perspective projection transform. And this happens in our real world too:

an external triangle in real world

the full triangle of the left photo
The left photo shows an external triangle with one of the triangle vertices far behind the camera, while the right photo shows the full view of the triangle; the cross marks the position of the camera where the left photo was taken.

Triangles clipping
To avoid the case of external triangles, lines/triangles should be clipped in homogeneous coordinates before dividing by the w component. The homogeneous point (x, y, z, w) will be tested with the following inequalities:

(-w <= x <= w) &&   ------ inequality 1
(-w <= y <= w) &&   ------ inequality 2
(-w <= z <= w) &&   ------ inequality 3
w > 0    ------ inequality 4

(The z clipping inequality is 0 <= z <= w in the case of D3D; it depends on how the normalized device coordinates are defined.) Clipping by inequalities 1, 2, 3 will effectively clip all points with w < 0, because if w < 0, say w = -3:

3 <= x <= -3     =>     3 <= -3

which is impossible. But the point (0, 0, 0, 0) still satisfies the first 3 inequalities and can form external cases, so inequality 4 is added. Consider a homogeneous line with one end at (0, 0, 0, 0); it equals:

L = (0, 0, 0, 0) + t * [ (x, y, z, w) - (0, 0, 0, 0) ] = t * (x, y, z, w)

which represents only a single point in homogeneous coordinates. So a triangle (after being clipped by inequalities 1, 2, 3) having one or two vertices with w = 0 degenerates to either a line or a point, which can be discarded. Hence, after clipping, no external triangles will be produced when dividing by the w component. To clip a triangle against a plane, the result may be either 1 or 2 triangles, depending on whether 1 or 2 vertices are outside the clipping plane:
figure 6. clipping internal triangles
Then the clipped triangles can be passed to the next stage to be rasterized either by a scan line algorithm or by a half-space algorithm.
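
Below is a minimal sketch of this clipping step in JavaScript (the language of the demo in the next post), using the Sutherland-Hodgman approach directly on the homogeneous coordinates; the small epsilon on w implements inequality 4, and vertex attributes would be interpolated with the same t as the positions:

```javascript
// Clip a convex polygon against one plane. dist(v) must be >= 0 for points
// inside the plane, e.g. v => v.w - v.x for the x <= w plane.
function clipAgainstPlane(poly, dist) {
  const out = [];
  for (let i = 0; i < poly.length; ++i) {
    const a = poly[i], b = poly[(i + 1) % poly.length];
    const da = dist(a), db = dist(b);
    if (da >= 0) out.push(a);
    if ((da >= 0) !== (db >= 0)) {
      const t = da / (da - db); // where the edge crosses the plane
      out.push({
        x: a.x + t * (b.x - a.x),
        y: a.y + t * (b.y - a.y),
        z: a.z + t * (b.z - a.z),
        w: a.w + t * (b.w - a.w),
      });
    }
  }
  return out;
}

// Clip a triangle against the view volume in homogeneous coordinates,
// before the perspective division.
function clipTriangle(v0, v1, v2) {
  const EPS = 1e-5;
  const planes = [
    v => v.w - EPS,  // inequality 4: w > 0
    v => v.w - v.x,  // x <= w
    v => v.w + v.x,  // -w <= x
    v => v.w - v.y,  // y <= w
    v => v.w + v.y,  // -w <= y
    v => v.w - v.z,  // z <= w  (use v.z instead for D3D's 0 <= z)
    v => v.w + v.z,  // -w <= z
  ];
  let poly = [v0, v1, v2];
  for (const plane of planes) {
    poly = clipAgainstPlane(poly, plane);
    if (poly.length === 0) return []; // fully clipped away
  }
  return poly; // a convex polygon, to be fan-triangulated for rasterization
}
```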

Below is the clipping result of an external triangle with 1 vertex behind the camera.
clipping external triangle in software rasterizer
Below is another rasterized result:

rasterized duck model

reference of the duck model
Conclusion
In this post, the math behind triangle clipping is explained. Clipping should be done before projecting the homogeneous points onto the w=1 plane, to avoid taking special care to clip the external triangles. In the next post, I will talk about perspective-correct interpolation, and the source code will be given there (written in JavaScript, drawing to an HTML canvas).

And lastly special thanks to Fabian Giesen for giving feedback during the draft of this post.

References
[1] http://research.microsoft.com/pubs/73937/p245-blinn.pdf
[2] http://medialab.di.unipi.it/web/IUM/Waterloo/node51.html
[3] http://kriscg.blogspot.com/2010/09/software-occlusion-culling.html
[4] http://www.slideshare.net/guerrillagames/practical-occlusion-culling-in-killzone-3
[5] http://www.slideshare.net/repii/parallel-graphics-in-frostbite-current-future-siggraph-2009-1860503
[6] http://fgiesen.wordpress.com/2011/07/05/a-trip-through-the-graphics-pipeline-2011-part-5/
[7] http://devmaster.net/forums/topic/1145-advanced-rasterization/

Light Pre Pass Renderer on iPhone

Introduction
About a month ago, I bought an iPhone 4S, so I wrote some code on my new toy. Although this device does not support multiple render targets (MRT), it does support rendering to a floating point render target (only available on iPhone 4S and iPad 2). So I tested it with a light pre pass renderer:


In the test, HDR lighting is done (gamma = 2.0 instead of 2.2, without adaptation) with 3 post processing filters (filmic tone mapping, bloom and photo filter). The test scene has 3 directional lights (1 of them casting shadows with 4 cascades) and 30 point lights, with 2 skinned models and Bullet physics running at the same time, and it runs at around 28~32 fps.

G-buffer layout
I have tried 2 different layouts for the G-buffer. My first attempt was to use one 16-bit render target with the R channel storing the depth value, the G and B channels storing the view space normal using the encoding method from "A bit more deferred - CryEngine 3", and the A channel storing the glossiness for the specular lighting calculation. But later I discovered that this device supports the OpenGL extension GL_OES_depth_texture, which can expose the depth buffer as a texture. So my second attempt switched the G-buffer layout to use the RGB channels to store the view space normal without encoding and the A channel to store the glossiness, while the depth can be sampled directly from the depth texture.

G-buffer storing view space normal and glossiness

Depth buffer
Switching to this layout gives a boost in frame rate, as the normal values no longer need to be encoded/decoded when written to and read from the texture. However, changing the 16-bit render target to an 8-bit one to store the normal and glossiness does not give any performance improvement, probably because the test scene is not bandwidth bound.

Stencil optimization
The second optimization is for the deferred lights: using the stencil trick of drawing a convex light polygon to cull the pixels that do not need to perform the lighting calculation.

drawing the bounding volume of the point lights
However, after implementing the stencil trick, the frame rate dropped... This was because when filling the stencil buffer, I used the same shader as the one used for performing the lighting. Even though color writes are disabled while filling the stencil buffer, the GPU still does redundant work. So a simple shader is used in the stencil pass instead, which improves the performance.
Also, drawing out the shape of the point lights made me discover that the attenuation factor I used (i.e. 1/(1 + k*d + k*d^2)) has a large area that is barely lit, so I switched to a simpler linear falloff model (e.g. 1 - lightDistance/lightRange, which can be raised to an exponent to control the falloff) to give a tighter bound.
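
A sketch of the two falloff models side by side (the exponent parameter is my addition to illustrate the controllable falloff):

```javascript
// Reciprocal falloff: never actually reaches zero within the light range,
// so the bounding volume has a large, barely lit area.
function reciprocalFalloff(d, k) {
  return 1 / (1 + k * d + k * d * d);
}

// Linear falloff: reaches exactly zero at lightRange, giving a tight
// bounding volume; the exponent shapes the falloff curve.
function linearFalloff(d, lightRange, exponent) {
  return Math.pow(Math.max(0, 1 - d / lightRange), exponent);
}
```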

light buffer
Combining post-processing passes
Combining the full screen render passes can also help performance. In the test scene, originally the bloom result was additively blended with the tone-mapped scene render target, followed by a photo filter pass rendering to the back buffer. Computing the additive blend with the tone-mapped scene inside the photo filter shader combines these passes and is faster than before.

Resolution
The program runs at a low resolution, with a back buffer of 480x320 pixels. Also, the G-buffer and the post processing textures are further scaled down to 360x300 pixels. This reduces the number of fragments that need to be shaded by the pixel shaders.

Shadow
In the scene, a cascaded shadow map is used with 4 cascades (resolution = 256x256). I tried using the GL_EXT_shadow_samplers extension, hoping that it would help the frame rate. But the result is disappointing, as the extension runs at the same speed as performing the comparison inside the shader...


It takes around 8 ms to calculate the shadows and blur them. If a basic shadow map is used instead (i.e. without cascades) with blurring, it gives some performance boost, depending on how many point lights are on screen. Of course, switching off the blur speeds up the shadow calculation a lot.

basic shadow map

basic shadow map with blur

Cascaded shadow map

Cascaded shadow map with blur
Conclusion
In this post, I described the methods used to make a light pre pass renderer run on the iPhone at around 30 fps with 30 dynamic lights. However, resolution is sacrificed in order to keep the dynamic lights, HDR lighting and the post processing filters. Also, no anti-aliasing is done in the test, as the frame rate is not good enough. Maybe MSAA could be enabled if the basic shadow map were used instead of the cascades. But these are left for future investigation.

References
[1] Light Pre Pass Renderer: http://diaryofagraphicsprogrammer.blogspot.com/2008/03/light-pre-pass-renderer.html
[2] A bit more deferred - CryEngine 3: http://www.crytek.com/sites/default/files/A_bit_more_deferred_-_CryEngine3.ppt
[3] Filmic tone mapping operators: http://filmicgames.com/archives/75
[4] Crysis Next Gen Effects: http://www.crytek.com/sites/default/files/GDC08_SousaT_CrysisEffects.ppt
[5] Position From Depth 3: Back In The Habit: http://mynameismjp.wordpress.com/2010/09/05/position-from-depth-3/
[6] Fast Mobile Shaders: http://blogs.unity3d.com/wp-content/uploads/2011/08/FastMobileShaders_siggraph2011.pdf
[7] GLSL Optimizer: http://aras-p.info/blog/2010/09/29/glsl-optimizer/
[8] Deferred Cascaded Shadow Maps: http://aras-p.info/blog/2009/11/04/deferred-cascaded-shadow-maps/

Extracting dominant light from Spherical Harmonics

Introduction
Spherical Harmonics (SH) functions can represent low frequency data such as diffuse lighting, but high frequency details are lost after being projected to SH. Luckily, we can extract a dominant directional light from the SH coefficients to fake specular lighting. We can also extract more than 1 directional light from the SH coefficients, but this post will only focus on extracting 1 dominant light; those interested can read Stupid Spherical Harmonics (SH) Tricks for the details. A webGL demo is provided in the last section, which extracts 1 directional light.


Extracting dominant light direction
We can get a single dominant light direction from the SH projected environment lighting, Le. Consider approximating the environment light up to band 1 (i.e. l = 1):

Le(ω) ≈ Σ(l=0..1, m=-l..+l) L(l,m) * y(l,m)(ω)

Finding the dominant light direction is equivalent to choosing an incoming direction, ω, so that Le(ω) is maximized. The band 1 basis functions are linear in the direction vector, so the band 1 part of Le is proportional to the dot product between ω and the vector (-L(1,1), -L(1,-1), L(1,0)). That dot product is maximized when the cosθ between the two vectors equals 1:

d = normalize( (-L(1,1), -L(1,-1), L(1,0)) )

So we can extract the dominant light direction for a single color channel. Finally, the overall dominant light direction can be calculated by weighting the dominant direction of each of the RGB channels using the ratios that convert color to gray scale:

d = normalize( 0.3 * dR + 0.59 * dG + 0.11 * dB )
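
As a sketch in JavaScript (the webGL demo's language), assuming the 9 SH coefficients of each channel are stored in the order [L(0,0), L(1,-1), L(1,0), L(1,1), ...] (this layout is my assumption):

```javascript
// Extract the dominant light direction from the band 1 SH coefficients.
// shR, shG, shB are arrays of 9 coefficients in the assumed order above.
function dominantLightDirection(shR, shG, shB) {
  // Combine the channels with the gray scale ratios first.
  const gray = (i) => 0.3 * shR[i] + 0.59 * shG[i] + 0.11 * shB[i];
  const x = -gray(3); // -L(1,1)
  const y = -gray(1); // -L(1,-1)
  const z =  gray(2); //  L(1,0)
  const len = Math.sqrt(x * x + y * y + z * z);
  return { x: x / len, y: y / len, z: z / len };
}
```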


Extracting dominant light intensity
After extracting the light direction, the remaining problem is to calculate the light intensity. That means we want to calculate an intensity s such that the error between the extracted light and the light environment is at a minimum (Le is the original environment light while Ld is the directional light):

E(s) = ∫ ( Le(ω) - s * Ld(ω) )² dω

To minimize the error, differentiate the equation with respect to s and solve for the derivative equal to zero:

s = ( ∫ Le(ω) * Ld(ω) dω ) / ( ∫ Ld(ω)² dω )

If both lighting functions are projected into SH, the intensity can be simplified to (Lei and ci are the SH coefficients of Le and Ld respectively):

s = ( Σi Lei * ci ) / ( Σi ci² )

The next step is to project the directional light (with unit intensity) into the SH basis (ci is the SH coefficient of the projected directional light, and yi is the corresponding basis function):

ci = yi(ld)

Therefore the SH coefficients of the projected directional light can be calculated by substituting the light direction into the corresponding SH basis functions.

As the SH projected directional light has unit intensity, we want to scale it with a factor so that the extracted light intensity s is the light color that can be used directly in the direct lighting equation, where clight is defined as follows (a detailed explanation can be found in [2]):
For artist convenience, clight does not correspond to a direct radiometric measure of the light’s intensity; it is specified as the color a white Lambertian surface would have when illuminated by the light from a direction parallel to the surface normal (lc = n).
So we need to calculate a scaling factor, c, that scales the SH projected directional light such that a white Lambertian surface (brdf = 1/π) lit by the scaled light from a direction parallel to the surface normal has unit color:

(1/π) * ∫ c * Ld(ω) * (n · ω) dω = 1 ,   with ld = n
We can project both Ld(ω) and (n · ω) into SH to calculate the integral. To project the transfer function (n · ω) into SH, we can first align n with the +Z axis, which gives zonal harmonics (ZH), and then rotate the ZH coefficients fl into any direction d using the equation:

f(l,m) = sqrt( 4π / (2l + 1) ) * fl * y(l,m)(d)
The ZH coefficients of (n · ω) are the following (note that the result is different from Stupid Spherical Harmonics (SH) Tricks in the Normalization section, as we have taken the π term outside the integral):


Then, rotating the ZH coefficients such that the normal direction equals the light direction, ld (because we need ld = n as stated above), we have:

Finally, we can go back to compute the scaling factor, c, for the SH projected directional light (we calculate up to band 2):

Therefore, the steps to extract the dominant light intensity are to first project the directional light into SH with the scaling factor c, and then the light color, s, can be calculated with the simplified intensity equation above, using the scaled coefficients:

s = ( Σi Lei * c'i ) / ( Σi c'i² ) ,   where c'i = c * ci
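
As a sketch, the whole intensity extraction for one color channel might look like the following; the SH basis constants use the sign convention of [1], which is an assumption since conventions differ between sources:

```javascript
// Evaluate the first 9 real SH basis functions at a unit direction d.
function shBasis(d) {
  const { x, y, z } = d;
  return [
    0.282095,                                    // y(0,0)
    -0.488603 * y, 0.488603 * z, -0.488603 * x,  // band 1
    1.092548 * x * y, -1.092548 * y * z,         // band 2
    0.315392 * (3 * z * z - 1),
    -1.092548 * x * z, 0.546274 * (x * x - y * y),
  ];
}

// shE holds the environment coefficients Lei of one channel, c is the
// scaling factor derived above; returns the light color s of the channel.
function dominantLightIntensity(shE, lightDir, c) {
  const shD = shBasis(lightDir).map(v => c * v); // c'i = c * yi(ld)
  let num = 0, den = 0;
  for (let i = 0; i < shD.length; ++i) {
    num += shE[i] * shD[i];
    den += shD[i] * shD[i];
  }
  return num / den; // the s that minimizes the squared error
}
```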



WebGL Demo
A webGL demo (requiring a webGL enabled browser such as Chrome) is provided to illustrate how to extract a single directional light from the SH coefficients to fake the specular lighting. The specular lighting is calculated using the basic Blinn-Phong specular term for simplicity; other specular lighting equations, such as physically plausible ones, could be used instead. (The source code can be downloaded from here.)
Conclusion
Extracting the dominant directional light from SH projected lighting is easy to compute with the following steps: first, calculate the dominant light direction; second, project the dominant light into SH with a normalization factor; third, calculate the light color. The extracted light can be used for specular lighting to give an impression of high frequency lighting.

References
[1] Stupid Spherical Harmonics (SH) Tricks: http://www.ppsloan.org/publications/StupidSH36.pdf
[2] PI or not to PI in game lighting equation: http://seblagarde.wordpress.com/2012/01/08/pi-or-not-to-pi-in-game-lighting-equation/
[3] March of the Froblins: Simulation and Rendering Massive Crowds of Intelligent and Detailed Creatures on GPU: http://developer.amd.com/documentation/presentations/legacy/Chapter03-SBOT-March_of_The_Froblins.pdf
[4] Pick dominant light from sh coeffs: http://sourceforge.net/mailarchive/message.php?msg_id=28778827