Light Pre Pass Renderer on iPhone

Introduction
About a month ago, I bought an iPhone 4S, so I wrote some code on my new toy. Although this device does not support multiple render targets (MRT), it does support rendering to a floating point render target (only available on the iPhone 4S and iPad 2). So I tested it with a light pre pass renderer:


In the test, HDR lighting is done (gamma = 2.0 instead of 2.2, without adaptation) with 3 post-processing filters (filmic tone mapping, bloom and a photo filter). The test scene contains 3 directional lights (1 of them casting shadows with 4 cascades) and 30 point lights, with 2 skinned models and Bullet physics running at the same time, at around 28~32 fps.

G-buffer layout
I have tried 2 different layouts for the G-buffer. My first attempt was to use one 16-bit render target with the R channel storing the depth value, the G and B channels storing the view space normal using the encoding method from "A bit more deferred - CryEngine 3", and the A channel storing the glossiness for the specular lighting calculation. But later I discovered that this device supports the OpenGL extension GL_OES_depth_texture, which can render the depth buffer into a texture. So my second attempt switched the G-buffer layout to use the RGB channels to store the view space normal without encoding and the A channel to store the glossiness, while the depth can be sampled directly from the depth texture.
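As a sketch of the kind of two-channel normal encoding used in the first layout: the variant below is the Lambert azimuthal equal-area projection commonly attributed to CryEngine 3, though the exact shader code in the presentation may differ. It assumes a unit-length view space normal with z pointing towards the camera.

```python
import math

def encode_normal(n):
    # Pack a unit view space normal into 2 channels in [0, 1]
    # (Lambert azimuthal equal-area projection).
    x, y, z = n
    p = math.sqrt(z * 8.0 + 8.0)
    return (x / p + 0.5, y / p + 0.5)

def decode_normal(enc):
    # Inverse mapping: reconstruct the unit normal from the 2 channels.
    fx, fy = enc[0] * 4.0 - 2.0, enc[1] * 4.0 - 2.0
    f = fx * fx + fy * fy
    g = math.sqrt(1.0 - f / 4.0)
    return (fx * g, fy * g, 1.0 - f / 2.0)
```

The second layout skips this round trip entirely, which is where the frame rate gain mentioned below comes from.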

G-buffer storing view space normal and glossiness

Depth buffer
Switching to this layout gives a boost in frame rate, as the normal value does not need to be encoded/decoded when written to and read from the texture. However, shrinking the 16-bit render target to 8-bit to store the normal and glossiness does not give any performance improvement, probably because the test scene is not bandwidth bound.

Stencil optimization
The second optimization is for the deferred lights: using the stencil trick of drawing a convex light volume to cull those pixels that do not need to perform lighting.

drawing the bounding volume of the point lights
However, after implementing the stencil trick, the frame rate dropped... This is because when filling the stencil buffer, I used the same shader as the one used for performing lighting. Even though color writes are disabled while filling the stencil buffer, the GPU is still doing redundant work. So a simple shader is used in the stencil pass instead, which improves the performance.
Also, drawing out the shape of the point lights made me discover that the attenuation factor I used (i.e. 1/(1 + k·d + k·d²)) has a large area that barely gets lit, so I switched to a simpler linear falloff model (e.g. 1 - lightDistance/lightRange, optionally raised to an exponent to control the falloff) to give a tighter bound.
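The two falloff models can be compared with a small sketch; the k value and the exponent below are arbitrary illustration values, not the ones used in the renderer. The key difference is that the reciprocal model never actually reaches zero, so the light volume must be drawn much larger than its visually useful area, while the linear model is exactly zero at the light range.

```python
def attenuation_reciprocal(d, k=0.5):
    # 1 / (1 + k*d + k*d^2): asymptotically approaches zero but
    # never reaches it, so the light's bounding volume is loose.
    return 1.0 / (1.0 + k * d + k * d * d)

def attenuation_linear(d, light_range, falloff_exp=1.0):
    # (1 - d/range)^exp: hits exactly zero at the light range,
    # giving a tight bound for the stencil volume.
    return max(0.0, 1.0 - d / light_range) ** falloff_exp
```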

light buffer
Combining post-processing passes
Combining full screen render passes can help performance. In the test scene, the bloom result was originally blended additively with the tone-mapped scene render target, followed by a photo filter rendering to the back buffer. Combining these passes, by computing the additive blend with the tone-mapped scene inside the photo filter shader, is faster than before.
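A minimal sketch of the pass merging, with the photo filter modelled as a simple per-channel tint (the actual filter in the renderer may be more involved). The point is that both versions produce the same pixel, but the combined version removes one full-screen blend pass.

```python
def separate_passes(scene_px, bloom_px, filter_color):
    # Pass 1: additive blend of bloom onto the tone-mapped scene.
    blended = tuple(s + b for s, b in zip(scene_px, bloom_px))
    # Pass 2: photo filter (modelled here as a per-channel tint).
    return tuple(c * f for c, f in zip(blended, filter_color))

def combined_pass(scene_px, bloom_px, filter_color):
    # One pass: sample both textures and do the additive blend
    # inside the photo filter shader itself.
    return tuple((s + b) * f for s, b, f in zip(scene_px, bloom_px, filter_color))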

Resolution
The program runs at a low resolution, with a back buffer of 480x320 pixels. Also, the G-buffer and the post-processing textures are further scaled down to 360x300 pixels. This reduces the number of fragments that need to be shaded by the pixel shaders.

Shadow
In the scene, a cascaded shadow map is used with 4 cascades (resolution = 256x256). I tried using the GL_EXT_shadow_samplers extension, hoping that it would help the frame rate. But the result is disappointing, as the extension runs at the same speed as performing the comparison inside the shader...


It takes around 8 ms to calculate and blur the shadows. If a basic shadow map is used instead (i.e. without cascades) with blurring, it gives little to some performance boost depending on how many point lights are on screen. Of course, switching off the blur speeds up the shadow calculation a lot.

basic shadow map

basic shadow map with blur

Cascaded shadow map

Cascaded shadow map with blur
Conclusion
In this post, I described the methods used to make a light pre pass renderer run on the iPhone at 30 fps with 30 dynamic lights. However, resolution was sacrificed in order to keep the dynamic lights, HDR lighting and the post-processing filters. Also, no anti-aliasing is done in the test, as the frame rate is not good enough. Maybe MSAA could be used if the basic shadow map replaced the cascades, but that is left for future investigation.

References
[1] Light Pre Pass Renderer: http://diaryofagraphicsprogrammer.blogspot.com/2008/03/light-pre-pass-renderer.html
[2] A bit more deferred - CryEngine 3: http://www.crytek.com/sites/default/files/A_bit_more_deferred_-_CryEngine3.ppt
[3] Filmic tone mapping operators: http://filmicgames.com/archives/75
[4] Crysis Next Gen Effects: http://www.crytek.com/sites/default/files/GDC08_SousaT_CrysisEffects.ppt
[5] Position From Depth 3: Back In The Habit: http://mynameismjp.wordpress.com/2010/09/05/position-from-depth-3/
[6] Fast Mobile Shaders: http://blogs.unity3d.com/wp-content/uploads/2011/08/FastMobileShaders_siggraph2011.pdf
[7] GLSL Optimizer: http://aras-p.info/blog/2010/09/29/glsl-optimizer/
[8] Deferred Cascaded Shadow Maps: http://aras-p.info/blog/2009/11/04/deferred-cascaded-shadow-maps/

Extracting dominant light from Spherical Harmonics

Introduction
Spherical Harmonics (SH) functions can represent low frequency data such as diffuse lighting, where high frequency details are lost after being projected to SH. Luckily, we can extract a dominant directional light from the SH coefficients to fake specular lighting. We can also extract more than 1 directional light from the SH coefficients, but this post will only focus on extracting 1 dominant light; those interested can read Stupid Spherical Harmonics (SH) Tricks [1] for the details. A WebGL demo that extracts a single directional light is provided in the last section.


Extracting dominant light direction
We can get a single dominant light direction from the SH projected environment lighting, Le. Suppose we approximate the environment light up to band 1 (i.e. l = 1):

Finding the dominant light direction is equivalent to choosing an incoming direction, ω, so that Le(ω) is maximized. In other words, cosθ should equal 1:


So we can extract the dominant light direction for a single color channel. Finally, the overall dominant light direction can be calculated by weighting the per-channel dominant directions for the RGB channels using the ratios that convert color to grayscale:
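The direction extraction can be sketched as follows. This sketch assumes a real SH basis where Y(1,-1) ∝ y, Y(1,0) ∝ z and Y(1,1) ∝ x with no sign flips (sign conventions vary between implementations), and NTSC-style grayscale weights; both are assumptions, not details taken from the post.

```python
import math

def dominant_light_direction(r, g, b):
    # r, g, b: band-1 SH coefficient triples (c_{1,-1}, c_{1,0}, c_{1,1})
    # for each color channel. Combine channels with grayscale
    # (luminance) weights, then normalize the result.
    w = (0.3, 0.59, 0.11)  # assumed NTSC-style grayscale weights
    x = w[0] * r[2] + w[1] * g[2] + w[2] * b[2]
    y = w[0] * r[0] + w[1] * g[0] + w[2] * b[0]
    z = w[0] * r[1] + w[1] * g[1] + w[2] * b[1]
    n = math.sqrt(x * x + y * y + z * z)
    return (x / n, y / n, z / n)
```

For a light environment that is exactly one directional light, the band-1 coefficients are proportional to (y, z, x) of the light direction, so the extraction recovers that direction.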


Extracting dominant light intensity
After extracting the light direction, the remaining problem is to calculate the light intensity. That means we want to calculate an intensity, s, so that the error between the extracted light and the light environment is minimized (Le is the original environment light while Ld is the directional light):

To minimize the error, differentiate the equation and solve for where the derivative equals zero:

If both lighting functions are projected into SH, the intensity can be simplified to:

The next step is to project the directional light (with unit intensity) into the SH basis (ci is the SH coefficient of the projected directional light):

Therefore the SH coefficients of the projected directional light can be calculated by substituting the light direction into the corresponding SH basis functions.

As the SH projected directional light has unit intensity, we want to scale it with a factor so that the extracted light intensity, s, is a light color that is ready for use in the direct lighting equation, which is defined as follows (a detailed explanation can be found in [4]):
For artist convenience, clight does not correspond to a direct radiometric measure of the light’s intensity; it is specified as the color a white Lambertian surface would have when illuminated by the light from a direction parallel to the surface normal (lc = n).
So we need to calculate a scaling factor, c, that scales the SH projected directional light such that:


We can project both L(ω) and (n · ω) into SH to calculate the integral. To project the transfer function (n · ω) into SH, we can first align n to the +Z-axis, which gives zonal harmonics (ZH); then we can rotate the ZH coefficients into any direction using the equation:

The ZH coefficients of (n · ω) are: (note that the result is different from Stupid Spherical Harmonics (SH) Tricks in the Normalization section, as we have taken the π term outside the integral)


Then we rotate the ZH coefficients such that the normal direction equals the light direction, ld (because we need ld = n as stated above), giving:

Finally, we can go back to compute the scaling factor, c, for the SH projected directional light (we calculate up to band 2):

Therefore, to extract the dominant light intensity, we first project the directional light into SH with the scaling factor c, and then the light color, s, can be calculated by:
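The intensity fit above can be sketched as a least-squares projection in SH space, s = dot(Le, d) / dot(d, d), using coefficients up to band 2. For simplicity this sketch omits the Lambertian scaling factor c (it can be multiplied in afterwards), and it assumes the usual graphics convention for the real SH basis constants, which may differ in sign from other references.

```python
import math

def sh_basis9(d):
    # Real SH basis up to band 2, evaluated at unit direction d,
    # using the common graphics normalization constants.
    x, y, z = d
    return [
        0.282095,
        0.488603 * y, 0.488603 * z, 0.488603 * x,
        1.092548 * x * y, 1.092548 * y * z,
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z, 0.546274 * (x * x - y * y),
    ]

def dominant_light_intensity(le, light_dir):
    # Least-squares fit: choose s minimizing the SH-space error
    # ||le - s * d||^2, which gives s = dot(le, d) / dot(d, d).
    d = sh_basis9(light_dir)
    return sum(a * b for a, b in zip(le, d)) / sum(b * b for b in d)
```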



WebGL Demo
A WebGL demo (needs a WebGL enabled browser such as Chrome) is provided to illustrate how to extract a single directional light from the SH coefficients to fake specular lighting. The specular lighting is calculated using the basic Blinn-Phong specular term for simplicity; other specular lighting equations, such as physically plausible ones, can be used instead. (The source code can be downloaded from here.)
Conclusion
Extracting the dominant directional light from SH projected lighting is easy to compute with the following steps: first, calculate the dominant light direction; second, project the dominant light into SH with a normalization factor; third, calculate the light color. The extracted light can be used for specular lighting to give an impression of high frequency lighting.

References
[1] Stupid Spherical Harmonics (SH) Tricks: http://www.ppsloan.org/publications/StupidSH36.pdf
[5] PI or not to PI in game lighting equation: http://seblagarde.wordpress.com/2012/01/08/pi-or-not-to-pi-in-game-lighting-equation/
[6] March of the Froblins: Simulation and Rendering Massive Crowds of Intelligent and Detailed Creatures on GPU: http://developer.amd.com/documentation/presentations/legacy/Chapter03-SBOT-March_of_The_Froblins.pdf
[7] Pick dominant light from sh coeffs: http://sourceforge.net/mailarchive/message.php?msg_id=28778827




Inverse Kinematics (2 joints) for foot placement

Introduction
Games sometimes need to solve Inverse Kinematics (IK) for a more realistic look, such as foot placement on a terrain. There are different methods to solve IK problems; some are numerical methods that can be used in general cases, while analytical solutions exist only for simple cases such as 2 joints. I implemented the analytical method to handle foot placement in my engine.


2D case
Start with the simple case of 2 joints in 2D. In the following figure, suppose we want to rotate the blue leg to the position of the red leg so that the leg is aligned to the ground:


Since we know the pelvis joint position, the target position, and the lengths of the upper and lower leg, we can solve for the angle α using the law of cosines. Then, we can calculate the vector PK' by rotating the vector PT by the angle α. Hence the angle δ can be calculated from the dot product between the vectors PK and PK'. The angle at the knee joint can be solved either by the law of cosines or the dot product.
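The 2D solution can be sketched as follows; the function name and argument layout are my own, and the rotation direction of α is a convention choice. The returned knee position is, by construction of the law of cosines, at the upper-leg length from the pelvis and the lower-leg length from the target.

```python
import math

def solve_leg_2d(pelvis, target, upper_len, lower_len):
    # Two-joint IK in 2D via the law of cosines: returns the knee position.
    px, py = pelvis
    tx, ty = target
    dx, dy = tx - px, ty - py
    d = math.hypot(dx, dy)
    # Clamp so the target is always reachable (leg fully bent/stretched).
    d = max(abs(upper_len - lower_len), min(d, upper_len + lower_len))
    # Angle alpha between the pelvis->target vector and the upper leg.
    cos_a = (upper_len**2 + d**2 - lower_len**2) / (2.0 * upper_len * d)
    a = math.acos(max(-1.0, min(1.0, cos_a)))
    # Rotate the normalized pelvis->target direction by alpha (CCW here).
    ca, sa = math.cos(a), math.sin(a)
    ux, uy = dx / d, dy / d
    return (px + upper_len * (ca * ux - sa * uy),
            py + upper_len * (sa * ux + ca * uy))
```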

3D case
The 3D case is similar to the 2D case, except that the rotation may not be on the same plane if the IK result is to look good.


As there are infinitely many solutions that make the leg reach the target position (because the leg can rotate around the axis a, highlighted in red), we can first calculate one of the solutions using the law of cosines, just like the 2D case (i.e. find the angle α as in 2D), and hence calculate the upper leg vector v. As this is an arbitrary solution, the result may sometimes look funny:


So the next step is to find a solution which is close to the pose made by the artist; in other words, we want to minimize the angle δ in the above figure, which is the angle between the arbitrary solution found in the previous step and the thigh position posed by the artist. To minimize the angle δ, we need to rotate the vector v around the axis a (the red line in the figure) by an angle θ to a new vector v'. In the equation below, v is rotated to v' using a quaternion:


As minimizing δ is equivalent to maximizing cosδ (i.e. v'·k), we calculate the first derivative of cosδ; the maximum/minimum is achieved where it equals zero (all the vectors are normalized):


Then we get two values of θ within a range of 2π, which correspond to maximizing and minimizing cosδ. To determine which solution maximizes cosδ, we substitute θ into the second derivative of cosδ and test whether it is greater than 0: if so, that θ minimizes cosδ; otherwise, it maximizes cosδ.
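A sketch of this step: instead of quaternions and the explicit second-derivative test, this version uses Rodrigues' rotation and picks the maximizing θ directly with atan2, which is mathematically equivalent. Expanding v' = v·cosθ + (a×v)·sinθ + a(a·v)(1-cosθ) gives cosδ = A·cosθ + B·sinθ + C, whose maximizer is θ = atan2(B, A).

```python
import math

def best_swivel(v, a, k):
    # Rotate unit vector v around unit axis a by the angle theta that
    # maximizes dot(v', k), i.e. minimizes the angle delta to the
    # artist-posed thigh direction k.
    dot = lambda p, q: sum(pi * qi for pi, qi in zip(p, q))
    cross = lambda p, q: (p[1]*q[2] - p[2]*q[1],
                          p[2]*q[0] - p[0]*q[2],
                          p[0]*q[1] - p[1]*q[0])
    av, ak = dot(a, v), dot(a, k)
    A = dot(v, k) - av * ak          # coefficient of cos(theta)
    B = dot(cross(a, v), k)          # coefficient of sin(theta)
    t = math.atan2(B, A)             # maximizer of A*cos(t) + B*sin(t)
    c, s = math.cos(t), math.sin(t)
    axv = cross(a, v)
    # Rodrigues' rotation formula.
    return tuple(v[i] * c + axv[i] * s + a[i] * av * (1.0 - c)
                 for i in range(3))
```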


Then we can rotate the arbitrary IK solution v with angle θ to v' to give a much better look:


Conclusion
This blog post presents an analytical 2-joint IK solution in 3D for foot placement. We first compute an arbitrary solution for the pelvis joint using the law of cosines, then rotate that joint to minimize the angle between the IK solution and the joint posed before IK, to give a much better look.

Reference
[1] http://www.3dkingdoms.com/ik.htm
[2] www.essentialmath.com/InverseKinematics.pps
[3] The brick texture is obtained from Crytek's Sponza Model: http://crytek.com/cryengine/cryengine3/downloads
[4] The knight model is extracted from the game Infinity Blade using umodel.