Dual Number

Introduction
Recently, I read "Spherical Skinning with Dual-Quaternions and QTangents" from Crytek. It raised my interest in the topic of "Dual Number" (which is related to Dual Quaternion). A dual number, just like a complex number, has the form:

a + bε,  where ε² = 0 but ε ≠ 0

where the real number a is called the real part and the real number b is called the dual part.


Arithmetic operations
Dual numbers support the arithmetic operations below (every occurrence of ε² vanishes):

Addition: (a + bε) + (c + dε) = (a + c) + (b + d)ε
Multiplication: (a + bε)(c + dε) = ac + (ad + bc)ε
Division: (a + bε) / (c + dε) = a/c + ((bc − ad)/c²)ε,  for c ≠ 0

Finding derivative using Dual Number
The interesting part of dual numbers is when they are applied to the Taylor Series. When substituting a dual number into a differentiable function using the Taylor Series, all higher-order terms vanish because ε² = 0:

f(a + bε) = f(a) + b f'(a)ε

This gives a very nice property: we can find the first derivative, f'(a), by considering the dual part of f(a + bε), which can be evaluated using dual number arithmetic.
For example, given the function

f(x) = x² / (x + 1)

we want to find the first derivative of f(x) at x = 2, i.e. f'(2). We can find it using dual number arithmetic, where f'(2) equals the dual part of f(2 + ε) according to the Taylor Series:

f(2 + ε) = (2 + ε)² / (3 + ε) = (4 + 4ε) / (3 + ε) = 4/3 + (8/9)ε

Therefore f'(2) = 8/9. You can verify this by finding f'(x) and substituting 2 into it, which gives the same answer.
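To make the arithmetic concrete, here is a minimal dual number type in C++ (my own sketch, not code from the article). It uses the example function f(x) = x²/(x + 1), which is consistent with the result f'(2) = 8/9:

```cpp
#include <cassert>
#include <cmath>

// Minimal dual number: carries a value (real part) and a derivative (dual part).
struct Dual {
    double real; // a
    double dual; // b, the coefficient of epsilon, where epsilon^2 = 0
};

Dual operator+(Dual x, Dual y) { return { x.real + y.real, x.dual + y.dual }; }

Dual operator*(Dual x, Dual y) {
    // (a + b e)(c + d e) = ac + (ad + bc)e, since e^2 = 0
    return { x.real * y.real, x.real * y.dual + x.dual * y.real };
}

Dual operator/(Dual x, Dual y) {
    // (a + b e)/(c + d e) = a/c + ((bc - ad)/c^2)e
    return { x.real / y.real,
             (x.dual * y.real - x.real * y.dual) / (y.real * y.real) };
}

// f(x) = x^2 / (x + 1), written once and evaluated on dual numbers
Dual f(Dual x) { return (x * x) / (x + Dual{1.0, 0.0}); }
```

Evaluating f on the dual number 2 + ε (i.e. `Dual{2.0, 1.0}`) yields 4/3 in the real part and 8/9 in the dual part, so the derivative falls out of plain arithmetic with no symbolic differentiation.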

Conclusion
By using dual numbers, we can find the derivative of a function using dual arithmetic. Hence, we can also find the tangent at an arbitrary point, p, on a given parametric curve, which equals the normalized dual part of the point p. For those who are interested in finding out more about dual numbers, I recommend reading the presentation "Dual Numbers: Simple Math, Easy C++ Coding, and Lots of Tricks" by Gino van den Bergen from GDC Europe 2009.

Reference:
[1] http://en.wikipedia.org/wiki/Dual_number
[2] http://www.gdcvault.com/play/10103/Dual-Numbers-Simple-Math-Easy

Using PID Controller

Introduction
Several weeks ago, I needed to implement a game screen for the user to choose a level. At that time, I had read some posts and articles about using a PID controller to control the behavior of a system, so I decided to try it on the UI. A PID controller is a control loop feedback mechanism which generates an output to a system based on the difference between a measured value and a desired value:

f(t) = P·e(t) + I·∫ e(τ) dτ + D·(de(t)/dt)

where f(t) is the output applied back to the system, e(t) is the difference between the measured value and the desired value, and P, I, D are tuning variables for controlling the behavior. More information can be found in the references below.
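A discrete version of this controller can be sketched in a few lines of C++ (my own minimal illustration, not the demo's source): the integral becomes a running sum and the derivative a finite difference between frames.

```cpp
#include <cassert>

// Minimal discrete PID controller: output = P*e + I*sum(e*dt) + D*de/dt.
struct PIDController {
    PIDController(double p, double i, double d)
        : P(p), I(i), D(d), integral(0.0), prevError(0.0), first(true) {}

    double update(double error, double dt) {
        integral += error * dt;                                  // I term accumulates
        double derivative = first ? 0.0 : (error - prevError) / dt; // D term differences
        first = false;
        prevError = error;
        return P * error + I * integral + D * derivative;
    }

    double P, I, D;
private:
    double integral;
    double prevError;
    bool first;
};
```

In a UI like the one below, each frame would compute error = desiredPosition − measuredPosition and feed `pid.update(error, dt)` back in as the scrolling acceleration.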

Using PID Controller
In the UI, the user can drag on the view to choose the level; when the user swipes on the icons, they will scroll according to velocity and acceleration. The output of the PID controller is used to control the acceleration so that there will always be an icon staying in the middle of the screen when the system becomes stable. The code is ported to WebGL as below (a WebGL-enabled browser is needed to view it):

[Interactive WebGL demo with P, I and D input fields]

Tuning the PID variables

The behavior of the scrolling can be controlled by tuning the constants P, I and D. The effects of increasing each of the 3 constants can be summarized as below:

Increasing | Rise time    | Overshoot | Settling time | Steady-state error
P          | decrease     | increase  | small change  | decrease
I          | decrease     | increase  | increase      | eliminate
D          | minor change | decrease  | decrease      | no effect in theory

Summary of the effects of the PID constants, from Wikipedia
You can play around with the 3 constants using the input text fields above if you have a WebGL-enabled browser.

Conclusion
Using a PID controller to manage the system is convenient, but it is a bit tricky to tune the PID constants. It is easier to tune the constants one by one while referring to the above table. The source code of my implementation can be downloaded here.

Reference:
[1]: http://en.wikipedia.org/wiki/PID_controller
[2]: http://altdevblogaday.com/2011/08/07/animation-using-closed-loop-control/
[3]: http://altdevblogaday.com/2011/02/27/webgl-part-2-in-the-beginning-there-was/
[4]: http://www.learnopengles.com/how-to-embed-webgl-into-a-wordpress-post/

Writing an iPhone Game Engine (Part 6- Performance)

Introduction
Performance is very important in a game; it is important to maintain a constant frame rate. So I will talk about how I keep track of the performance of the game. However, as the game is not finished, I can only profile the engine with sample placeholder assets, and the profiled data might not match the final data after the game is finished. Anyway, I will try to talk about my first attempt at profiling my engine. (All the data is measured on an iPhone 3G.)

XCode OpenGL ES Analysis
For profiling graphics performance on iPhone, there is an OpenGL ES analyzer that comes with Xcode 4. It can detect redundant state changes and give advice to improve rendering performance. It also helped me spot some bugs, such as forgetting to set the texture states to use mip-maps.
After using the analyzer, I discovered 2 recommendations that improved my engine's rendering performance significantly. The first suggestion is to use the EXT_discard_framebuffer extension after rendering each frame.
Before using the analyzer, I did not notice that iPhone has this OpenGL extension. This reduced the frame time from 20.8ms per frame to 19.2ms per frame. The second suggestion is to use a more compact back-buffer format to reduce the fill rate.
After changing the back-buffer format from RGBA8888 to RGB565, the time taken for each frame dropped from 19.2ms to 18.52ms.

Performance Graph
I also generate a performance graph to display how much time each subsystem uses in the engine. It is easier with a graph to spot spikes in the game, so that I can investigate which subsystem causes them.
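The timings behind such a graph can be gathered with a small per-subsystem profiler. This is my own illustration using std::chrono (the engine's actual implementation may differ): each subsystem brackets its work with begin/end, and the accumulated milliseconds per name are what gets drawn each frame.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Accumulates time spent per subsystem; the per-name totals can then be
// drawn as a stacked performance graph to spot spikes.
class FrameProfiler {
public:
    void begin(const std::string& name) {
        start_[name] = Clock::now();
    }
    void end(const std::string& name) {
        ms_[name] += std::chrono::duration<double, std::milli>(
            Clock::now() - start_[name]).count();
    }
    double milliseconds(const std::string& name) const {
        std::map<std::string, double>::const_iterator it = ms_.find(name);
        return it == ms_.end() ? 0.0 : it->second;
    }
private:
    typedef std::chrono::steady_clock Clock;
    std::map<std::string, Clock::time_point> start_;
    std::map<std::string, double> ms_;
};
```

Typical usage per frame: `prof.begin("physics"); stepPhysics(); prof.end("physics");` and likewise for script, rendering, etc., then plot `prof.milliseconds(...)` per subsystem.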

Conclusion
The above 2 techniques are used to track down performance issues and increase the frame rate. However, as the game is still under development, there are still some areas, such as physics and script, which can be optimized. Those optimizations will be done when most of the game is completed. I am going to stop writing this series as most of the problems I encountered during development have been mentioned. Maybe I will write a postmortem after the game is finished, but that will not be in the near future, as there is still tons of artwork that has not been done yet. Hope you all enjoyed the series.

Writing an iPhone Game Engine (Part 5- Audio)

Introduction
Every game should have an audio system. My little engine supports 2 types of sound: 3D effect sounds and BGM. I separate these 2 types because the BGM can be decoded in hardware. A sample project is provided for playing the BGM using the Audio Queue on iOS.

Effects sounds
Effect sounds are played using OpenAL, which is an API similar to OpenGL. In the engine, there is an audio thread where all OpenAL calls are executed, and this thread communicates with the main thread through a command buffer, similar to the one used in graphics programming. For example, during the main thread update, the game logic may request to play an explosion sound; an audio command is then made and pushed to the command buffer. When the audio thread finds that there is a command inside the buffer, the command is executed and initiates an OpenAL call. I set up the audio command buffer and thread because an OpenAL call may stall the calling thread, according to this article.
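The command buffer between the two threads can be sketched like this (a simplified illustration of the idea, not the engine's code; real commands would carry OpenAL parameters rather than a std::function): the main thread pushes into one buffer while the audio thread drains the other, and a swap under a short lock hands over a whole batch of commands at once.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <vector>

// Double-buffered command queue: producers push on one buffer while the
// consumer drains the other, so the lock is held only during the swap.
class AudioCommandBuffer {
public:
    void push(std::function<void()> cmd) {          // called from the main thread
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(std::move(cmd));
    }
    void execute() {                                // called from the audio thread
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.swap(executing_);              // brief lock, then run unlocked
        }
        for (std::size_t i = 0; i < executing_.size(); ++i)
            executing_[i]();                        // e.g. issue OpenAL calls here
        executing_.clear();
    }
private:
    std::mutex mutex_;
    std::vector<std::function<void()>> pending_;
    std::vector<std::function<void()>> executing_;
};
```

The main thread would push a "play explosion" command during its update, and the audio thread would call execute() once per iteration of its loop.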

BGM
On iPhone, there is hardware for decoding audio which can be used through Apple's Audio Queue API. With this API, you can play sound by creating an audio queue output using AudioQueueNewOutput(), providing the audio file description such as the sample rate. You can get this description by calling AudioFileOpenURL(). Unfortunately, this is not suitable in my case, as my audio file is already loaded in memory when the game world tile is streamed, and I don't want to call AudioFileOpenURL() to open the audio file again just to get its description. So I decided to get this data by myself, and I picked the Apple CAF file format with the AAC compression method because it is an open format, and Mac machines have a command line tool to convert files into this format. (Note that on iPhone, the Audio Queue can only decompress 1 song using hardware decoding. If more audio needs to be played, it will fall back to software decoding.)

CAF file format
Just like the WAV file format, the CAF file format is divided into many different chunks, such as the description chunk (which stores the sample rate, channels per frame, ...) and the data chunk (which stores the audio sample data). The specification of CAF can be found in Apple's documentation (see the references). We need to get this data from the CAF file to play back the BGM using an Audio Queue. For details, you can take a look at AudioCAFHelper.cpp in the sample project.
Screen Shot from Sample Project
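A minimal chunk walker for this layout might look like the following (a simplified sketch, not the AudioCAFHelper.cpp code; among other things it ignores the special -1 size a trailing data chunk may have). CAF is big-endian: after the 8-byte file header ("caff" plus version/flags), each chunk starts with a 4-byte type followed by a 64-bit size, then the chunk data.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// CAF is big-endian, so multi-byte fields must be assembled byte by byte.
uint64_t readBigEndian64(const uint8_t* p) {
    uint64_t v = 0;
    for (int i = 0; i < 8; ++i) v = (v << 8) | p[i];
    return v;
}

struct CAFChunk {
    std::string type;    // e.g. "desc", "data", "pakt", "kuki"
    uint64_t size;       // chunk payload size in bytes
    const uint8_t* data; // points into the in-memory file
};

// Walks the chunk list after the 8-byte CAF file header.
std::vector<CAFChunk> parseChunks(const uint8_t* file, size_t length) {
    std::vector<CAFChunk> chunks;
    size_t offset = 8;                       // skip "caff" + version + flags
    while (offset + 12 <= length) {          // 4-byte type + 8-byte size
        CAFChunk c;
        c.type.assign(reinterpret_cast<const char*>(file + offset), 4);
        c.size = readBigEndian64(file + offset + 4);
        c.data = file + offset + 12;
        chunks.push_back(c);
        offset += 12 + c.size;               // jump to the next chunk header
    }
    return chunks;
}
```

With the chunks in hand, the description chunk supplies what AudioQueueNewOutput() needs, and the data chunk supplies the samples to enqueue.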
Apple Audio Queue
To play back audio using the Audio Queue API, we need a couple of steps:
  1. An audio queue output needs to be created using AudioQueueNewOutput().
  2. We need to set the properties of the newly created queue with AudioQueueSetProperty(), supplying the magic cookie property, which is required by the audio format.
  3. A property listener should be set up using AudioQueueAddPropertyListener() to listen for events that occur in the audio queue, such as the playback finishing.
  4. We need to allocate memory for the audio queue to hold the packet descriptions with AudioQueueAllocateBufferWithPacketDescriptions().
  5. After setting up the descriptions, the audio sample data needs to be put into the audio queue by AudioQueueEnqueueBuffer().
  6. After that, we need to tell the hardware to decode the audio samples with AudioQueuePrime().
  7. Finally, the audio is ready to be played back using AudioQueueStart().
To stop an audio queue, 3 steps are needed:
  1. AudioQueueStop() needs to be called to stop the playback.
  2. The property listener set up in step 3 above needs to be removed by AudioQueueRemovePropertyListener().
  3. Finally, AudioQueueDispose() is called to release all the audio queue resources.
You may refer to the sample project for a full understanding of how to use the audio queue, especially the part that enqueues the audio samples into the audio queue buffer.

Conclusion
Playing back 3D effect sounds using OpenAL on iPhone is similar to other platforms, while playing back the BGM takes some effort, because I need to get the audio file description by myself: the sample code from Apple only shows how to play back audio by specifying a file path, not from an audio file that is already loaded in memory. I hope my sample code can help someone who faces the same problem as me.

Reference:
Sample code: http://code.google.com/p/audio-queue-caf-sample/downloads/list
CAF file format:  http://developer.apple.com/library/mac/#documentation/MusicAudio/Reference/CAFSpec/CAF_spec/CAF_spec.html%23//apple_ref/doc/uid/TP40001862-CH210-DontLinkElementID_64
Audio Queue Reference:  http://developer.apple.com/library/ios/#DOCUMENTATION/MusicAudio/Reference/AudioQueueReference/Reference/reference.html
Using Audio on iOS: http://developer.apple.com/library/IOS/#documentation/AudioVideo/Conceptual/MultimediaPG/UsingAudio/UsingAudio.html

Writing an iPhone Game Engine (Part 4- Streaming)

Introduction
My game is an open world game; the player can freely explore the game world. It is impossible to load all the game objects when the game starts, so my engine should be able to stream in the game objects while the player is playing.

Loading
In order to stream the game objects, I have to partition the whole game world into many square tiles:


I will load the world tile(s) according to the player position. I divide each world tile into 9 regions as below:

There are 3 types of region in each tile (marked as A, B & C). When the player is in region A, only the tile that the player is inside is loaded. When the player is inside region B, 1 of the adjacent tiles will be loaded:

And within region C, 3 of the adjacent tiles will be loaded:

So the maximum number of tiles in memory will be 4.
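The region test above can be sketched as follows (my own illustration, not the engine's code; the player's position is normalized to [0,1) within the current tile and the border width is a free parameter). The centre region A loads nothing extra, an edge region B loads 1 neighbour, and a corner region C loads 3 neighbours including the diagonal.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Returns the grid offsets of the adjacent tiles to load, given the player's
// normalized position (u, v) inside the current tile:
// 0 offsets in the centre (A), 1 on an edge (B), 3 in a corner (C).
std::vector<std::pair<int,int>> tilesToLoad(float u, float v, float border) {
    int dx = (u < border) ? -1 : (u > 1.0f - border) ? 1 : 0;
    int dy = (v < border) ? -1 : (v > 1.0f - border) ? 1 : 0;
    std::vector<std::pair<int,int>> result;
    if (dx != 0) result.push_back(std::make_pair(dx, 0));
    if (dy != 0) result.push_back(std::make_pair(0, dy));
    if (dx != 0 && dy != 0) result.push_back(std::make_pair(dx, dy)); // diagonal
    return result;
}
```

The current tile plus at most 3 neighbours gives the maximum of 4 tiles in memory.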

Unloading
To maintain the maximum number of tiles in memory, unseen tiles need to be unloaded. I divide each tile into 4 regions for unloading:

Say the player is in region 0; then the nearby tiles X, Y, Z in the figure below will be unloaded:
The only thing I need to ensure is that tiles X, Y, Z are completely unloaded before a new tile needs to be loaded according to the above rules. If this situation happens, I block the game until they are unloaded.

Memory and Threading
We already know that there are at most 4 tiles in memory. Besides the pool memory allocator used for physics/script in the main thread, I also have 4 linear allocators for streaming world tiles, and all tiles fit within this limit. The memory allocation pattern is related to how the threading model works in the game. There are 2 threads: the main thread and the streaming thread. The main thread is responsible for updating the game logic, physics and rendering, while the streaming thread is for loading resources, decompressing textures, etc. When the player updates the position of the ship in the main thread, it will signal the streaming thread to load a tile if needed. Several frames later, the streaming thread signals back to the main thread that loading has finished. The communication between the 2 threads is double-buffered to achieve minimal locking, and it also ensures the linear allocators are only used in the streaming thread, which avoids using any mutex. But things get complicated when some of the objects should be created in the main thread, such as graphics objects and Lua objects. For example, the streaming thread should notify the main thread to create an OpenGL texture handle after it finishes decompressing a texture.

Advantages
Using a linear allocator for streaming can avoid memory fragmentation. Partitioning the game world into tiles makes managing memory easier, as each tile should use roughly the same amount of memory.
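A linear (bump) allocator of the kind described might look like this (an illustrative sketch, not the engine's allocator): allocations just advance an offset into a fixed buffer, and the whole tile is released at once by resetting the offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Linear allocator over a fixed buffer: allocate bumps an offset,
// reset releases the whole tile's memory in one call.
class LinearAllocator {
public:
    LinearAllocator(void* memory, size_t size)
        : base_(static_cast<uint8_t*>(memory)), size_(size), offset_(0) {}

    void* allocate(size_t bytes, size_t alignment = 16) {
        uintptr_t current = reinterpret_cast<uintptr_t>(base_ + offset_);
        size_t padding = (alignment - current % alignment) % alignment;
        if (offset_ + padding + bytes > size_) return nullptr; // chunk exhausted
        void* ptr = base_ + offset_ + padding;
        offset_ += padding + bytes;
        return ptr;
    }
    void reset() { offset_ = 0; }  // frees everything in the tile at once
    size_t used() const { return offset_; }
private:
    uint8_t* base_;
    size_t size_;
    size_t offset_;
};
```

Note there is no per-allocation free: that is exactly why unloading a tile cleanly is harder than it first appears, as the next section explains.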

Disadvantages
To unload a world tile, the resources created on the CPU side are simply freed by resetting the linear allocator. However, things are not as easy as I originally thought... For example, when I want to unload the physics objects in a tile, I need to remove all of them from the collision world first before resetting the allocator. Also, for the graphics objects, I need to release all of the OpenGL objects in that tile before resetting the allocator; otherwise, leaks will occur on the GPU side. I also need to release the script objects in the tile so that those scripts can be garbage collected in Lua. Therefore, it is nearly the same as 'deleting' all game objects in a tile one by one, and I cannot free all resources simply by resetting the linear allocator. Besides, creating objects in bullet physics using a custom allocator other than the one currently hooked up via btAlignedAllocSetCustom() is not easy either, since bullet is not designed for allocating objects through another memory allocator (i.e. in my case, the linear allocator for streamed objects such as rigid bodies and collision shapes). I needed to modify the bullet source code to make it work.

Conclusion
After building the streaming system, I have a clearer understanding of inter-thread communication, as it should be planned carefully, and some objects need to be created on a specific thread. Also, I think it is not a wise choice to dedicate a memory region managed by a linear allocator to streamed objects, as those objects end up being deleted one by one when they need to be removed from the game world anyway. And it costs too much work to modify bullet physics to cope with this memory model, which makes the bullet physics library hard to maintain and update.

Reference:
[1] http://www.gamearchitect.net/Articles/StreamingBestiary.html
[2] Game Engine Architecture: http://www.gameenginebook.com/

Writing an iPhone Game Engine (Part 3- Scripting)

Adding script support to an engine has many benefits, such as writing gameplay code without recompiling the engine source code, which may take a long time. It can also draw a much clearer boundary between the game code and the engine code. I chose to use Lua (ver. 5.1.4), as it is easy to embed and its size is small. As I am no expert on Lua, I would like to write about what I have learnt on how to bind Lua and C/C++.

Calling C function from Lua
First, you need to create a lua_State*, within which all Lua operations are done, by calling lua_open() (or you may use lua_newstate() if you need to hook up your own memory allocator to Lua):

  lua_State* luaState = lua_open();
Lua and C can exchange data through a virtual stack. Both Lua and C can push and pop data to and from the stack. Say we have already registered a C function for Lua to use, with the function signature:

  function drawText(str, screenPosX, screenPosY);
then in Lua side, when the following Lua script is executed:

  textWidth = drawText("Ready Go~", 240, 160);
3 values will be pushed to the Lua Stack:
We can get the values from the stack in C using an absolute index counting from the bottom of the stack (starting from 1) or a relative index from the top of the stack (starting from -1).

Then we can retrieve the values in stack inside the C function called by Lua with the following code:

  int drawText(lua_State* luaState)
  {
      float screenPosY = (float) lua_tonumber(luaState, -1); // get the value 160
      float screenPosX = (float) lua_tonumber(luaState, -2); // get the value 240
      const char* str = lua_tostring(luaState, -3); // get the value "Ready Go~"
      printf("Text '%s' draw at (%f, %f)\n", str, screenPosX, screenPosY);

      int textWidth = (int) strlen(str);
      lua_pushnumber(luaState, textWidth); // return a value to Lua
      return 1; // number of values returned to Lua
  }
When the C function exits, the stack will look like:

After Lua gets the return value from the C function, the parameters and the return value will be popped off the stack.
But before Lua can execute drawText(), remember to register it with the lua_State*:

  lua_register(luaState, "drawText", drawText);


Calling Lua functions from C
We can also call Lua functions from C. For example, say we have declared a function in a Lua script for initializing the engine configurations:

  function initEngineConfig(date)
      print('initialize engine on ' .. date);
  end
As Lua is a dynamically typed language, it treats functions as ordinary values. So we need to push the Lua variable 'initEngineConfig' onto the stack along with its arguments, using the following C code:

  lua_getglobal(luaState, "initEngineConfig"); // get the function onto the stack
  lua_pushstring(luaState, "18th Aug, 2011"); // push the argument of the function
  lua_call(luaState, 1, 0); // execute the function

(You may also use lua_pcall() instead of lua_call() to get more debug info when an error occurs.)
The above C code is equivalent to calling the function in Lua:

  initEngineConfig("18th Aug, 2011");

We can also use a similar technique to execute an object's method from C. For instance, we can call an object's method in Lua:

  gameObjectA:update(timeSlice);
We can do this in C too. In Lua, the colon syntax is just a short form of the statement:

  gameObjectA.update(gameObjectA, timeSlice);
So, we need to push the 'update' function of 'gameObjectA' onto the Lua stack with 2 arguments:
  lua_getglobal(luaState, "gameObjectA"); // get 'gameObjectA' so we can look up its method
  lua_getfield(luaState, -1, "update"); // get the 'update' function of 'gameObjectA'
  lua_insert(luaState, -2); // swap the order of "gameObjectA" and "update"
                            // so that "gameObjectA" becomes an argument
  lua_pushnumber(luaState, 1.0f/30.0f); // push the timeSlice argument onto the stack
  lua_call(luaState, 2, 0); // execute the function
This is basically how Lua interacts with C, but you also need to know how to represent C structures as user data or light user data. You may also need to know LUA_REGISTRYINDEX for creating a variable from C without worrying about name conflicts. After knowing these things, you may want to try some binding library to generate the bindings. But I hope these little binding methods can help someone who wants to bind Lua and C on their own.


Reference:
[1] Lua manual: http://www.lua.org/manual/5.1/manual.html
[2] Lua user data: http://www.lua.org/pil/28.1.html
[3] Lua light user data: http://www.lua.org/pil/28.5.html
[4] LUA_REGISTRYINDEX: http://www.lua.org/pil/27.3.2.html

Writing an iPhone Game Engine (Part 2- Maya Tools)

Tools are very important in game production, especially when you are working with someone who cannot write code. In my project, I worked with 2 artists, so I needed to write some tools to export their models to my engine. There are different ways to export the models: you can parse the .obj file format (for static models only), read .fbx files using the FBX SDK, read COLLADA files... But I chose to extract the data directly from the modeling package that the artists use, by writing a Maya plugin to extract the model data.

To write a Maya plugin for exporting models, we should first know how data is stored in Maya. Basically, Maya stores most of its data (e.g. meshes, transformations...) in a Directed Acyclic Graph (DAG). In my case, I just need to locate those DAG nodes that store the mesh data. We can traverse the DAG using the iterator MItDag like this:

  MStatus status;
  MItDag dagIter( MItDag::kDepthFirst, MFn::kInvalid, &status );
  MDagPathArray meshPath; // store the DAG nodes that contain meshes
  for ( ; !dagIter.isDone(); dagIter.next())
  {
    MDagPath dagPath;
    status = dagIter.getPath( dagPath );
    if ( status )
    {
      MFnDagNode dagNode( dagPath, &status );

      // Filter out the DAG nodes that do not contain a mesh
      if ( dagNode.isIntermediateObject()) continue;
      if ( !dagPath.hasFn( MFn::kMesh )) continue;
      if ( dagPath.hasFn( MFn::kTransform )) continue;
      meshPath.append(dagPath);
    }
  }
Then we can get the mesh data in the DAG nodes using MFnMesh like this:

  for (unsigned int i = 0; i < meshPath.length(); ++i)
  {
    MDagPath dagPath = meshPath[i];

    MFnMesh fnMesh( dagPath );
    MPointArray meshPoints; // store the positions of the vertices
    fnMesh.getPoints( meshPoints, MSpace::kWorld );

    // get more mesh data such as normals, UVs...
  }
For the details of getting the mesh data, you may refer to the MAYA API How-To and the Maya Exporter Factfile. After getting the mesh data, you can export it by creating a sub-class of MPxFileTranslator and overriding the writer() function. You can find some useful sample code provided by Maya inside the Maya directory (/Applications/Autodesk/maya2010/devkit/plug-ins/ on the Mac platform), such as maTranslator.cpp and objExport.cpp.

Another reason I chose to write a plugin instead of parsing .fbx/COLLADA is extracting the animation data. In my project, I just need to export some simple animations which linearly interpolate between key frames, and I would like to get the key frames defined by the artists in Maya. I tried using the FBX SDK, but when exporting animation data, it bakes all the animation frames as key frames... Using COLLADA was even worse because I could not find a good exporter for Maya on the Mac platform... Writing a Maya plugin gets rid of all these problems and gives me exactly the data I want. I can also write a script for the artists to set the animation clip data:

After exporting the mesh data, I thought it would be nice to edit the collision geometry inside Maya, so I wrote another plugin to define the collision shapes of the models:

This plugin works similarly to the Dynamica plugin (in fact, I learnt a lot from it), except mine can only define simple shapes: spheres, boxes and capsules. And my plugin cannot do physics simulation inside Maya; it is just for defining the collision shapes. Those collision shapes (sphere/box/capsule) are just sub-classes of MPxLocatorNode that override the draw() method with some OpenGL calls to render the corresponding shapes.

In conclusion, extracting mesh data directly from Maya is not that hard. We can get all the data, such as vertex normals, UV sets and key frame data, from Maya, and we do not need to worry about data loss during export through another format, especially for animation data. Also, Maya provides a convenient API to get this data, and it is easy to learn. After becoming familiar with the Maya API, I could also write another plugin to define the collision shapes. Next time you need to export mesh data, you may consider extracting it directly from the modeling package rather than parsing a file format.

Reference:
[1] MAYA API How-To: http://ewertb.soundlinker.com/api/api.018.php
[2] Maya Exporter Factfile: http://nccastaff.bournemouth.ac.uk/jmacey/RobTheBloke/www/research/index.htm
[3] Rob The Bloke: http://nccastaff.bournemouth.ac.uk/jmacey/RobTheBloke/www/
[4] http://www.vfxoverflow.com/questions/add-remove-framelayouts-in-a-window-using-mel
[5] http://bulletphysics.org/mediawiki-1.5.8/index.php/Maya_Dynamica_Plugin

Writing an iPhone Game Engine (Part 1- Memory management)

On the iPhone platform, memory is a very precious resource. If it is not handled properly, the application will receive one or two memory warnings, and then your application will be killed by the OS. So I decided to write my own memory allocator to preallocate a large chunk of memory, so that my game will not be killed by the OS while running; it will either start or not start. This is my first time writing a memory allocator, and it is not as sophisticated as "Ready, Set, Allocate!", but it works well enough for me.

In my little engine, a pool allocator is written for memory allocation, with different pre-defined pool sizes ranging from 8, 16, 32, 64 bytes up to 1048576 bytes. As my target platform is iPhone, the maximum pool size of 1048576 bytes is used only for a few high resolution textures; most of the memory is spent on the smaller pool sizes. When the program starts, a large chunk of memory is allocated and divided into smaller chunks for the different pool sizes as follows:
Notice that the large-block chunk is located at the lower memory addresses for proper byte alignment. And each chunk is divided into equally sized blocks for its particular allocation size:
The memory blocks within each chunk are maintained as a linked list, so that when the pool allocator needs to allocate/deallocate memory, it just needs to return a free memory block from the list, or add it back to the list. The memory within each free block is used to store the 'next' pointer to the next free block in the linked list, so we do not need to allocate extra memory to keep track of the list (this approach is learnt from Game Engine Architecture):
For each allocation, the allocator needs to decide which pool chunk to use, depending on the size of the allocation. Then, within that chunk, a free memory block is returned, which is just the head of that chunk's linked list. For each deallocation, as we divide the memory into different chunks, we know the boundary addresses of each pool chunk, so by checking the deallocated pointer address, we can determine which pool chunk it belongs to; then we can just add the memory back to that chunk's free block linked list.

One drawback of this approach is that it does not verify whether the deallocated pointer was actually allocated by the user, nor whether it is double freed. To 'partly overcome' this problem, I added limited checks to verify the input to each deallocation. First, I check whether the input is properly aligned; for example, if the deallocated pointer is within the 1048576-byte chunk, then the pointer address must be 1048576-byte aligned relative to the start of that chunk. Second, as we partition the memory chunks, we know how many memory blocks are within each chunk, so we can maintain a current free block count which increases and decreases with each deallocation and allocation. When the program exits and this free block count does not match the total number of blocks, then memory has either been leaked or double freed. But this only solves the problem partially.
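The intrusive free list described above can be sketched as follows (a simplified, single-chunk, non-thread-safe illustration of the technique, not the engine's code): the 'next' pointer lives inside each free block, allocation pops the list head, deallocation pushes the block back, and the free count enables the exit-time leak check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One fixed-size pool chunk: free blocks form an intrusive linked list whose
// 'next' pointer is stored inside the free blocks themselves, so no extra
// bookkeeping memory is needed.
class PoolChunk {
public:
    PoolChunk(void* memory, size_t blockSize, size_t blockCount)
        : freeList_(nullptr), freeCount_(blockCount) {
        uint8_t* p = static_cast<uint8_t*>(memory);
        for (size_t i = 0; i < blockCount; ++i) {   // thread all blocks together
            void* block = p + i * blockSize;
            *static_cast<void**>(block) = freeList_;
            freeList_ = block;
        }
    }
    void* allocate() {
        if (!freeList_) return nullptr;             // chunk exhausted
        void* block = freeList_;
        freeList_ = *static_cast<void**>(block);    // pop the list head
        --freeCount_;
        return block;
    }
    void deallocate(void* block) {
        *static_cast<void**>(block) = freeList_;    // push back as the new head
        freeList_ = block;
        ++freeCount_;
    }
    size_t freeCount() const { return freeCount_; } // for the leak/double-free check
private:
    void* freeList_;
    size_t freeCount_;
};
```

The full allocator would own one such chunk per pool size and route each request to the smallest chunk whose block size fits.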

To actually solve the problem and also check for memory leaks, I need to log every allocation and deallocation. Originally, for each allocation, I just stored the returned pointer address and the location of the allocation (source file and line number) using the macros __FILE__ and __LINE__. But this does not do well enough to track down all memory leaks, as some files are templated, such as btAlignedAllocator.h in the bullet physics library (yes, I use bullet physics in my engine). Using the macros __FILE__ and __LINE__ will only log the allocation site inside these header files, which does not help much for tracking memory leaks. Therefore, I also log the callstack of each allocation using the system calls backtrace() and backtrace_symbols() (which are available on Unix-like platforms). Then I can track down all memory leaks easily. However, logging every allocation is a very slow process, so it is only enabled in the debug build, or when necessary.

In conclusion, my memory allocator still has room for improvement, such as verifying user deallocations and adding some meta-data within each memory block, such as the allocation size. As for thread safety, currently I only use a mutex to protect the memory; I may switch to a lock-free version in the future. Despite these shortcomings, this allocator works well enough for me, as it avoids receiving memory warnings from iOS, avoids memory fragmentation, and helps me track down memory leaks.

Reference:
[1] http://g.oswego.edu/dl/html/malloc.html
[2] Ready, Set, Allocate!: http://altdevblogaday.com/2011/04/11/ready-set-allocate-part-1/
[3] Game Engine Architecture: http://www.gameenginebook.com/
[4] http://gcc.gnu.org/onlinedocs/cpp/Standard-Predefined-Macros.html
[5] http://developer.apple.com/library/mac/#documentation/Darwin/Reference/ManPages/man3/backtrace_symbols.3.html

Writing an iPhone Game Engine (Part 0- Introduction)

This time I would like to talk about my hobby project, which is writing an iPhone game engine. This project started in August last year, and I work with 2 artists. Although the game is still not finished, I want to share what I have learnt so far. Let's show some screen shots first:

Fighting against Ships

Exploring the game world
In the game, the player will control a ship to explore the world, discovering new cities and fighting with other ships. The player can also change ships as the game progresses.
I will have several blog posts to talk about the techniques used in my little engine. The upcoming topics will include:
    - Memory management
    - Tools (Maya plugin and level editor)
    - Scripting (Lua)
    - Streaming system
    - Audio (OpenAL for effect sounds and Apple audio queue for BGM)
    - Performance tuning
After finishing the above topics, I will talk about what I have done right and what I have done wrong in this project.
In my next post, I will start to talk about the memory management in my game engine. To end this introductory post, I would like to show you more screen shots of the game and the tools I have developed so far:
Profiling the game
Editor to compose a game entity

Level editor

Mac version of the game for artists to preview the game

SSAO using Line Integrals


Hi everyone, this is my first post on #AltDevBlogADay. Let me introduce myself first: I am Simon Yeung, currently working as a game programmer. I like graphics programming and sometimes write iPhone apps.
This time, I would like to talk about the SSAO implemented in my little demo program. I wrote this demo because I spend most of my time using OpenGL and know little about DirectX, so I decided to learn DX by writing it. So it is not well optimized.
SSAO, short for Screen Space Ambient Occlusion, is a technique for approximating the indirect shadow cast by surrounding scene geometry, done in screen space by sampling from the depth buffer.
The SSAO is implemented using the line integrals from "Rendering techniques in Toy Story 3"[1]. Here are my results:
With SSAO

Without SSAO

SSAO texture

Their method calculates the volume occluded by other objects inside a sphere at each fragment by sampling from the depth buffer.
From Slide 22 of the paper
The volume of sphere is found by using the equation:
From Slide 51 of the paper
And they use a Voronoi diagram to associate the ratio of volume occupied by each sample point for their predefined sampling pattern.
But in my implementation, I didn't use the Voronoi diagram; instead, I tried to calculate the volume occupied by each depth sample using that equation in the pixel shader. However, due to the perspective projection, the equation no longer holds: the view rays do not form right-angled triangles as in the figure above, resulting in the artifact below (on the wall on the right side):
The artifact on the wall on the right side
So I tried to solve the problem by using ray-sphere intersection to calculate more accurate line integrals.
For example, when calculating the occlusion volume for the black cross in the above diagram (using 2 depth samples for easy explanation), I need to compute the lengths L1 and L2 by solving the ray-sphere intersection. The lengths O1 and O2 can be computed by sampling the depth buffer. The volume of the sphere can then be approximated by L1+L2 and the occluded volume by O1+O2. (I also apply a distance attenuation factor to O1 and O2 when the depth difference is too large, so that the tank does not occlude the wall in my demo program.) The AO value is then (O1+O2)/(L1+L2).
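The AO estimate above can be sketched in C++. This is a hypothetical sketch, not the demo's actual shader code: the `Sample` struct, its field names, and the exact attenuation formula are my assumptions; only the ratio (occluded length)/(chord length) and the idea of fading out distant occluders come from the post.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// One depth sample: the chord the view ray cuts through the sphere
// (found by ray-sphere intersection) and the occluder depth at that sample.
struct Sample
{
    float chordLen;      // L_i: length of the chord inside the sphere
    float occluderDepth; // depth read from the depth buffer
    float chordFarDepth; // depth of the far end of the chord
};

// Estimate AO for one fragment as sum(O_i) / sum(L_i), attenuating
// samples whose occluder is far in front of the sphere.
float estimateAO(const std::vector<Sample>& samples, float attenRange)
{
    float totalChord = 0.0f, totalOccluded = 0.0f;
    for (const Sample& s : samples)
    {
        // O_i: the part of the chord behind the occluder, clamped to [0, L_i].
        float occluded = std::clamp(s.chordFarDepth - s.occluderDepth,
                                    0.0f, s.chordLen);
        // Distance attenuation (assumed linear falloff): reduce the
        // contribution when the occluder sits well in front of the chord,
        // so distant geometry does not darken the fragment.
        float gap   = (s.chordFarDepth - s.chordLen) - s.occluderDepth;
        float atten = 1.0f - std::clamp(gap / attenRange, 0.0f, 1.0f);
        totalOccluded += occluded * atten;
        totalChord    += s.chordLen;
    }
    return totalChord > 0.0f ? totalOccluded / totalChord : 0.0f;
}
```

With two samples, one fully open and one fully occluded at the chord start, the estimate comes out to 0.5, matching the ratio intuition.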
This eliminates the artifacts:
Solving ray-sphere intersection to eliminate the artifacts
The demo program uses 8 depth samples per fragment. To fake a higher sample count, I also tried rotating the sample points as suggested by the paper, which gives a softer look to the AO:
Rotating the sample points
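Per-pixel rotation of the sampling pattern can be sketched as below. The hash used to derive the angle is an arbitrary assumption of mine (the post computes an angle from the fragment position but does not give the formula; the paper suggests a pre-computed rotation texture instead):

```cpp
#include <cmath>
#include <utility>
#include <vector>

// Rotate a fixed 2D sampling pattern by a per-pixel angle so that
// neighbouring pixels sample different directions, faking a higher
// sample count once the result is blurred.
std::vector<std::pair<float, float>> rotatePattern(
    const std::vector<std::pair<float, float>>& pattern, int px, int py)
{
    // Cheap per-pixel angle from the pixel coordinates (assumed hash,
    // not the demo's exact formula).
    float angle = std::fmod((px * 37 + py * 17) * 0.173f, 6.2831853f);
    float c = std::cos(angle), s = std::sin(angle);

    std::vector<std::pair<float, float>> out;
    out.reserve(pattern.size());
    for (const auto& p : pattern)
        out.push_back({ c * p.first - s * p.second,    // standard 2D rotation
                        s * p.first + c * p.second });
    return out;
}
```

Since this is a pure rotation, the sample offsets keep their original lengths, so the sphere radius covered by the pattern is unchanged.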
Then a bilateral blur is applied to smooth out the noise. Although a bilateral blur is not separable, it is faster to divide it into 2 passes (i.e. 1 horizontal and 1 vertical, just like a Gaussian blur), with 5 samples per pass, which gives a softer result:
After bilateral blur
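One of the two blur passes could look like the sketch below, here over a 1D row of AO values. The Gaussian kernel weights and the depth-based range weight are my assumptions; the post only states 5 samples per pass and that the blur is bilateral (i.e. it should not bleed across depth discontinuities):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// One 5-tap bilateral blur pass: each tap's spatial weight is scaled by
// a Gaussian on the depth difference, so edges in the depth buffer are
// preserved. Run once horizontally and once vertically.
std::vector<float> bilateralPass(const std::vector<float>& ao,
                                 const std::vector<float>& depth,
                                 float depthSigma)
{
    // Assumed 5-tap binomial kernel (sums to 1).
    static const float kernel[5] = { 0.0625f, 0.25f, 0.375f, 0.25f, 0.0625f };

    std::vector<float> out(ao.size());
    for (int i = 0; i < (int)ao.size(); ++i)
    {
        float sum = 0.0f, wSum = 0.0f;
        for (int k = -2; k <= 2; ++k)
        {
            int j = std::min(std::max(i + k, 0), (int)ao.size() - 1); // clamp to edge
            float dd = depth[j] - depth[i];
            // Range weight: taps with a large depth gap contribute less.
            float w = kernel[k + 2]
                    * std::exp(-(dd * dd) / (2.0f * depthSigma * depthSigma));
            sum  += ao[j] * w;
            wSum += w;
        }
        out[i] = sum / wSum; // renormalize so flat regions are unchanged
    }
    return out;
}
```

On a flat region (constant depth) the range weights are all 1 and the pass reduces to a plain Gaussian blur, which is the expected behaviour.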
Finally, the SSAO texture is blended with the scene:
Applying SSAO to the scene
In conclusion, I finished the SSAO, but it is not optimized; there are several places that could be improved. When calculating the line integrals, I use several branches in the pixel shader, which slows things down a lot. Also, I rotate the sample points by computing a rotation angle from the fragment position in the pixel shader, which could be replaced with a pre-computed rotation-angle texture as the paper suggests. But since my main purpose for this demo was to get familiar with DirectX, I left the optimization as a future enhancement.

References:
[1]: Rendering techniques in Toy Story 3, http://advances.realtimerendering.com/s2010/index.html
[2]: The brick texture is obtained from Crytek's Sponza Model: http://crytek.com/cryengine/cryengine3/downloads
[3]: The tank model is obtained from an XNA demo project: http://create.msdn.com/en-US/education/catalog/?contenttype=0&devarea=0&platform=21&sort=1

Light Pre Pass Renderer

This is my first attempt at writing a DirectX 9 program (I usually program in OpenGL). I decided to implement a Light Pre-Pass renderer[1]; my purpose was to give DX9 a try and experiment with different techniques, so the program is not optimized.
Here is the result:

The G-buffer is rendered in the first pass, with the following layout:
1. Depth (32 bits) + (32 bits unused)
2. View-space normal (16 bits × 3) + glossiness (16 bits)


Then, in the second pass, the light buffer is accumulated over all lights by sampling the depth and normal buffers and applying Blinn-Phong shading, with the following layout:

R channel :  ∑(L.N) Ir
G channel :  ∑(L.N) Ig
B channel :  ∑(L.N) Ib
A channel :  ∑(N.H)^glossiness
                    , where L is light vector,
                        N is the normal from the G-buffer,
                        H is the half vector,
                        I is the light color,
                        glossiness is the specular power from the G-buffer
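The per-light term that gets summed into the light buffer can be sketched in C++ as below. This is a hypothetical sketch of the math in the layout above, not the demo's shader source; the small vector helpers and function names are mine:

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };
struct Vec4 { float r, g, b, a; };

static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3  normalize(Vec3 v)
{
    float len = std::sqrt(dot(v, v));
    return { v.x / len, v.y / len, v.z / len };
}

// One light's contribution to the light buffer:
//   RGB = (L.N) * lightColor   (diffuse term, per channel)
//   A   = (N.H)^glossiness     (Blinn-Phong specular term)
// One such term is accumulated per light.
Vec4 lightBufferTerm(Vec3 N, Vec3 L, Vec3 V, Vec3 lightColor, float glossiness)
{
    float nDotL = std::fmax(dot(N, L), 0.0f);
    Vec3  H     = normalize({ L.x + V.x, L.y + V.y, L.z + V.z }); // half vector
    float spec  = std::pow(std::fmax(dot(N, H), 0.0f), glossiness);
    return { nDotL * lightColor.x,
             nDotL * lightColor.y,
             nDotL * lightColor.z,
             spec };
}
```

With the light, view, and normal all aligned, both the diffuse and specular terms reach their maximum of 1, which is a quick sanity check on the formula.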

Finally, the third pass renders the geometry to combine the albedo color with the light buffer.

This screenshot shows the depth buffer, normal buffer, light buffer, and SSAO, rendered with 25 point lights. Currently, no ambient color is used.
In the next post, I will talk about my SSAO implementation.

[1]: http://diaryofagraphicsprogrammer.blogspot.com/2008/03/light-pre-pass-renderer.html
[2]: The brick texture is obtained from Crytek's Sponza Model: http://crytek.com/cryengine/cryengine3/downloads
[3]: The tank model is obtained from an XNA demo project: http://create.msdn.com/en-US/education/catalog/?contenttype=0&devarea=0&platform=21&sort=1