D3D12 Root Signature Management

Continuing from the last post about writing my new toy D3D12 graphics engine: we have compiled some shaders and extracted some reflection data from the shader source. The next problem is to bind resources (e.g. constant buffers / textures) to the shaders. D3D12 uses root signatures together with root parameters to achieve this task. In this post, I will describe how my toy engine creates root signatures automatically based on shader resource usage.

Left: new D3D12 graphics engine (with only basic diffuse material)
Right: previous D3D11 rendering (with PBR material, GI...)
Still a long way to go to catch up with the previous renderer... 

Resource binding model
In D3D12, shader resource binding relies on the root parameter index. But when iterating on shader code, we may modify some resource bindings (e.g. add a texture variable / remove a constant buffer), so the root signature may change, which in turn changes the root parameter indices. We would then need to update every function call like SetGraphicsRootDescriptorTable() with the new root parameter index, which is tedious and error-prone... The resource binding model in D3D11 (e.g. PSSetShaderResources() / PSSetConstantBuffers()) doesn't have this problem, as the API defines a set of fixed slots to bind to. So I would prefer to work with a similar binding model in my toy engine.

So, I defined a couple of slots for resource binding as follows (which is a bit different from D3D11):
Instead of having fixed slots per shader stage as in D3D11, the fixed slots in my toy engine can be summarized into 3 categories:
Resource binding slot categories

Slot category "Resource Type"
As described by its name (CBV/SRV/UAV), this slot is used to bind the corresponding resource type like constant buffer / shader resource view / unordered access view.
The SRV type is further sub-divided into VS_ONLY / PS_ONLY / ALL sub-categories, which refer to the shader visibility. According to Nvidia's Do's and Don'ts, limiting the shader visibility can improve performance.
For CBV type, the shader visibility will be deduced from shader reflection data during root signature and PSO creation.
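
As a rough illustration of how the visibility could be deduced, the sketch below picks the narrowest visibility that still covers every stage using a slot. The stage bits and the Visibility enum are hypothetical stand-ins (they only mirror the idea of D3D12_SHADER_VISIBILITY), and this is not the engine's actual code:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stage bits; a real engine would fill these in from its
// shader reflection data.
enum ShaderStage : std::uint32_t {
    STAGE_VERTEX = 1u << 0,
    STAGE_PIXEL  = 1u << 1,
};

// Stand-in for D3D12_SHADER_VISIBILITY so the sketch needs no d3d12.h.
enum class Visibility { None, VertexOnly, PixelOnly, All };

// Given the set of stages that reference a constant buffer slot, pick the
// narrowest visibility that still covers every user of that slot.
inline Visibility deduceVisibility(std::uint32_t usedByStages) {
    // Only the two stages above are modelled in this sketch.
    assert((usedByStages & ~(STAGE_VERTEX | STAGE_PIXEL)) == 0);
    if (usedByStages == 0)            return Visibility::None;
    if (usedByStages == STAGE_VERTEX) return Visibility::VertexOnly;
    if (usedByStages == STAGE_PIXEL)  return Visibility::PixelOnly;
    return Visibility::All;
}
```

A constant buffer referenced by both the vertex and the pixel shader falls through to All, while a buffer used by a single stage gets the restricted visibility.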

Slot category "Change frequency"
Resources are encouraged to be bound based on their update frequency. So this slot category is divided into 3 types: Per Frame / Per View / Per Draw.
The Per Frame/View types use a descriptor table as their root parameter type,
while the Per Draw CBV type uses a root descriptor.
The Per Draw SRV type still uses a descriptor table instead of a root descriptor because, for example, it is common to have only 1 constant buffer for the material of a mesh while binding multiple textures for the same material. So using a descriptor table helps to keep the size of the root signature small.
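
To put numbers on the size argument: D3D12 limits a root signature to 64 DWORDs, where a descriptor table costs 1 DWORD and a root descriptor costs 2 DWORDs. The small sketch below adds up that cost for a list of root parameters. The enum and function names are made up for illustration:

```cpp
#include <cassert>
#include <vector>

// Only the two root parameter types discussed in this post are modelled.
enum class RootParamType { DescriptorTable, RootDescriptor };

// D3D12 caps the root signature at 64 DWORDs.
constexpr unsigned kMaxRootSignatureDwords = 64;

// A descriptor table costs 1 DWORD, a root descriptor costs 2 DWORDs.
constexpr unsigned costInDwords(RootParamType type) {
    return type == RootParamType::DescriptorTable ? 1u : 2u;
}

// Sums the cost of all root parameters and checks it against the limit.
inline unsigned rootSignatureCost(const std::vector<RootParamType>& params) {
    unsigned total = 0;
    for (RootParamType p : params) total += costInDwords(p);
    assert(total <= kMaxRootSignatureDwords);
    return total;
}
```

With this accounting, one root descriptor for the material CBV plus one descriptor table holding all of the material's SRVs costs 3 DWORDs no matter how many SRVs the table contains, whereas one root parameter per resource would grow with the resource count.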

Slot category "Usage"
This category is used for sub-dividing into different usage patterns: Engine/Shader.
Engine usage typically binds things like the mesh transform constants, camera transform, etc.
Shader usage is for shader-specific data, e.g. material constants.
I just can't find an appropriate name for this category, so I simply use the names Engine/Shader. Maybe it would be better to call them Group 0/1/2/3... in case I have different usage patterns in the future. But I won't bother with it for now...

Shader Reflection
In the last post, I mentioned that shader reflection data is exported during shader compilation. This is important for root signature creation. From this reflection data, we know which constant buffer/texture slots are used. When creating a pipeline state object (PSO) from shaders, we can deduce all the resource slots used by the PSO (as well as the shader visibility for constant buffers) and then create an appropriate root signature with each resource slot mapped to the corresponding root parameter index (let's call this mapping data the "root signature info").

To specify the resource slot in shader code, we make use of the register space introduced in Shader Model 5.1. We can define which slot is used for constant buffer/texture. For example:
#define ENGINE_PER_DRAW_SRV_ALL space5 // all shaders must have the same slot-space definition
Texture2D shadowMap : register(t0, ENGINE_PER_DRAW_SRV_ALL);
With the above information, in the CPU-side code, we can bind a resource to a specific slot using the root parameter index stored inside the "root signature info", similar to the D3D11 API.
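
A minimal sketch of what such a "root signature info" could look like is shown below. The slot names, the kUnusedSlot marker and the rule of handing out indices in slot order are all assumptions made for illustration; the post does not show the engine's real data layout:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed slots, named after the three categories in this post.
enum class ResourceSlot : std::uint32_t {
    EnginePerFrameCBV,
    EnginePerViewCBV,
    EnginePerDrawCBV,
    EnginePerDrawSRVAll,
    ShaderPerDrawCBV,
    ShaderPerDrawSRVPsOnly,
    Count
};

constexpr std::size_t kSlotCount = static_cast<std::size_t>(ResourceSlot::Count);
constexpr std::uint32_t kUnusedSlot = 0xFFFFFFFFu;  // slot not present in this root signature

// Maps every fixed slot to its root parameter index (or kUnusedSlot).
struct RootSignatureInfo {
    std::array<std::uint32_t, kSlotCount> rootParamIndex;
};

// Builds the mapping from the slots found in the shader reflection data.
inline RootSignatureInfo buildRootSignatureInfo(const std::vector<ResourceSlot>& usedSlots) {
    RootSignatureInfo info;
    info.rootParamIndex.fill(kUnusedSlot);
    // Mark the used slots first so duplicates and input order do not matter.
    std::array<bool, kSlotCount> used{};
    for (ResourceSlot slot : usedSlots) {
        assert(slot != ResourceSlot::Count);
        used[static_cast<std::size_t>(slot)] = true;
    }
    // Hand out root parameter indices in slot order.
    std::uint32_t next = 0;
    for (std::size_t i = 0; i < kSlotCount; ++i) {
        if (used[i]) info.rootParamIndex[i] = next++;
    }
    return info;
}

// Looks up the root parameter index to use when binding to a slot.
inline std::uint32_t rootParamIndexOf(const RootSignatureInfo& info, ResourceSlot slot) {
    return info.rootParamIndex[static_cast<std::size_t>(slot)];
}
```

The calling code only ever names a slot. The index returned by rootParamIndexOf() is what would be passed on to a call such as SetGraphicsRootDescriptorTable(), so adding or removing a resource in a shader changes the table but not the call sites.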

In this post, we have described how root signatures can be automatically created and used for a slot-based API. First, root signatures are created (or re-used/shared) during the creation of a pipeline state object (PSO) based on its shader reflection data. We also create a "root signature info" to store the mapping between resource slots and root parameter indices, and keep it together with the root signature and PSO. Then we can use this "root signature info" to bind the resources to the shader.
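
The re-use/sharing of root signatures could be as simple as a cache keyed by which slots a PSO uses. The sketch below assumes a hypothetical 64-bit usage mask as the key and uses a plain int in place of the real root signature object. How the engine actually keys its root signatures is not described in this post:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical key: one bit per resource slot used by the PSO's shaders.
// A real key would also have to include anything else that changes the
// root signature, such as the deduced CBV shader visibility.
using SlotUsageMask = std::uint64_t;

class RootSignatureCache {
public:
    // Returns the id of the root signature for this usage mask and creates
    // a new entry only the first time a mask is seen.
    int getOrCreate(SlotUsageMask mask) {
        auto found = idByMask_.find(mask);
        if (found != idByMask_.end()) return found->second;
        const int id = static_cast<int>(idByMask_.size());
        idByMask_.emplace(mask, id);
        return id;
    }

    std::size_t size() const { return idByMask_.size(); }

private:
    std::map<SlotUsageMask, int> idByMask_;  // the int stands in for a root signature object
};
```

Two PSOs whose shaders touch the same set of slots would then end up with the same root signature and the same "root signature info".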

As this is my first time writing a graphics engine with D3D12, I am not sure whether this resource binding model is the best. I have also thought of another naming scheme for the resource slots: instead of naming them by PerDraw / PerView type, would it be better to name them explicitly as RootDescriptor / DescriptorTable? Maybe I will change my mind after I gain more experience in the future...

MSBuild custom build tools notes

Recently, I have been trying to re-write my graphics code to use D3D12 (instead of the D3D11 used in Seal Guardian). I need a convenient way to compile shader code. While tidying up the MSBuild custom build step files used in Seal Guardian for my new toy graphics engine, I regretted not writing a blog post about custom MSBuild at that time: as I remember, finding such information was hard back then and I needed to look at some of the CUDA custom build files to guess how it works. So this post is just my personal notes about custom MSBuild, and I don't guarantee that all the information about MSBuild is 100% correct. I have uploaded an example project to compile shader files here. Interested readers may also check out this excellent post about MSBuild written by Nathan Reed [2].

Custom build steps set up
MSBuild needs a .targets file to describe how the compiler (e.g. dxc/fxc used for shader compilation) is invoked. In the uploaded example project, we have 3 main targets: DXC, JSON, BIN.

- DXC target: as described by its name, invokes dxc.exe to compile an HLSL file.
- JSON target: invokes shaderCompiler.exe, our internal tool written using Flex & Bison, which parses the shader source code and outputs some meta data, like texture/constant buffer usage for root signature management.
- BIN target: depends on the DXC and JSON targets, and invokes dataBuilder.exe, our internal tool for data serialization/deserialization into our binary format, to combine the output from the DXC and JSON targets.

target dependency

Although MSBuild can set up target dependencies, it looks like independent targets are not executed in parallel. In Seal Guardian, when compiling the surface shaders, which generate the shader permutations for lighting, this results in a long compilation time. In the end, I needed to create another exe that launches multiple threads to speed up the shader compilation. Maybe I was setting up MSBuild incorrectly; if anyone knows how to parallelize it, please let me know in the comments below. Thank you!
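
For reference, such a multi-threaded launcher could be structured roughly like the sketch below. It is not the actual tool: the function name and the compileOne callback (which would run the real compiler on one shader permutation) are assumptions made for illustration:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Runs compileOne(job) for every job on workerCount threads and returns how
// many jobs succeeded. compileOne is a hypothetical callback that compiles
// one shader permutation; it must be safe to call from several threads.
inline std::size_t compileAllInParallel(
        const std::vector<std::string>& jobs,
        const std::function<bool(const std::string&)>& compileOne,
        unsigned workerCount) {
    assert(workerCount > 0);
    std::atomic<std::size_t> nextJob{0};    // index of the next unclaimed job
    std::atomic<std::size_t> succeeded{0};

    auto worker = [&] {
        for (;;) {
            // Each worker claims the next job index until none are left.
            const std::size_t i = nextJob.fetch_add(1);
            if (i >= jobs.size()) return;
            if (compileOne(jobs[i])) ++succeeded;
        }
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < workerCount; ++t) threads.emplace_back(worker);
    for (std::thread& t : threads) t.join();
    return succeeded.load();
}
```

Since shader permutations are independent of each other, a shared atomic counter is enough to distribute the work, and std::thread::hardware_concurrency() would be a natural choice for workerCount.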

Incremental Builds
MSBuild uses .tlog files to track file modifications to avoid unnecessary compilation (they also affect which files get deleted when cleaning the project). There are 2 tlog files (read.1.tlog and write.1.tlog): one tracks whether the source files are modified, and the other tracks whether the output files are up to date. We can simply use the WriteLinesToFile task to mark such a dependency, e.g.

<WriteLinesToFile File="$(TLogLocation)$(ProjectName).write.1.tlog" Lines="$(TLog_writelines)" />

But doing only this will make the tlog file grow larger after every compilation. So it is better to read the tlog file content into a PropertyGroup and check whether the file already contains the text we would like to write, using a "Condition" attribute on the WriteLinesToFile task. For details, please take a look at the example project.

Also, as a side note, do not include the $(MSBuildProjectFile) property in the "Inputs" attribute of the "Target" element. I did it accidentally and it caused the whole project to recompile all the shaders every time a shader file was added to / removed from the project. This is not necessary as most of the shader files are independent.

Output files
Like every Visual Studio project, our example project has Debug and Release configurations. After executing the BIN target described above, we also use a Copy task to copy our compiled shaders from the Debug/Release config $(OutDir) directory to our content directory. We can also use the Property Function MakeRelative() to maintain the directory hierarchy in the output directory. This is another reason why I use a Copy task instead of pointing $(OutDir) at the content directory, as I could not get a nested Property Function working inside the .props file (or maybe I did something wrong? I don't know...).
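
To make the path mapping concrete, here is the same idea expressed outside of MSBuild with std::filesystem: take the path of a built file relative to the output directory, then re-root it under the content directory. The function name and the example directories are made up, and this only illustrates what the relative-path step achieves rather than how MSBuild implements it:

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Re-roots builtFile (somewhere under outDir) below contentDir while keeping
// its sub-directory hierarchy. Purely lexical: no file system access.
inline fs::path contentPathFor(const fs::path& builtFile,
                               const fs::path& outDir,
                               const fs::path& contentDir) {
    // lexically_relative returns an empty path when no relative path can
    // be formed (for example, when the two paths have different roots).
    const fs::path relative = builtFile.lexically_relative(outDir);
    assert(!relative.empty());
    return contentDir / relative;
}
```

For example, a file built to out/Debug/shaders/lit/mesh.vs.bin with out/Debug as the output directory would map to content/shaders/lit/mesh.vs.bin.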

Also, besides output files, another note is about the output log. If we want to write something to the output console of Visual Studio from a custom exe (e.g. shaderCompiler.exe/dataBuilder.exe in the example project), the text must be in a specific format like the following (I cannot find the documentation of the exact format, so this is just a guess from similar messages emitted by Visual Studio...):

1>C:/shaderFileName.hlsl(123): error : error message

otherwise, those messages will not be displayed in the output window.
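
A custom tool could build such a line with a small helper like the one below. The helper name is made up, and the format simply follows the example line above. The leading "1>" is left out on the assumption that it is the per-project prefix added by Visual Studio's build output rather than something the tool prints itself:

```cpp
#include <cassert>
#include <string>

// Formats a message as "file(line): error : text", following the example
// line above. The exact format is a guess, as noted in the post.
inline std::string formatBuildError(const std::string& file, int line,
                                    const std::string& message) {
    assert(line > 0);
    return file + "(" + std::to_string(line) + "): error : " + message;
}
```

The tool would then write the returned string as one line to its standard output.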

Example project
An example project is uploaded here. It compiles vertex/pixel shaders with dxc.exe and outputs JSON meta data to $(IntDir), then combines those data and writes the result to $(OutDir). Finally, those files are copied to the asset directory with the corresponding relative path to the source directory. Please note that the shaderCompiler.exe used for outputting meta data is an internal tool, which places some custom restrictions on the HLSL grammar to make root signature creation convenient for me. It is used just as an example to illustrate how to set up a custom MSBuild tool; feel free to replace/modify those files to suit your own needs. Thank you.

[2] http://www.reedbeta.com/blog/custom-toolchain-with-msbuild/

Testing Hot-Reload DLL on Windows

After finishing the game Seal Guardian and taking some rest, I recently went back to refactoring its engine code. In this game, the engine has the ability to hot-reload all asset types, from textures, shaders and game levels to Lua scripts. But it lacks the ability to hot-reload C/C++ code. So I decided to spend some time finding resources about hot-reloading C/C++. It turns out hot-reloading C/C++ is not that trivial on Windows, as the PDB file is locked. I found that this approach of patching the PDB path inside the DLL looks interesting, so I gave it a try, and the sample program is uploaded here (only tested with Visual Studio Community 2017).

First try
Because the PDB file path is hard-coded inside the DLL file, the approach used by cr.h is to parse the DLL file according to the Portable Executable format, find the PDB file path, and replace it with a new file path.

So I tried something similar, but unlike cr.h, instead of generating a new DLL/PDB file name every time the DLL gets re-compiled, I use a fixed temporary name (I don't want to have many random files inside the binary directory after several hot reloads...). For example, when Visual Studio generates the files:
  • abc.dll
  • abc.pdb
The sample program will detect that abc.dll has been updated and generate 2 new files:
  • ab_.dll
  • ab_.pdb
Here ab_.dll has a patched PDB path pointing to the newly copied ab_.pdb, and the program loads ab_.dll instead.

The reason I didn't choose a more meaningful name like abc_tmp.dll is that I worry a file name longer than the original may mess up the offset values stored inside the DLL. So I just replace the last character with an underscore.
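
The naming rule and the same-length constraint can be sketched as below. Note that patchPathInPlace() is a deliberately naive byte search over the loaded file contents, used here only to show why equal lengths keep every other offset untouched. It is not the Portable Executable parsing that cr.h or the sample program performs, and both function names are made up:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// "abc.dll" -> "ab_.dll": replaces the last character before the extension
// with an underscore, so the name keeps exactly the same length.
// (A name that already ends in '_' would map to itself.)
inline std::string makeTempName(const std::string& fileName) {
    const std::size_t dot = fileName.rfind('.');
    const std::size_t stemEnd = (dot == std::string::npos) ? fileName.size() : dot;
    assert(stemEnd > 0);
    std::string result = fileName;
    result[stemEnd - 1] = '_';
    return result;
}

// Overwrites the first occurrence of oldPath inside image with newPath.
// Refuses to patch unless both paths have the same length, so no other
// bytes (and therefore no stored offsets) move.
inline bool patchPathInPlace(std::vector<char>& image,
                             const std::string& oldPath,
                             const std::string& newPath) {
    if (oldPath.empty() || oldPath.size() != newPath.size()) return false;
    auto it = std::search(image.begin(), image.end(), oldPath.begin(), oldPath.end());
    if (it == image.end()) return false;
    std::copy(newPath.begin(), newPath.end(), it);
    return true;
}
```

With a longer replacement such as abc_tmp.pdb, the patch would have to grow the string and shift everything after it, which is exactly the situation the underscore trick avoids.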

This approach works: every time I start without the debugger by pressing Ctrl+F5 in Visual Studio, then edit some code and re-build the solution by pressing F7, the DLL gets hot-reloaded. When the sample program exits, the ab_.dll and ab_.pdb files get deleted.

However, when the program quits with a debugger attached, the program can't delete the ab_.pdb file...

Second try
We know that the Visual Studio debugger is locking the PDB file. So when we detect that a debugger is attached, can we detach it programmatically before the program exits? Luckily the EnvDTE COM library can help with this task, and someone has written sample code to do this (although that sample code says we need to modify the "VisualStudio.DTE" string to match the installed version, like "VisualStudio.DTE.14.0", I have tested with Visual Studio Community 2017 and it works without modification). So, by detaching the debugger programmatically, we can delete the temporary PDB file when the program exits.

Third try
Now that we can detach the debugger programmatically, why not try re-attaching the debugger after every hot reload? With the re-attach debugger code written, I tried running the program by pressing F5 (Start Debugging) and then pressing F7 to re-compile the solution. A "Do you want to stop debugging?" dialog popped up.

I happily pressed 'Yes' and hoped the hot-reload would work. Do you know what happened? The debugger stopped, but the application also quit... It looks like this approach can only work when using Ctrl+F5 (Start Without Debugging)... I searched the web for how to stop the app from being killed when the debugger stops, but I could only find people suggesting to detach the debugger instead. So I work around this problem by detaching the debugger and re-attaching it during program start, so that the debugger does not kill the app when it stops.

So, the hot-reload function is almost working now: just press F5 to start and F7+Enter to re-compile. But sometimes the debugger fails to re-attach to the reloaded app. After spending some time investigating the issue, I found that the EnvDTE::Process::Item() function may fail to find the reloaded app process, returning the error code RPC_E_CALL_REJECTED. I don't know why this happens; maybe the process is busy reloading the new DLL. So the final workaround is to wait a bit to let the process finish its work, and retry several times.
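
The wait-and-retry workaround has roughly the following shape. The helper below is a generic sketch: attempt stands for whatever call may fail transiently (here, the EnvDTE lookup and attach, which are not shown), and the attempt count and delay are arbitrary choices:

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Calls attempt() up to maxAttempts times and sleeps for delay between
// failed attempts. Returns true as soon as one attempt succeeds.
inline bool retryWithDelay(const std::function<bool()>& attempt,
                           int maxAttempts,
                           std::chrono::milliseconds delay) {
    assert(maxAttempts > 0);
    for (int i = 0; i < maxAttempts; ++i) {
        if (attempt()) return true;
        // No point in sleeping after the final failed attempt.
        if (i + 1 < maxAttempts) std::this_thread::sleep_for(delay);
    }
    return false;
}
```

In the hot-reload case, attempt would wrap the re-attach call and report failure whenever it sees RPC_E_CALL_REJECTED.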

Fourth try
We know that detaching the debugger will unlock the PDB. What if we just detach the debugger to unlock the PDB, and only copy the newly compiled DLL without patching in a new PDB path? Unfortunately, it fails, saying that the .vcxproj file is locked...

So I can only revert to the "Third try" approach...

Last try
We finally have a workable approach to reload the DLL; how about the executable itself? So I tried the "Edit and Continue" function in Visual Studio. And it works! But only once... This is because after Edit and Continue, stopping the debugger makes Visual Studio kill the app... And manually detaching the debugger from Visual Studio fails with an error.

So, the "Edit and Continue" function is not compatible with my hot-reload method, which relies on detaching the debugger...

In this post, I have described the methods I tried when writing hot-reloadable DLL code on Windows. The steps are as follows:

When the program loads a DLL:
1. Copy its associated PDB file.
2. Copy the target DLL file and modify the hard coded PDB path to newly copied PDB path done in step 1.
3. Load the copied DLL in step 2 instead.
After editing some code:
4. Detach the debugger so that the DLL can be compiled from Visual Studio.
5. Unload the copied DLL.
6. Repeat steps 1 to 3 above.
7. Re-attach the debugger.
From a programmer's perspective, the steps are:
1. In Visual Studio, press F5 to compile and run the program with debugger.
2. Edit some code, then press F7 to re-build the solution.
3. Press enter to confirm the "Do you want to stop debugging?" dialog.
4. The program will reload the new DLL and re-attach the debugger automatically after compilation.
You can try the above workflow by downloading the sample code. I have only tested it with Visual Studio Community 2017, and it may not work with other versions of Visual Studio. This method is far from perfect; if anyone knows a better method that doesn't require workarounds, please let me know. Thank you very much!

[1] https://github.com/RuntimeCompiledCPlusPlus/RuntimeCompiledCPlusPlus/wiki/Alternatives
[2] https://ourmachinery.com/post/dll-hot-reloading-in-theory-and-practice/
[3] https://ourmachinery.com/post/little-machines-working-together-part-2/
[4] https://blog.molecular-matters.com/2017/05/09/deleting-pdb-files-locked-by-visual-studio/
[5] https://github.com/fungos/cr
[6] http://www.debuginfo.com/articles/debuginfomatch.html
[7] https://msdn.microsoft.com/en-us/library/ms809762.aspx
[8] https://handmade.network/forums/wip/t/1479-sample_code_to_programmatically_attach_visual_studio_to_a_process