Render Graph

Introduction
Render graph is a directed acyclic graph that can be used to specify the dependency of render passes. It is a convenient way to manage the rendering especially when using low level API such as D3D12. There are many great resources talked about it such as this and this. In this post I will talk about how the render graph is set up, render pass reordering as well as resources barrier management.

Render Graph set up
To have a simplified view of render graph, we can treat each node inside a graph as single render pass. For example we can have a graph for a simple deferred renderer like this:
Render passes dependency within a render graph
By having such graph, we can derive the dependency of the render passes, remove unused render pass, as well as reorder them. In my toy graphics engine, I use a simple scheme to reorder render passes. Taking the below render graph as an example, the render pass are added as following order:
A render graph example
We can group it into several dependency levels like this:
split render passes into several dependency levels
Within each level, the passes are independent and can be reordered freely, so the render passes are enqueued into command list as the following order:
Reordered render passes
Between each dependency level, we batch resources barrier to transit the resources to the correct state.

Transient Resources
The above view is just a simplified view of the graph. In fact, each render pass consist of a number of inputs and outputs. Every input/output is a graphics resource (e.g. texture). And render passes are connected through such resources within a render graph.
Render graph connecting render passes and resources
As you can see in the above example, there are many transient resources used (e.g. depth buffer, shadow map, etc). We handle such transient resources by using a texture pool. Texture will be reused after it is no longer need by previous pass (placed resources is not used for simplicity). When building a render graph, we compute the life time of every transient resources (i.e. the dependency level that the resource start/end). So we can free the transient resources when the execution go beyond a dependency level and reuse them for later render pass. So to specify a render pass input/output in my engine, I only need to specify their size/format and don't need to worry about the resources creation and the transient resources pool will create the textures as well as the required resources view (e.g. SRV/DSV/RTV).

Conclusion
In this post, I have described how render passes are reordered inside render graph, when barrier are inserted and transient resources handling. But I have not implemented parallel recording of command lists and async compute. It really takes much more effort to use D3D12 than D3D11. I think the current state of my hobby graphics engine is good enough to use. Looks like I can start learning DXR after spending lots of effort on basic D3D12 set up code. =]

D3D12 Constant Buffer Management

Introduction
In D3D12, it does not have an explicit constant buffer API object (unlike D3D11). All we have in D3D12 is ID3D12Resource which need to be sub-divided into smaller region with Constant Buffer View. And it is our job to handle the constant buffer life time and avoid updating constant buffer value while the GPU is still using it. This post will describe how I handle this topic.

Constant buffer pool
We allocate a large ID3D12Resource and treat it as an object pool by sub-dividing it into many small constant buffers (Let's call it constant buffer pool). Since constant buffer required to be 256 bytes aligned (I can only find this requirement in previous documentation, while the updated document only have such requirement in the Uploading Texture Data Through Buffers, which is under a section about texture...), so I defined 3 fixed size pools 256/512/1024 bytes pool. Only this 3 size type is enough for my need as most constant buffers are small (In Seal Guardian, the largest constant buffer size is 560 bytes, while large data like skinning matrix palette is uploaded via texture).
3 constant buffer pools with different size
In last post, a non shader visible descriptor heap manager is used to handle non shader visible descriptors. But in fact, that is only used for SRV/DSV/RTV descriptors. Constant buffer view are managed with another scheme. As described above, when we create a ID3D12Resource for constant buffer pool, we also create a non shader visible ID3D12DescriptorHeap with size large enough to have descriptors point to all the constant buffers inside the constant buffer pool.
ID3D12Resource and ID3D12DescriptorHeap are created in pair
We also split constant buffer pool based on their usage: static/dynamic. So there are total 6 constant buffer pools inside my toy engine (static 256/512/1024 bytes pool + dynamic 256/512/1024 bytes pool).

Dynamic constant buffer
Constant buffer can be updated dynamically. Each constant buffer contains a CPU side copy of their constant values. When they are binded before a draw call, those values will be copied to the dynamic constant buffer pool (created in upload heap). A piece of memory for constant buffer values will be allocated from the constant buffer pool in a ring buffer fashion. If the pool is full (i.e. ring buffer wrap around too fast where all the constant buffers are still in use by GPU), we will create a larger pool and the existing pool will be deleted after all related GPU commands finish execution.
Resizing dynamic constant buffer pool, the previous pool
will be deleted after executing related GPU commands
To avoid copying the same constant buffer values to the constant buffer pool when having multiple binding constant buffer calls. We keep 2 integer values for every dynamic constant buffer: "last upload frame index" and "value version". The last upload frame index is the frame index that those CPU constant buffer values get copied to the dynamic pool. The value version is an integer which is monotonic increased every-time the constant buffer value get modified/updated. So by checking this 2 integers, we can avoid duplicated copies of constant buffer in dynamic pool and re-use the previous copied values.

Static constant buffer
The static constant buffer will have a static descriptor handle described in last post. The static constant buffer pool is created in the default heap. The pool is managed in a free-list fashion as oppose to ring buffer in dynamic pool. Also when the pool is full, we still create extra pool for new constant buffer allocation request. But different from dynamic pool, previous pool will not be deleted when new pool get created.
Creating more static constant buffer pool if existing pools are full
To upload static constant buffer values to the GPU(since static pools are created in default heap), we use the dynamic constant buffer pool instead of creating another new upload heap. Every frame, we gather all newly created static constant buffers, then before we start rendering in this frame, we copy all the CPU constant buffer values to the dynamic constant buffer pool and then schedule a ID3D12GraphicsCommandList::CopyBufferRegion() call to copy those values from upload heap to default heap. By grouping all the static constant buffer uploads, we can reduce the number of D3D12_RESOURCE_BARRIER to transit between the D3D12_RESOURCE_STATE_COPY_DEST and D3D12_RESOURCE_STATE_VERTEX_AND_CONSTANT_BUFFER states.

Conclusion
In this post, I have described how constant buffers are managed in my toy engine. It use a number of different pool size which is managed in ring buffer fashion for dynamic constant buffers and in free-list fashion for static constant buffers. Uploading of static constant buffer content are grouped together to reduce barrier usage. However, I only split the usage based simply on static/dynamic, I would like to investigate the performance in the future whether adding another usage type for some use case like constant buffer will be updated every frame, and used frequently in many draw calls (e.g. write once, read many within a frame) and would like to place those resources on the default heap instead of the current dynamic upload heap.

Reference
[1] https://docs.microsoft.com/en-us/windows/desktop/direct3d12/large-buffers
[2] https://www.gamedev.net/forums/topic/679285-d3d12-how-to-correctly-update-constant-buffers-in-different-scenarios/





D3D12 Descriptor Heap Management

Introduction
Continue with the last post, we described about how root signature is managed to bind resources. But root signature is just one part of resources binding, we also need to use descriptor to bind reousrces. Descriptors are small block of memory describing an object (CBV/SRV/UAV/Sampler) to GPU. They are stored in descriptor heaps, and they may be shader visible or non shader visible. In this post, I will talk about how descriptors are managed for resources binding in my toy graphics engine.

Non shader visible heap
Let's start with the non-shader visible heap management. We can treat a descriptor as a pointer to a GPU resource (e.g. texture). Descriptor heap is a piece of memory used for storing descriptors and the size of a single descriptor can be queried by ID3D12Device::GetDescriptorHandleIncrementSize(). So we treat descriptor heap as an object pool, and every descriptor within the same heap can be referenced by an index.
Non shader visible descriptor heap containing N descriptors
Since we don't know how many descriptors are needed in-advance and we may have many non shader visible heaps, A non shader visible heap manager is created for allocating a descriptor from descriptor heap(s). This manager contains at least 1 descriptor heap. When a descriptor allocation request is made to the manager, it will first look for free descriptor from existing descriptor heap(s), if none is found, a new descriptor heap will be created to handle the request.
Descriptor heap manager handles descriptor allocation request, create descriptor heap if necessary
So within the graphics engine, we use a "non shader visible descriptor handle" to reference a D3D12 descriptor which store the heap index and descriptor index with respect to a descriptor heap manager. All the textures created in the engine will have a "non shader visible descriptor handle" for resources binding (more on this later).

Shader visible heap
Next, we will talk about shader visible heap management. Shader visible heap is responsible for binding resources that get used in shaders. It is recommend that just only 1 heap is used for all frames so that asynchronous compute and graphics workload can be run in parallel(on NVidia hardware). So we just create 1 large shader visible heap at the start of program and don't bother to resize/allocate a lager heap when the heap is full (we just assert in this case). With a single large shader visible descriptor heap, it is divided into 2 regions: static / dynamic.
A single large shader visible descriptor heap, divided into 2 regions

Dynamic descriptor
Dynamic descriptors are used for some transient resources that their descriptor table cannot be reused often. During resources binding (e.g. texture), their non-shader visible descriptors will be copied to the shader visible heap via ID3D12Device::CopyDescriptors(), where the copy destination (i.e. dynamic shader visible descriptors) is allocated in a ring buffer fashion (Note the copy operation have a restriction that the copy source must be in non-shader visible heap, that's why we allocate a "non shader visible descriptor handle" for every texture).

Static descriptor
Static descriptors are used for resources which can be grouped together into a descriptor table, so that they can be reused over multiple frames. For example, a set of textures inside a material will not be changed very often, those textures can be grouped into a descriptor table. My current approach is to use a "stack" based approach to manage the static shader visible descriptor heap. Instead of creating a stack of individual descriptor, we have a stack of groups of descriptors, often, during level load, 1 static descriptor group will be created.
static descriptors are packed into group during level load
Inside a group of static descriptors, the descriptors are sorted such that all constant buffer descriptors appear before texture descriptors. Also null descriptors may need to be added to respect the Hardware Tiers restriction. To identify a static descriptor in shader visible heap, we can use the stack group index together with the descriptor index within the group.
descriptor are ordered by type, with necessary padding
Each "static resource"(e.g. constant buffer/texture) will have a "static descriptor handle" beside the "non shader visible descriptor handle". We can check whether those resources are within the same descriptor table by comparing the stack group index and descriptor index to see whether they are in consecutive order. With such information, we can create a resource binding API similar to D3D11 (e.g. ID3D11DeviceContext::PSSetShaderResources() ), if all the resources in the API call are in the same descriptor table, we can use the static descriptor to bind the descriptor table directly, otherwise, we switch to use the dynamic descriptor approach described in previous section to create a continuous descriptor table. (I have also think of instead of using similar binding API as D3D11, may be I can create a so call "descriptor table" object explicitly, say during material loading and grouping material textures into a descriptor table, so that resources binding can skip the consecutive descriptor index check described above. But currently I just stick with a simple solution first...)

As mentioned before, the static descriptor group is allocated based on a "stack" based approach. But my current implement is not strictly "last in - first out", we can removing a group in between and make some "hole" in the static shader visible heap region, but this will result in fragmentation.

Fragmented static descriptor heap region
In theory, we can defragment this heap region by moving descriptor groups to un-used space (it works as we use index to reference descriptors inside a heap instead of address directly) and during defragmentation, we may switch to use dynamic descriptors temporarily to avoid overwriting the static heap region while the GPU commands are still using it. But currently, I have not implemented the defragmentation yet because I only get one simple level (i.e. only 1 static descriptor group) now...

Conclusion
In this post, I have described how the descriptor heap is managed for resources binding. To sum up, the shader visible descriptor heap is divided into 2 regions: static/dynamic. Static descriptor heap is managed in a "stack" based approach. During level loading, all the static CBV/SRV descriptors are stored within a static descriptor stack group, which is a big continuous descriptor table. This will increase the chance to reuse the descriptor table. In addition to this optional static descriptor, every resources must have a non-shader visible descriptor handle. This non-shader visible descriptor handle is used when a static descriptor table cannot be used during resource binding, and it will get copied to the shader visible heap to form a new descriptor table. With this kind of heap management, we can create a resources binding API similar to D3D11, which call the underlying D3D12 API using descriptors.

References
[1] https://docs.microsoft.com/en-us/windows/desktop/direct3d12/resource-binding
[2] https://www.gamedev.net/forums/topic/686440-d3d12-descriptor-heap-strategies/