Reflection and Serialization

Introduction
Reflection and serialization is a convenient way to save/load data. After reading "The Future of Scene Description on 'God of War'",  I decided to try to write something called "Compile-time Type Information" described in the presentation (but a much more simple one with less functions). All my need is something to save/load C style struct (something like D3D DESC structure, e.g. D3D12_SHADER_RESOURCE_VIEW_DESC) in my toy engine.

Reflection
A reflection system is needed to describe how struct are defined before writing a serialization system. This site has many information about this topic. I use a similar approach to describe the C struct data with some macro. We define the following 2 data types to describe all possible struct that need to be reflected/serialized in my toy engine (with some variables omitted for easier understanding).



As you can guess from their names, TypeInfo is used to described the C struct that need to be reflected/serialized. And TypeInfoMember is responsible for describing the member variables inside the struct. We can use some macro tricks to reflect a struct(more can be found in the reference):

struct reflection example
The above example reflect 3 variables inside struct vec3: x, y, z. The tricks of those macro is to use sizeof(), alignof(), offsetof() and using keyword. The sample implementation can be found below:



This approach has one disadvantage that we cannot use bit field to specify how many bits are used in a variable. And bit field order seems to be compiler dependent. So I just don't use it for the struct that need to be reflected.

It also has another disadvantage that it is error-prone to reflect each variable manually. So I have written a C struct header parser (using Flex & Bison) to generate the reflection source code. So, for those C struct file that need to auto generate the reflection data, instead of naming the source code file with extension .h, we need to name it with another file extension (e.g. .hds) and using visual studio custom MSBuild file to execute my header parser. To make visual studio to get syntax high light for this custom file type, We need to associate this file extension with C/C syntax by navigate to
"Tools" -> "Options" -> "Text Editor" -> "File Extension"
and add the appropriate association:


But one thing I cannot figure out is the auto-complete when typing "#include" for custom file extension, looks like visual studio only filtered for a couple of extensions (e.g. .h, .inl, ...) and cannot recognize my new file type... If someone knows how to do it, please leave a comment below. Thank you.
MSVC auto-complete filter for .h file only and cannot discover the new type .hds

Serialization
With the reflection data available, we know how large a struct is, how many variables and their byte offset from the start of the struct, so we can serialize our C struct data. We define the serialization format with data header and a number of data chunks as following figure:

Memory layout of a serialized struct

Data Header
The data header contains all the TypeInfo used in the struct that get serialized, as well as the architecture information(i.e. x86 or x64). During de-serialization, we can compare the runtime TypeInfo against the serialized TypeInfo to check whether the struct has any layout/type change (To speed up the comparison, we generate a hash value for every TypeInfo by using the file content that defining the struct). If layout/type change is detected, we de-serialize the struct variables one by one (and may perform the data conversion if necessary, e.g. int to float), otherwise, we de-serialize the data in chunks.

Data Chunk
The value of C struct are stored in data chunks. There are 6 types of data chunks: RawBytes, size_t, String, Struct, PointerSimple, PointerComplex. There are 2 reasons to divide the chunk into different types: First, we want to support the serialized data to be used on different architecture (e.g. serialized on x86, de-serialized on x64) where some data type have different size depends on architecture(e.g. size_t, pointers). Second, we want to support serializing pointers(with some restriction). Below is a simple C struct that illustrate how the data are divided into chunks:

This Sample struct get serialized into 3 data chunks

RawBytes chunk
RawBytes chunk is a chunk that contains a group of values where the size of those variables are architecture independent. Refer to the above Sample struct, the variables val_int and val_float are grouped into a single RawBytes chunk so that during run time, those values can be de-serialized by a single call to memcpy().

size_t chunk
size_t chunk is a chunk that contains a single size_t value, which get serialized as a 64 bit integer value to avoid data loss. But loading a too large value on x86 architecture will cause a warning. Usually this type will not be used, I just add it in case I need to serialize this type for third party library.

String chunk
String chunk is used for storing the string value of char*,  the serializer can determine the length the string (looking for '\0') and serialize appropriately.

Struct chunk
Struct chunk is used when we serialize a struct that contains another struct which have some architecture dependent variables. With this chunk type, we can serialize/de-serialize recursively.
The ComplexSample struct contains a Complex struct that has some architecture dependent values,
which cannot be collapsed into a RawBytes chunk, so it get serialized as a struct chunk instead.

PointerSimple chunk
PointerSimple chunk is storing a pointer variable. And the size of the data referenced by this pointer does not depend on architecture and can be de-serialized by a single memcpy() similar to the RawBytes chunk. To determine the length of a pointer (sometimes pointer can be used like an array), my C struct header parser recognizes some special macro which define the length of the pointer (and this macro will be expanded to nothing when parsed by normal Visual Studio C/C++ compiler). Usually during serialization, the length of the pointer depends on another variable within the same struct, so with the special macro, we can define the length of the pointer like below:

The DESC_ARRAY_SIZE() macro tells the serializer that
the size depends on the variable num within the same struct

When serializing the above struct, the serializer will look up the value of the variable num to determine the length of the pointer variable data, so that we know how bytes are needed to be serialized for data.

But using this macro is not enough to cover all my use case, for example when serializing D3D12_SUBRESOURCE_DATA for a 3D texture, the pData variable length cannot be simply calculated by RowPitch and SlicePitch:

A sample struct to serialize a 3D texture, which the length of
D3D12_SUBRESOURCE_DATA::pData depends on the depth of the resources

The length can only be determined when having access to the struct Texture3DDesc, which have the depth information. To tackle this, my serializer can register custom pointer length calculation callback (e.g. register for the D3D12_SUBRESOURCE_DATA::pData variable inside Texture3DDesc struct). The serializer will keep track of a stack of struct type that is currently serializing, so that the callback can be trigger appropriately.

Finally, if a pointer variable does not have any length macro nor registered length calcuation callback, we assume the pointer have a length of 1 (or 0 if nullptr).

PointerComplex chunk
PointerComplex chunk is for storing pointer variable, with the data being referenced is platform dependent, similar to the struct chunk type. It has the same pointer length calculation method as PointerSimple chunk type.

Serialize union
We can also serialize struct with union values that depends on another integer/enum variable, similar to D3D12_SHADER_RESOURCE_VIEW_DESC. We utilize the same macro approach used for pointer length calculation. For example:
A sample to serialize variables inside union
In the above example, the DESC_UNION() macro add information about when the variable need to be serialized. During serialization, we check the value of variable type, if type == ValType::Double, we serialize val_double, else if type == ValType::Integer, we serialize val_integer.

Conclusion
This post have described how a simple reflection system for C struct is implemented, which is a macro based approach, assisted with code generator. Based on the the reflection data, we can implement a serialization system to save/load the C struct using compile time type information. This system is simple, but it does not support complicated features like C++ class inheritance. And it is mainly for serializing C style struct, which is enough for my current need.

References
[1] https://preshing.com/20180116/a-primitive-reflection-system-in-cpp-part-1/
[2] https://www.gdcvault.com/play/1026345/The-Future-of-Scene-Description
[3] https://blog.molecular-matters.com/2015/12/11/getting-the-type-of-a-template-argument-as-string-without-rtti/


沒有留言:

發佈留言