Tutorial: A practical example of Data-Oriented Design
For this tutorial, I’m going to assume a couple of things.
- You are familiar with C++
- You have at least a basic knowledge of OpenGL 2.0
- You know what object-oriented programming is
Bonus:
- You are familiar with game programming
- You are familiar with low-level optimizations
- You know what a cache is
Ready? Alright, here we go!
What is Data-Oriented Design?
Data-oriented design (DOD) is when you look at your game’s state as data transforming into other data. In object-oriented programming (OOP) you view things as objects, each holding their own state and interacting with the world. A small example.
Suppose this is our game world. We have four balls bouncing around a room. When the balls hit a wall they go into a different direction.
In OOP, we say each ball has a position, a direction and a speed. Each ball gets an Update function that updates the position and a Render function that renders the ball’s sprite at their current position. This is what the data looks like:
We have a Ball class with some members and we have a BallManager class that has a list of Ball instances. To update the balls, the BallManager calls each ball’s Update and Render function.
void Ball::Update(float a_DT) { m_Pos += m_Dir * m_Speed * a_DT; } void BallManager::Update(float a_DT) { for (unsigned int i = 0; i < m_BallTotal; i++) { m_Balls[i]->Update(a_DT); } }
So what’s the problem? Works fine right? Each ball has its own properties and you update those properties.
The main driving force behind data-oriented design is that you never tend to work with single objects, you work with groups of objects. So what would that look like?
Now, there is no Ball class. There is only the BallManager. And instead of a list of Ball instances, it has a list of ball properties. Now its Update function looks like this:
void BallManager::Update(float a_DT) { for (unsigned int i = 0; i < m_BallData.filled; i++) { m_BallData.pos[i] += m_BallData.dir[i] * m_BallData.speed[i] * a_DT; } }
The advantages are not obvious. It just looks like I moved some stuff around. How is that going to help?
A good way of trying to wrap your brain around data-oriented programming is to think of it as a factory: data comes in, an operation is performed and data comes out.
The Problem with Object-Oriented Programming
You might not realize there is a problem. But it’s been slowly creeping up on the industry. Computers, consoles and even phones nowadays have multiple cores. But our programs are traditionally written for one-cored computers. Yes, we can do multithreading. Yes, we have job systems. But our data is not inherently multithreadable. Each object is treated as its own entity, which makes it difficult to stream.
The inherent problem is that while our processors have been steadily increasing in speed (at about 60% per year), memory hasn’t kept up, increasing speed by only 10% per year. The gap is widening, and it is a problem. Keeping your data in cache (very fast but small memory near the processor) is becoming increasingly important. If you can keep all your data as near to the processor as possible, you can squeeze some significant performance from it.
As a game programmer in advanced training, I learned about the difference between an Array of Structures (AoS) and a Structure of Arrays (SoA). I never paid much attention to it, because it seemed like such a hassle. But that was because I wasn’t forced to think exclusively in AoS. When I did an internship intake for a large gaming company with tons of blackjack casinos games, they asked me to use data-oriented design. I found that there wasn’t much available on the subject. A handful of talks and a couple of blogposts. But no practical examples and no source code.
So I decided to write my own. :)
The Test Program
The program we’re going to discuss is a voxel renderer. Voxels are essentially three-dimensional pixels. You can build and shape worlds with them, but they are traditionally very expensive to render. That is because graphics cards aren’t optimized for square blocks, they’re optimized for triangles. On top of that, we’re going to throw a raytracer on top of it. Each voxel casts a ray from a target position to itself. If the ray hits a block other than itself it is not visible and culled.
There are two versions, one using object-oriented programming (OOP) and one using data-oriented design (DOD). I have set some restrictions for myself for this example:
- No macro-optimizations – The OOP and the DOD version both use the same algorithm.
- No micro-optimizations – The DOD version does not use SIMD functions, even though it totally could.
- No multi-threading – Mostly because I didn’t want to multithread the OOP version.
Controls:
- Left and Right – Rotate camera
- Spacebar – Switch between OOP and DOD
- G – Generate new voxels
- Q – Decrease amount of voxels
- W – Increase amount of voxels
- R – Turn raycasting on or off
- D – Turn debug lines on or off
Format:
The example was built with Microsoft Visual Studio 2008. A VS2008 solution is provided. It should be forward-compatible with VS2010. The code will not compile out of the box on Macintosh or Linux machines, because of Microsoft-specific features (Windows window, Microsoft extension for font rendering).
Download:
VoxelExample-1.0.zip (1.5 MB)
This example is provided without any license. That means that anyone can download, modify, redistribute and sell it or any part of it. This example is provided “as-is”, without any guarantee of bug fixes or future support.
The Basics
We’ll need a window with an OpenGL context. This is handled by Framework.h and Framework.cpp. It’s not that interesting overall so I won’t focus too much on it. Suffice to say it creates a window, handles to the game code and runs their Update and Render functions. It also does some global key handling.
Common.h contains includes and global defines. Included with this example is part of my Toolbox: TBSettings.h, TBMath.h and TBVec3.h. You are free to use these headers in your own projects, if you wish. There are no license requirements attached to them.
The two interesting classes are OOPVoxels and DODVoxels. They are both children of the Game class.
We create voxels with a very simple algorithm. It starts at position (0, 0, 0) in the world and for every voxel, it takes a step forward, backwards, left, right, up or down. The color is a random color between (0, 0, 0) and (1, 1, 1, 1). The result is a colorful, but rough, “snake” of voxels.
The Object-Oriented Approach
I’ll skip the part where I generate the voxels. It’s not really relevant for this example. If you want to look at it you can find it in OOPVoxels::GenerateVoxels.
We will need some kind of container for voxel data. We are going to store the voxels in a VBO buffer. Every voxel consists of 6 quads, so each voxel will need to add its sides to the vertex buffer. The size of the voxels doesn’t change, so we store position offsets for each of the coordinates in the game class.
// offsets float halfsize = m_VoxelSize / 2.f; m_VoxelOffset[0] = Vec3(-halfsize, -halfsize, halfsize); m_VoxelOffset[1] = Vec3( halfsize, -halfsize, halfsize); m_VoxelOffset[2] = Vec3(-halfsize, halfsize, halfsize); m_VoxelOffset[3] = Vec3( halfsize, halfsize, halfsize); m_VoxelOffset[4] = Vec3(-halfsize, -halfsize, -halfsize); m_VoxelOffset[5] = Vec3( halfsize, -halfsize, -halfsize); m_VoxelOffset[6] = Vec3(-halfsize, halfsize, -halfsize); m_VoxelOffset[7] = Vec3( halfsize, halfsize, -halfsize);
Next, we initialize our data and create a VBO to hold our voxels as a streaming quad and color buffer.
// voxel buffer glGenBuffers(1, &m_VoxelVBOVertex); CGLE(); glBindBuffer(GL_ARRAY_BUFFER, m_VoxelVBOVertex); CGLE(); glBufferData(GL_ARRAY_BUFFER, m_VoxelDataSize * sizeof(Vec3), m_VoxelVertices, GL_STREAM_DRAW); CGLE(); glGenBuffers(1, &m_VoxelVBOColor); CGLE(); glBindBuffer(GL_ARRAY_BUFFER, m_VoxelVBOColor); CGLE(); glBufferData(GL_ARRAY_BUFFER, m_VoxelDataSize * sizeof(Vec3), m_VoxelColors, GL_STREAM_DRAW); CGLE(); glBindBuffer(GL_ARRAY_BUFFER, 0); CGLE();
What our voxel needs to store for now is its position, a color and whether it is clipped.
class Voxel { Voxel(); ~Voxel(); void SetClipped(bool a_State) { m_Clipped = a_State; } bool IsClipped(); void SetColor(Vec3& a_Color) { m_Color = a_Color; } Vec3& GetPosition() { return m_Pos; } void SetPosition(Vec3& a_Position); private: Vec3 m_Pos; Vec3 m_Color; bool m_Clipped; };
In the PostTick function, we loop over every voxel and set its clipping state to false.
for (unsigned int i = 0; i < Framework::s_VoxelCurrentTotal; i++) { m_Voxels[i].SetClipped(false); }
And now for the slightly harder part. In the Render function, we want to lock the vertex and color buffer and add voxels that aren’t clipped to it. So we add an AddToRenderList function to the Voxel class. It takes a destination vertex buffer, a destination color buffer and the offsets for each vertex of the voxel.
Every voxel needs to check its clipping state before it can add itself to the buffer.
// write visible voxels to buffers glBindBuffer(GL_ARRAY_BUFFER, m_VoxelVBOVertex); Vec3* write_vertices = (Vec3*)glMapBuffer(GL_ARRAY_BUFFER, GL_READ_WRITE); glBindBuffer(GL_ARRAY_BUFFER, m_VoxelVBOColor); Vec3* write_color = (Vec3*)glMapBuffer(GL_ARRAY_BUFFER, GL_READ_WRITE); unsigned int total = 0; for (unsigned int i = 0; i < Framework::s_VoxelCurrentTotal; i++) { if (!m_Voxels[i].IsClipped()) { m_Voxels[i].AddToRenderList(write_vertices, write_color, m_VoxelOffset); write_vertices += 24; write_color += 24; total++; } } glUnmapBuffer(GL_ARRAY_BUFFER); CGLE(); glBindBuffer(GL_ARRAY_BUFFER, m_VoxelVBOVertex); glUnmapBuffer(GL_ARRAY_BUFFER); CGLE(); glBindBuffer(GL_ARRAY_BUFFER, 0);
The Voxel::AddToRenderList function is very long, but most of it is copied. It writes its position plus an offset to the buffer and copies the same color to all six quads.
void OOPVoxels::Voxel::AddToRenderList(Vec3* a_Vertex, Vec3* a_Color, Vec3* a_Offset) { Vec3* dst_pos = a_Vertex; Vec3* dst_color = a_Color; Vec3* offset[] = { // Front &a_Offset[0], &a_Offset[1], &a_Offset[3], &a_Offset[2], // more offsets here }; Vec3** read_offset = offset; for (unsigned int j = 0; j < 24; j += 4) { dst_pos[0] = m_Pos + *read_offset[0]; dst_pos[1] = m_Pos + *read_offset[1]; dst_pos[2] = m_Pos + *read_offset[2]; dst_pos[3] = m_Pos + *read_offset[3]; dst_color[0] = m_Color; dst_color[1] = m_Color; dst_color[2] = m_Color; dst_color[3] = m_Color; dst_color += 4; dst_pos += 4; read_offset += 4; } }
We use the unsigned int total to figure out how many voxels are actually in the quad buffer. Then we render.
// draw buffers glEnableClientState(GL_VERTEX_ARRAY); CGLE(); glEnableClientState(GL_COLOR_ARRAY); CGLE(); glBindBuffer(GL_ARRAY_BUFFER, m_VoxelVBOVertex); glVertexPointer(3, GL_FLOAT, 0, 0); CGLE(); glBindBuffer(GL_ARRAY_BUFFER, m_VoxelVBOColor); glColorPointer(3, GL_FLOAT, 0, 0); CGLE(); glDrawArrays(GL_QUADS, 0, total * 4 * 6); CGLE(); glBindBuffer(GL_ARRAY_BUFFER, 0); CGLE(); glDisableClientState(GL_COLOR_ARRAY); CGLE(); glDisableClientState(GL_VERTEX_ARRAY); CGLE();
Alright, voxels on the screen!
Now, let’s focus on the raytracing portion. When raytracing is turned on, every voxel generates a ray that starts at the target position and points towards the voxel. This ray then stores the voxel instance that is closest to the target position. So, each ray checks each voxel to see which is the closest. Luckily one of my teachers had a very fast ray-AABB collision check.
bool OOPVoxels::Ray::CollidesWith(Voxel* a_Other) { float t0, t1; float tmin = 0.f; float tmax = 10000.f; t0 = (a_Other->GetMin().x - m_Origin.x) * m_Direction.x; t1 = (a_Other->GetMax().x - m_Origin.x) * m_Direction.x; if (t0 > t1) { tmin = (t1 > tmin) ? t1 : tmin; tmax = (t0 < tmax) ? t0 : tmax; } else { tmin = (t0 > tmin) ? t0 : tmin; tmax = (t1 < tmax) ? t1 : tmax; } // same for y and z axes if (tmin <= tmax && tmin <= m_TimeMin) { m_TimeMin = tmin; m_Hit = a_Other; return true; } return false; }
But still, every ray needs to check every voxel. This results in horrible performance!
The obvious optimization is to update the algorithm so we don’t need to check so many voxels. But this is supposed to be an example in data-oriented design. ;)
The Data-Oriented Approach
Instead of thinking in terms of voxels, what is the minimum amount of data we need per voxel? Well, they’re all the same size. So all we need is a position and a color. So what we have now is a container for all our voxels.
struct VoxelData { float* x; float* y; float* z; float* r; float* g; float* b; };
What next? Well, our voxel data needs to be transformed into triangles on the screen. But what is the data that the VBO’s need to put stuff on the screen? It’s a position and a color. So we need a container for the vertex data.
struct VertexData { Vec3* vertex; Vec3* color; };
As the final step, we need to transform voxel data into vertex data. We’re going to use a function for that acts like a factory. Here is the diagram:
The internal structure of the function is much the same as the OOP version. However, I have split the saving of position and color data to improve caching. You can find it in DODVoxels::GenerateFaces.
So, what needs to change when we want to add raycasting? Obviously we’ll need a structure to hold our ray data.
struct RayCastData { float* ox; float* oy; float* oz; float* dx; float* dy; float* dz; };
And we’ll need a function that takes voxel data and generates ray data.
void GenerateRayCast( RayCastData* a_OutRays, unsigned int& a_OutRayCount, VoxelData* a_InVoxels, unsigned int a_VoxelCount, Vec3 a_InTarget );
Finally, we’ll need a function that converts ray data into vertex data.
void SolveRayCast( VoxelData* a_OutVoxels, unsigned int& a_OutVoxelCount, RayCastData* a_InRays, unsigned int a_InRayCount, VoxelData* a_InVoxels, unsigned int a_InVoxelCount );
This is, in my opinion, the biggest advantage of DOD over OOP. It’s clear where your data is and in what way it needs to be converted. The DODVoxels::SolveRayCast function is the biggest change from the OOP version. It has become massive. But the algorithm is the same.
Now, putting it all together:
void DODVoxels::PostTick() { if (Framework::s_RayCast) { Vec3 target( Math::CosDeg(Framework::s_TargetRotation) * 500.f, 10.f, Math::SinDeg(Framework::s_TargetRotation) * 500.f ); GenerateRayCast( m_VoxelRayCast, m_VoxelRayCastTotal, m_VoxelData, Framework::s_VoxelCurrentTotal, target ); SolveRayCast( m_VoxelRenderData, m_VoxelRenderTotal, m_VoxelRayCast, m_VoxelRayCastTotal, m_VoxelData, Framework::s_VoxelCurrentTotal ); m_VoxelRenderDataUsed = m_VoxelRenderData; } else { m_VoxelRenderDataUsed = m_VoxelData; m_VoxelRenderTotal = Framework::s_VoxelCurrentTotal; } }
But, what if we wanted to multi-thread? For instance, what if we wanted to split up the solving of raycasts into multiple jobs? With DOD it becomes extremely simple. We extend the SolveRayCast function to not only take a count of data it needs to chew, but also an offset into the array. Because we don’t read and write to the same array, we can split it up without race conditions.
A crazy cool benefit, wouldn’t you say? :)
Results
I made this example with an idea: I’m going to take an algorithm that abuses the CPU’s cache and apply data-oriented design to it to make it faster. The results are… disappointing. Here is the graph for just dumping voxels on screen:
Mind the gap between 5000 and 10000 voxels. As you can see, the DOD version is consistently faster than the OOP version. Even at 90,000 voxels it is 1 fps faster.
But then we get the graph for the version with raycasting:
What happened? The OOP version is consistently faster than the DOD version! To be honest, I haven’t a clue. The DOD version is aligned better in memory and with improved caching comes better performance. But I don’t get any.
Conclusion
I really like data-oriented design. Forcing you to think in terms of data instead of objects means you’re taking multithreading into account from the ground up. It can, however, be extremely tricky to think in terms of data. But I’m positive that will change with experience.
The reason I wrote this tutorial was because there wasn’t any proper literature on the subject. I hope my endeavors have saved you some time. :)