Applying Data-Oriented Design

It's pretty easy to apply these ideas to a self-contained system that already works over mostly homogeneous data. Most particle systems in games are probably designed that way because one of their main goals is to be very efficient and handle a large number of particles at high frame rates. Sound processing is another system that is naturally implemented by thinking about data first and foremost.

So, what's stopping us from applying it to all the performance-sensitive systems in a game code base? Mostly just the way we think about the code. We need to be ready to really look at the data and be willing to split up the code into different phases. Let's take a high-level example and see how the code would have to be restructured when optimizing for data access.

Listing 1 shows pseudo code for what could be a typical update function for a generic game AI. To make things worse, that function might even be virtual and different types of entities might implement the code in different ways. Let's ignore that for now and concentrate on what it does. In particular, the pseudo code highlights that, as part of the entity update, it does many conditional ray casting queries, and also updates a state based on the results of those queries. In other words, we're confronted with the typical tree-traversal code structure so common in Object-Oriented Programming.

Ray casts against the world are a very common operation for game entities. That's how they "see" what's around them, and that's what allows them to react correctly to their surroundings. Unfortunately, ray casting is a heavyweight operation, and it involves potentially accessing many different areas in memory: a spatial data structure, other entity representations, polygons in a collision mesh, etc.

Additionally, the entity update function would be very hard to parallelize on multiple cores. It's unclear how much data is read or written in that function, and some of that data (like the world data structure) might be particularly hard and expensive to protect from updates coming from multiple threads.

If we reorganize things a bit, we can significantly improve performance and parallelization.

Break Up And Batch

Without looking at any of the details inside the entity update, we can see the ray casts sticking out like a sore thumb in the middle. A ray cast operation is fairly independent of anything else related to the entity; it's heavyweight, and there could be many of them, so it's a perfect candidate to break up into a separate step.

Listing 2 shows what the broken up entity update code would look like. The update is now split in two different passes. The first pass does some of the updating that can be accomplished independently of any ray casts and decides which, if any, ray casts need to be performed sometime during this frame.

The game code in charge of updating the game processes all AI entities in batches (Listing 3). So instead of calling InitialUpdate(), solve ray casts, and FinalUpdate() for each entity, it iterates over all the AI entities calling InitialUpdate() and adds all the ray cast query requests to the output data. Once it has collected all the ray cast queries, it can process them all at once and store their results. Finally, it does one last pass and calls FinalUpdate() with the ray cast results on each entity.

By removing the ray cast calls from within the entity update function, we've shortened the call tree significantly. The functions are more selfcontained, easier to understand, and probably much more efficient because of better cache utilization. You can also see how it would be a lot easier to parallelize things now by sending all ray casts to one core while another core is busy updating something unrelated (or maybe by spreading all ray casts across multiple cores, depending on your level of granularity).

Note that after calling InitialUpdate() on all entities, we could do some processing on other game objects that might also need ray cast queries and collect them all. That way, we can batch all the ray casts and compute them at once. For years we've been drilled by graphics hardware manufacturers about how we should batch our render calls and avoid drawing individual polygons. This works the same way: by batching all ray casts in a single call, we have the potential to achieve a much higher performance.

Splitting Things Up

Have we really gained much by reorganizing the code this way? We're doing two full passes over the AI entities, so wouldn't that be worse from a memory point of view? Ultimately you need to measure it and compare the two. On modern hardware platforms, I would expect performance to be better because, even though we're traversing through the entities twice, we're using the code cache more efficiently and accessing the entities sequentially (which allows us to pre-fetch the next one too).

If this is the only change we make to the entity update, and the rest of the code is the usual deep tree-traversal code, we might not have gained much because we're still blowing the cache limits with every update. We might need to apply the same design principles to the rest of the update function to start seeing performance improvements. But at the very least, even with this small change, we have made it easier to parallelize.

One thing we've gained is the ability to modify our data to fit the way we're using it, and that's the key to big performance boosts. For example, after seeing how the entity is updated in two separate passes, you might notice that only some of the data that was stored in the entity object is touched on the first update, while the second pass accesses more specific data.

At that point we can split up the entity class into two different sets of data. Then the most difficult task is to name these sets of data in some meaningful way. They're not representing real objects or real-world concepts anymore, but different aspects of a concept, broken down purely by how the data is processed. So what before was an AIEntity has now become an EntityInfo (containing things like position, orientation, and some highlevel data) and an AIState (with the current goals, orders, paths to follow, enemies targeted, and so forth).

The overall update function now deals with EntityInfo structures in the first pass and AIState structures in the second pass, making it much more cache friendly and efficient.

Realistically, both the first and second passes will have to access some common data such as the entity's current state: fleeing, engaged, exploring, idle, etc. If it's only a small amount of data, the best solution might be to simply duplicate that data on both structures, which goes against all "common wisdom" in Computer Science. If the common data is larger or is read-write, it might make more sense to give it a separate data structure of its own.

At this point, a different kind of complexity is introduced—keeping track of all the relationships from the different structures. This can be a particular challenge while debugging because some of the data that belongs to the same logical entity isn't stored in the same structure, making it harder to explore in a debugger. Even so, making good use of indices and handles helps this problem become much more manageable (see "Managing Data Relationships" in the September 2008 issue of Game Developer).

Conditional Execution

So far things are pretty simple because we're assuming that every AI entity needs both updates and some ray casts. That's not very realistic because entities are probably very bursty—sometimes they need a lot of ray casts, and sometimes they're idle or following orders and don't need any for a while. We can deal with this situation by adding a conditional execution to the second update function.

The easiest way to conditionally execute the update would be to add an extra output parameter to the FirstUpdate() function that indicates whether the entity needs a second update or not. The same information could then be derived from the calling code depending on whether there were any ray cast queries added. Then, in the second pass, we only update those entities that appear in the list of entities requiring a second update.

The biggest drawback of this approach is that the second update went from traversing memory linearly to skipping over entities, potentially affecting cache performance. So what we thought was going to be a performance optimization ended up making things slower. Unless we're gaining a significant performance improvement, it's often better to simply do the work for all entities whether they need it or not. However, if on average less than 10 or 20 percent of the entities need a ray cast, then it might be worth avoiding doing the second update on all the other entities and paying the conditional execution penalty.

If the number of entities to be updated in the second pass were fairly small, another approach would be to copy all necessary data from the first pass into a new temporary buffer. The second pass could then process that data sequentially without any performance penalties, which would completely offset the performance hit that comes from copying the data.

Another alternative, especially if the conditional execution remains fairly similar from frame to frame, is to relocate entities that need ray casting together. That way, the copying is minimal (swapping an entity to a new location in the array whenever it needs a ray cast), and we still get the benefit of the sequential second update. For this to work, all your entities need to be fully relocatable, which means working with handles or some other indirection, or updating all the references to the entities that swapped places.

Different Modes

What if the entity can be in several totally different modes of execution? Even if it's the same type of entity, traversing through the modes linearly by calling the update function could end up using completely different code for each of them, resulting in poor code cache performance.

There are several approaches we can take in a situation like this:

If the different execution modes are also tied to different parts of the entity data, we can treat them as if they were completely different entities and break each of their data components apart. That way, we can iterate through each type separately and get all the performance benefits.
If the data is mostly the same, and it's just the code that changes, we can keep all the entities in the same memory block but rearrange them so entities in the same mode are next to each other. Again, if you can relocate your data, this is very easy and efficient (it only requires swapping a few entities whenever the state changes).
Leave it alone! Ultimately, Data-Oriented Design is about thinking about the data and how it affects your program. It doesn't mean you always have to optimize every aspect of it, especially if the gains aren't significant enough to warrant the added complexity.

The Future

We might wonder if thinking about a program in terms of data and doing these kinds of optimizations is a good use of our time. Is this all going to go away in the near future as hardware improves? As far as we can tell right now, the answer is a definite no. Efficient memory access with a single CPU is a very complicated problem, and matters get much worse as we add more cores. Also, the amount of transistors in CPUs (a rough measure of power) continues to increase much faster than memory access time. That tells us that, barring new technological breakthroughs, we're going to be dealing with this problem for a long time. We'll have to face it right now and build our technology around it.

There are some things I'd like to see in the future to make Data-Oriented Design easier. We can all dream of a new language that will magically allow for great memory access and easy parallelization, but replacing C/C++ and all existing libraries is always going to be a really hard sell.

Historically, the best advances in game technology have been incremental, not throwing away existing languages, tools, and libraries (that's why we're still stuck with C++ today). Here are two things that could be done right now and would work with our existing codebases. I know a lot of developers are working on similar systems in their projects, but it would be great to have a common implementation released publicly so we can all build on top of them.

Language. Even though a functional language might be ideal, either created from scratch or reusing an existing one, we could temporarily extend C to fit our needs. I would like to see a set of C extensions where functions have clearly defined inputs and outputs, and code inside a function is not allowed to access any global state or call any code outside that function (other than local helper functions defined in the same scope—see Listing 4). This could be done as a preprocessor or a modified C compiler, in order to remain compatible with existing libraries and code.

Dependencies between functions would be expressed by tying the outputs of some functions to the input of other functions. This could be done in code or through the use of GUI tools that help developers manage data relationships visually. That way, we can construct a dependency diagram of all the functions involved in every frame.

Scheduler. Once we have the dependencies for every function, we can create a directed acyclic graph (DAG) from it, which would give us a global view of how data is processed in every frame. At that point, instead of running functions manually, we can leave that job in the hands of a scheduler.

The scheduler has full information about all the functions as well as the number of available cores (and information from the previous frame execution if we want to use that as well). It can determine the critical path through the DAG and optimize the scheduling of the tasks so the critical path is always being worked on. If temporary memory buffers are a limitation for our platform, the scheduler can take that into account and trade some performance time for a reduced memory footprint.

Just like the language, the scheduler would be a very generic component, and could be made public. Developers could use it as a starting point, build on top of it, and add their own rules for their specific games and platforms.

Data Now

Even if we're not yet ready to create those reusable components, every developer involved in creating high-performance games should be thinking about data in their games right now. Data is only going to become more important in the future as the next generation of consoles and computers rolls in.

About the Author(s)

Noel Llopis

Blogger

Noel Llopis (@noel_llopis) thinks that pants are clearly overrated.

See more from Noel Llopis

Related Topics

Related Topics

Recent in More

Related Topics

Related Topics

Sponsored Feature: Data-Oriented Design - Now And In The Future

Applying Data-Oriented Design

Break Up And Batch

Splitting Things Up

Conditional Execution

Different Modes

The Future

Data Now

About the Author(s)

Latest News

Trending

Featured Blogs

Game Developer Essentials