Gamasutra: The Art & Business of Making Games

Threading - USE IT. How to stop wasting most available CPU power!
by James Hicks on 05/05/14 12:33:00 am   Featured Blogs

The following blog post, unless otherwise noted, was written by a member of Gamasutra’s community.
The thoughts and opinions expressed are those of the writer and not Gamasutra or its parent company.


On my machine, everything from Skyrim to Dwarf Fortress consumes what Windows tells me is 11-13% of available CPU power. On a 4-core machine with Hyper-Threading, this means that in reality somewhere around 25%, maybe a little less, of my CPU is actually used by each game, AND that everything is limited in its maximum performance - either framerate or actual game speed - by my CPU. Or rather, by the 25% or so that developers are using.
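
The arithmetic behind those figures, as a quick Python sketch. It assumes the OS reports usage as a share of all logical cores (8 on a 4-core CPU with Hyper-Threading), and the variable names are only for illustration.

```python
# Assumption: the OS reports CPU usage as a share of all logical cores.
physical_cores = 4
logical_cores = physical_cores * 2    # Hyper-Threading: 2 logical cores each

# One thread running flat out on one logical core:
reported_share = 1 / logical_cores    # what the task manager shows

# That same thread is keeping roughly one physical core busy:
physical_share = 1 / physical_cores   # share of the real hardware in use

print(f"reported: {reported_share:.1%}, physical: {physical_share:.1%}")
```

So a game stuck on a single thread would show up at about 12.5%, which sits right inside the 11-13% range above.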

Meanwhile, my GPU is almost always partially idle - its own maximum potential never reached because the games are stuck, constrained by a single core of processing power.

When this happens, my CPU is running at 4.3GHz and is a reasonably modern Intel design. The problem is not my hardware.

It's the software. Part of the problem is game engines. They flag virtually everything as non thread safe, and nobody seems to be looking into expanding their engine to overcome this.

But the other half of the problem is games developers themselves refusing, for some reason, either to work around the limitations in their engines, or to use the other cores available to them in new and exciting ways.

I'm going to explain how we do both, in as close to plain English as I can. But first, I want to get something out of the way. A lot of folks might be tempted, at this point, to say "But James, we DO use other threads". No, you don't. If your game is still bottlenecking on one core and one core only, I don't care what tiny little jobs like sound processing or flibbit gibbling you've palmed off to another thread - you're still basically a single-threaded application, sorry!

To the right of this picture is a gas giant that has a procedurally generated texture. Instead of bundling a few terabytes of textures that I can't afford artists to make into Ascent - The Space Game, we generate textures for every planet and moon in the game - hundreds of billions of them actually - and we do it all in other threads. As your ship gets closer to a planet, the thread kicks off again and again, making higher and higher resolution textures.

But there's a trick to this; Ascent is built on the Unity engine, which doesn't let you modify a texture in another thread. The trick is not rocket surgery - Unity lets you import an array of colours into a texture, and we just work on our array of colours in another thread.

This means that some of the time, Ascent will be using two, three, even all four cores at once if, for example, we approach a big planet with several moons. And it lets us do this without impacting framerates - in fact on most hardware our main thread is mostly idle and you either sit at max FPS, or you're GPU limited. You get a small impact when we load the array of colours into the texture back in the main thread, but I can't avoid that until Unity discovers threading.
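
Here is a minimal sketch of that colour-array trick, written in Python rather than Unity's C# so it runs anywhere. `FakeTexture`, `set_pixels` and `generate_colours` are hypothetical stand-ins, not Unity's API; the point is only that the worker touches a plain array and the main thread does the single upload.

```python
import threading

WIDTH, HEIGHT = 64, 32  # hypothetical texture size

def generate_colours(width, height, out):
    """Worker thread: fill a flat list of (r, g, b) tuples. No engine calls."""
    pixels = []
    for y in range(height):
        band = (y * 255) // (height - 1)       # simple banding from top to bottom
        for x in range(width):
            pixels.append((band, (x * 255) // (width - 1), 128))
    out["pixels"] = pixels                      # hand the finished array back

class FakeTexture:
    """Stand-in for an engine texture that only the main thread may touch."""
    def __init__(self, width, height):
        self.width, self.height = width, height
        self.pixels = [(0, 0, 0)] * (width * height)

    def set_pixels(self, pixels):
        assert len(pixels) == self.width * self.height
        self.pixels = list(pixels)

result = {}
worker = threading.Thread(target=generate_colours, args=(WIDTH, HEIGHT, result))
worker.start()
worker.join()   # a real game would check once per frame instead of blocking

texture = FakeTexture(WIDTH, HEIGHT)
texture.set_pixels(result["pixels"])   # the only cost paid on the main thread
```

Re-running the worker with a larger `WIDTH` and `HEIGHT` as the ship approaches would give the "higher and higher resolution" behaviour described earlier.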

I'll give you another example...

This is a picture of our terrain engine working away on a planet surface. Actually this picture's a little old; today's version of the terrain engine is smarter but you get the idea of what's going on. The player is in the bottom centre, almost at ground level, looking 'north' and we're looking at the scene from about 50 kilometers above. As is typical for a terrain engine, a lot of high speed mathematics is going on. But what's not typical is that absolutely none of it is impacting framerates (except where you run into GPU slowness because of all those triangles) - because we do all of our thinking in another thread.

Again, Unity doesn't let you change a mesh, such as our terrain sphere above, in another thread. It all has to be done in the main thread - but once again, we split the mesh into its component data structures (vertices, triangles, normals and tangents), do a whole bunch of stuff on these structures in our other thread, and periodically bring this data back and alter the mesh with it in the main thread.
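
The same idea for meshes, sketched in Python under stated assumptions: `FakeMesh` and `build_patch` are hypothetical names, the height function is a placeholder, and normals and tangents are left out to keep it short. The worker only ever sees plain lists; the mesh object is touched on the main thread alone.

```python
import math
import threading

def build_patch(n, out):
    """Worker thread: build plain arrays for an n x n grid of vertices."""
    vertices, triangles = [], []
    for row in range(n):
        for col in range(n):
            height = math.sin(row * 0.5) * math.cos(col * 0.5)  # placeholder terrain
            vertices.append((float(col), height, float(row)))
    for row in range(n - 1):
        for col in range(n - 1):
            i = row * n + col
            # Two triangles per grid cell, stored as indices into the vertex list.
            triangles += [i, i + n, i + 1, i + 1, i + n, i + n + 1]
    out["vertices"], out["triangles"] = vertices, triangles

class FakeMesh:
    """Stand-in for an engine mesh that only the main thread may modify."""
    def __init__(self):
        self.vertices, self.triangles = [], []

mesh = FakeMesh()
data = {}
worker = threading.Thread(target=build_patch, args=(16, data))
worker.start()
worker.join()   # stands in for "periodically bring this data back"

# Main thread: copy the finished arrays into the mesh in one step.
mesh.vertices, mesh.triangles = data["vertices"], data["triangles"]
```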

There are two huge advantages to this - firstly, instead of only being able to do a bit of terrain-mangling each frame, for fear of clogging up the main thread and slowing down framerates, we can do, well, pretty much whatever we like, as fast as we like. As a result, the two main limitations on our terrain's detail are GPU power (too many triangles and the average GPU loses its lunch), and Unity's limit of 65k or so vertices per mesh. We could work around that second one, but there's nothing to be done for the first but wait a few years for everybody to upgrade, and/or limit the engine based on available GPU power (which we do).

The second advantage, and the real beauty of threading in my view, is that it gives us virtually limitless CPU power. We can think about a lot of triangles there, and think a LOT about each triangle, and not worry about the game's performance so long as we don't make too many of them. As a result, we have a terrain engine that can render something the size of Earth, onto a sphere, without a height map as source data, because it has the available CPU to procedurally generate its own height map on the fly. Better yet, with all that extra CPU, this terrain engine isn't just thinking about the shape of the landforms, but continually balancing between the planet's base texture and the level of detail in the terrain - sometimes the texture is more detailed than the terrain and sometimes it's the other way around. The engine has the CPU grunt to recognise this and communicate it to the GPU via shaders and their inputs. This is a big part of the recipe for how we can have hundreds of billions of crazily gargantuan planets in a game that's either 18 or 31 megabytes, depending on which platform you download it for.

And then there are some less conventional examples. One I keep hearing about is AI. Developers always seem to say their AI is limited by CPU power. Last time I checked though, it was 2014! Even mobile phones with fewer than two cores are now rare. The way I look at AI, there are two levels of it (three for an MMO but to simplify, I'll talk about two) - "live" AI which deals with actually moving, shooting, controlling the robot/person/tank/ship/aircraft/attack chicken/you name it, and then there's the "tactical" or "thinking" AI - you know, the part nobody codes because there's no CPU power available?

Well, imagine if there were whole cores sitting idle, just waiting to start thinking strategically about the whole situation facing your AI? And imagine if those cores could be tasked to think about each situation in great depth, without affecting framerates! Well, you're not imagining anything, that's exactly the situation!

Leaving server AI out for the time being, the way Ascent manages local client AI is exactly in this way. At any given moment, an NPC ship has a current task it's attempting to do, or rather, a task mode which is usually two things at once. It will always be moving, making itself a harder target, but its movements might be 100% evasive while its weapons or shields charge up or it flees for its life, or it could be attacking.

If it's attacking, which weapon is it trying to point at you right now? If evading, which side does it most want facing away from you? What direction does it want to be flying off in? When should it start turning back to shoot at you and where should it be aiming? At what point does it decide it's lost the battle and flee for its life? Should it have the gravity anchor on for fine-tuned control or to kill its sideways velocity, or off for maximum acceleration? Well, these questions are all tactical - they get answered by a thinking thread.

Every quarter second or so, the thinking thread talks to the main thread and potentially changes the NPC's current activity mode, targets, the distance it wants to be from you, the weapons it's using or not using, you name it. Then it goes back to analysing the situation in any level of depth and detail we might want. What this means for the future is that we can keep loading intelligence into our local AI and never worry about framerates. This means something totally different on the server, but that's not today's discussion.
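
A rough Python sketch of that thinking-thread arrangement, not Ascent's actual code: `NpcBrain`, `Plan`, the hull thresholds and the ranges are all invented for illustration. The thinker publishes each decision by swapping a single reference, so the main thread always reads a complete plan.

```python
import threading
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class Plan:
    mode: str             # e.g. "attack", "evade", "flee"
    desired_range: float  # how far the NPC wants to be from the player

class NpcBrain:
    def __init__(self, think_interval=0.25):
        self.plan = Plan("evade", 500.0)  # written by thinker, read by main thread
        self.hull = 1.0                   # written by main thread, read by thinker
        self._interval = think_interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._think, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _think(self):
        while not self._stop.is_set():
            # Placeholder for analysis that can be as deep as we like.
            if self.hull < 0.2:
                new_plan = Plan("flee", 5000.0)
            elif self.hull < 0.6:
                new_plan = Plan("evade", 800.0)
            else:
                new_plan = Plan("attack", 200.0)
            self.plan = new_plan               # publish by swapping one reference
            self._stop.wait(self._interval)    # sleep until the next think tick

brain = NpcBrain(think_interval=0.01)  # short interval so the demo finishes fast
brain.hull = 0.1                       # the main thread reports heavy damage
brain.start()
time.sleep(0.1)                        # stand-in for a few rendered frames
brain.stop()
print(brain.plan)
```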

How does this impact our AI? Well, again leaving the server side out for a moment, the fact is that in order to make Ascent's space battles even remotely playable for a human being, I have to severely hamper the AI's ships. That's right, instead of our AI cheating like everyone else's, I have to give the players all the advantages. The AI can move and turn at only a fraction of the speed you can - and in the higher end battles if you blink, you lose. I tried battling the AI on equal terms myself, and despite being the actual developer I found it completely impossible to beat. Its performance was literally perfect. It never misses, and it thinks and reacts to a changing situation before my brain even sees what's going on. As a result, fighting an NPC in Ascent is a lot more like fighting a human opponent in other games. It's more immersive, and more challenging - for all the right reasons.

So, to the nitty gritty - how hard is it to do threading? Well, in my opinion, it's really easy if you keep it as simple as possible. It's harder than just doing everything in the main thread, but it's easier than trying to optimise a complex game after you've jammed everything into the main thread, so even in terms of net difficulty I'd say it's fairly neutral. One piece of advice I would give is to think outside the strict OO box when you're threading. Think about everything as data and instructions again for a moment, and it gets a lot more intuitive - probably because that's exactly how CPU cores think about everything. You're ripping off a layer of abstraction, basically, and that's good because messed up threading can be a real nightmare, and when you mess it up, you want that nightmare to be simple - and short.

The process I follow every time is as simple as:

  1. Set up data structures for the main thread and the spawned thread to share. Include one bool which tells everybody who is messing with the data right now. When the spawned thread has done its thing, that bool is how the main thread knows it's safe to open its Christmas presents.
  2. Create a thread and give it the code to use. Again, mostly functional code seems to make this easier and more intuitive, at least to me. I tend to use low priority for threads where I can, to prevent them from messing with the main thread and framerates, even if we get a lot of them going at once.
  3. Set our bool to true.
  4. Start the thread.
  5. Main thread yields while that bool is set against it.
  6. Spawned thread does the heavy lifting (and yields sometimes too; 99% of the time this won't slow you down at all, because there's so much unused CPU grunt available that you'll be back right away. I know there's a school of thought that you shouldn't need to yield if your priorities are set accurately, which is a lovely theory).
  7. Spawned thread finishes and sets the bool to false, thereafter completing or yielding until the bool is set again (depending how you want to control it).
  8. In the very next frame, the main thread sees the bool is false, and takes its exciting newly processed data.
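
The eight steps above could be sketched like this in Python. `SharedJob`, `worker_busy` and `heavy_lifting` are made-up names, and the sum of squares is a placeholder for real work. Two caveats: Python threads have no priority setting, so that part of step 2 is left out, and CPython's global interpreter lock means this shows the handoff pattern rather than a real multi-core speedup. In C# or C++ the flag would also need to be volatile or atomic.

```python
import threading
import time

class SharedJob:
    """Data shared by both threads, plus the flag saying who owns it."""
    def __init__(self, numbers):
        self.numbers = numbers     # input, written by the main thread before start
        self.result = None         # output, written by the worker thread
        self.worker_busy = False   # True: worker owns the data; False: main does

def heavy_lifting(job):
    total = 0
    for i, n in enumerate(job.numbers):
        total += n * n
        if i % 1000 == 0:
            time.sleep(0)          # step 6: yield now and then
    job.result = total
    job.worker_busy = False        # step 7: cleared last, after the result is stored

job = SharedJob(list(range(10_000)))                           # step 1
worker = threading.Thread(target=heavy_lifting, args=(job,))   # step 2
job.worker_busy = True                                         # step 3
worker.start()                                                 # step 4

frames = 0
while job.worker_busy:                                         # step 5
    frames += 1            # render a frame, handle input, and so on
    time.sleep(0.001)      # stand-in for the rest of the frame

processed = job.result                                         # step 8
worker.join()
```

The flag is set before the thread starts and cleared only after the result is written, so the main thread never sees half-finished data.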

In essence, that's it. The annoyances come during debugging - typically nice error codes don't make their way back from your spawned thread, but there are ways around that. The main thing to make sure of is never to cross the streams... one thread at a time is working on our shared data structures. There are thread-safe structures you can break this rule with if you're so inclined, but I've yet to find a need. Our AI code's main thread reads from variables the thinky thread writes to all the time, and that's about as close as I get to breaking these rules.
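
One possible way around the missing error reports, sketched in Python with hypothetical names (`Job`, `run_guarded`, `broken_work`): catch everything inside the spawned thread, store the stack trace in the shared structure, and let the main thread look at it when it collects the result. This is one option among several, not necessarily what Ascent does.

```python
import threading
import traceback

class Job:
    def __init__(self):
        self.result = None
        self.error = None          # filled in by the worker if something blows up
        self.worker_busy = False

def run_guarded(job, work):
    try:
        job.result = work()
    except Exception:
        job.error = traceback.format_exc()   # keep the full stack trace as text
    finally:
        job.worker_busy = False              # always hand the data back

def broken_work():
    return 1 / 0   # deliberate bug for the demo

job = Job()
job.worker_busy = True
worker = threading.Thread(target=run_guarded, args=(job, broken_work))
worker.start()
worker.join()   # a game would check worker_busy once per frame instead

if job.error is not None:
    print("worker failed:\n" + job.error)
```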

There are other ways to handle communications between the threads, but I've found this idiotically simple process to be the most foolproof and reliable. When threading, I like my bugs as simple as possible.

So how does this help Skyrim? I'm not sure because I have never attempted to profile what the game's using its main thread for. Presumably the terrain engine is part of it, so there's something. Presumably Skyrim's limited AI could also have been made more advanced if it had its own core (or a few).

Dwarf Fortress on the other hand could probably spawn ten threads, and spread the dwarves and other characters between them, dramatically speeding up its game framerates. Hmm now I feel like playing Dwarf Fortress, so it's probably time to wrap up.
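
Purely as an illustration of that idea, here is a Python sketch that splits a list of characters into ten disjoint groups and updates each group on its own thread. It assumes the per-character update touches nothing but that character, which a real simulation would not get for free; `update_dwarf` and the `hunger` field are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def update_dwarf(dwarf):
    """Placeholder per-character update: each dwarf gets a bit hungrier."""
    dwarf["hunger"] += 1

def update_group(group):
    for dwarf in group:
        update_dwarf(dwarf)
    return len(group)

dwarves = [{"id": i, "hunger": 0} for i in range(205)]
THREADS = 10
# Disjoint slices: every dwarf belongs to exactly one group, so no two
# threads ever write to the same character.
groups = [dwarves[i::THREADS] for i in range(THREADS)]

with ThreadPoolExecutor(max_workers=THREADS) as pool:
    updated = sum(pool.map(update_group, groups))
```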

In conclusion, I look forward to seeing my CPU go over 13% while playing someone ELSE's game in the future. May this day come soon.

James Morgan
A set of very interesting thoughts on threading. I've long been annoyed how games ignore most of the processing power of my computer.

This has also given me some great ideas to try out with AI in a later project. I'm impressed that you made an AI so good even you couldn't beat it. This gives me hope for more interactions between AI elements of a game producing their own set of developing reactions that makes them look more "alive" and "real" to the player.

James Hicks
That's exactly the kind of fresh, wacky idea this enables!

Daniel Smith
I find it interesting that Skyrim would only be using one core, since the 360/PS3 version would almost certainly be using multiple cores. In fact, it's pretty much a given that any (reasonably technically advanced) console game of the last generation will be multi-threaded to some degree or another (the hardware is just not powerful enough to allow the developers *not* to do that). I remember a talk at GDC back in... 2007 (i think?)... about how Saints Row was getting close to 100% cpu usage across the 6 hardware threads of the 360. I imagine this is the norm now. Even Unreal 3, which had a fairly 'lazy' multithreaded architecture (at the time anyway, haven't used it in a while) used a decent portion of all the cpu, e.g. the renderer, physics, and audio were on separate threads.

So, if windows versions of modern games are stuck on one core(which your profiling of skyrim suggests), i'd say that was either down to a poor/lazy port, or some issue in windows that makes it difficult to utilize multiple hard threads easily/reliably (i'm not a hardcore windows programmer, so i don't know whether you have much control over where a thread is run). Are windows developers generally just 'lazier' than their console counterparts? Knowing that they are most likely going to be running on a far faster cpu anyway, do they just run everything single threaded?

Unity is another story, and I'm generally not a fan. Their scripting api is not really set up to be threaded at all, and you're always going to be fighting for performance, and coming up with hacks to work around their architecture (and the lack of access they give you to the engine). Kudos for getting it to work well for you though!

Daniel Gutierrez
Err, don't worry you shouldn't lose full faith yet ;). Skyrim as well as most modern games use multiple threads (you can even mod the .ini file to tell it what it can and can't use multiple threads for).

Windows' % usage is completely inaccurate on multi-core machines due to a variety of reasons. Also most modern games end up being GPU limited, since a lot of the things discussed in this article rely on the GPU which is much faster for most of these types of things. Part of the reason being that many were developed w/ consoles in mind, which tend to have much more powerful GPUs (relative to their CPU) than a standard integrated-graphics computer.

James's point about not wanting to limit users based on their GPU is completely valid (and I agree, Unity brings along some issues itself trying to rely on the GPU), but that also is part of the reason major game companies like developing for consoles (and mobile devs like the iphone): mostly reliable end-user specs!

Daniel Smith
oh, i wasn't losing faith in anything :) Was just trying to come up with an explanation for what i thought seemed like very weird data (only one cpu core being utilized in a game that originated on a console - i know, from a lot of experience, that that shouldn't be the case). Should have realized the answer was simply that windows was bad at reporting cpu usage :)

James Hicks
Confirmed - ran Skyrim with some .ini edits for multi-threading and it's definitely using more than one core now. Maybe 1.5 cores but that still puts it well ahead of most games released today.

Why it doesn't ship with multi-threading as a default is an interesting question... perhaps there's a stability concern?

Brian Paulus
Glad to see someone else making use of threading in Unity. I used a very similar approach for the voxel engine in my game EARL's Warehouse, except I use a work queue approach - I queue requests for 16x16x16 areas of the map, pop from the queue in my worker threads, create the model for the requested area as component vertex data, and then post them to a result queue to be turned into full mesh objects by the main thread.
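
For readers who want to see the shape of that work-queue variant, here is a small Python sketch. It is not the code from EARL's Warehouse: `build_chunk_vertices` is a placeholder that returns only the eight corners of each 16x16x16 area, and the three workers and the `None` shutdown sentinel are choices made for the example.

```python
import queue
import threading

CHUNK = 16
requests = queue.Queue()   # main thread -> workers
results = queue.Queue()    # workers -> main thread

def build_chunk_vertices(cx, cy, cz):
    """Placeholder meshing: one vertex per corner of the chunk's bounding box."""
    x0, y0, z0 = cx * CHUNK, cy * CHUNK, cz * CHUNK
    return [(x0 + dx, y0 + dy, z0 + dz)
            for dx in (0, CHUNK) for dy in (0, CHUNK) for dz in (0, CHUNK)]

def worker():
    while True:
        coords = requests.get()
        if coords is None:           # sentinel: no more work, shut down
            break
        results.put((coords, build_chunk_vertices(*coords)))

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()

wanted = [(x, 0, z) for x in range(4) for z in range(4)]
for coords in wanted:
    requests.put(coords)
for _ in threads:
    requests.put(None)               # one sentinel per worker
for t in threads:
    t.join()

# Main thread: drain the finished chunks and turn them into "mesh objects".
meshes = {}
while not results.empty():
    coords, vertices = results.get()
    meshes[coords] = vertices
```

In a game the main thread would drain a few results per frame rather than waiting for every worker to finish.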