Gamasutra: The Art & Business of Making Games
Notes from the Mix: Prototype 2
June 20, 2012

[The audio director on open world game Prototype 2 shares a crucial revelation about how to create a consistent soundscape for his game across all different sections -- and explains in depth how he achieved that great mix.]

Mixing any genre of game is a deeply logistical, technical, and artistic undertaking, and figuring out an overall approach that is right for your specific game is a challenging endeavor. I have been responsible for mixing on five titles here at Radical: Scarface: The World Is Yours, Prototype, 50 Cent: Blood on the Sand, Crash: Mind over Mutant and Prototype 2, and while we used similar technology on each title, each had unique requirements of the mix to support the game.

Sometimes this was emphasizing the "point of view" of a character, sometimes an emphasis on destruction, and sometimes on narrative and storytelling. As our tools and thinking become more seasoned (both here at Radical internally, and in the wider development community within both third party and proprietary tools), it is becoming apparent that, like any craft, there is an overall approach to the mix that develops the more you practice it.

With Prototype 2, we were essentially mixing an open-world title based around a character who can wield insane amounts of devastation on an environment crammed with people and objects at the push of a button.

The presentation also focused a lot more on delivering narrative than the first Prototype game did, so from a mix perspective it was pulling in two different directions: one of great freedom (in terms of locomotion, play style, destruction), and one of delivering linear narrative in a way that remained intelligible to the player.

While this was a final artistic element that needed to be pushed and pulled during the final mix itself, there were other fundamental mix related production issues that needed to fall into place so we could have the time and space to focus on delivering that final mix.

This brings me to one of the main challenges for a mix, and something that can be said to summarize the issues with mixing any kind of project: the notion of a consistent sound treatment from the start of the game to the end, across all the differing presentation and playback methods used in development today. Though it may seem obvious to say consistency is at the heart of mixing any content -- be it music, games or cinema -- it is something that is very easy to lose sight of during pre-production and production. Still, it ended up being the guiding principle behind much of our content throughout the entirety of our development, and this culminated in the delivery at the final mix.

Knowing that all content will eventually end up being reviewed and tuned in the mix stage forces you to be as ready as you can be for that final critical process of decision making and commitment. It comes down to this: you want to leave as many of the smaller issues out of the equation at the very end, and focus on the big picture. This big-picture viewpoint certainly trickles down to tweaking smaller detailed components and generating sub-tasks, but the focus of a final mix really shouldn't be fixing bugs.

This mindset fundamentally changes the way you think about all of that content during its lifespan from pre-production through production, and usually, as you follow each thread it becomes about steering each component of the game's soundtrack through the lens of "consistency" towards that final mix. Overseeing all these components with the notion that they must all co-exist harmoniously is also a critical part of the overall mix approach.

Consistency is certainly a central problem across all aspects of game audio development, whether it is voiceover, sound effects, score, or licensed music, but it comes to a head when you reach the final mix and suddenly these previously separated components arrive together for the first time, often in an uncomfortable jumble.

You may realize that music, FX, and VO all choose the same moment to go big, that cutscenes are mixed with a different philosophy to that of the gameplay, or that the different studios used to record the actors for different sections of the game are now creating a jarring quality issue because of their context in the game. The less work you have to do to nudge these components into line with the vision of the overall mix, the better for those vital final hours or weeks on a mix project.

Horizontal and Vertical Mixing

In writing this postmortem, I have realized for the first time that there are two distinct approaches to mixing that need to come together and be coordinated well. There are several categories of assets that are going to end up "coming together" at the final mix. These usually form four major pre-mix groups:

  • Dialogue
  • Music
  • In-Game Sound Effects
  • Cutscenes

Here is the realization bit: Every piece of content in a video game soundtrack has two contexts, the horizontal and the vertical.

Horizontal context is what I think of as the separate pre-mix elements listed above that need to be consistent with themselves prior to reaching a final mix. Voiceover is a good example, where all speech is normalized to -23 RMS (a consistent overall "loudness"). This, however, only gets you so far, and does not mean the game's dialogue is "mixed".
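As a rough sketch of what normalizing speech assets to a -23 dB RMS target involves (the function names are illustrative, and a real pipeline would operate on audio files rather than Python lists of samples):

```python
import math

def rms_dbfs(samples):
    """Overall RMS level of a clip of float samples (full scale = 1.0), in dBFS."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square) if mean_square > 0 else float("-inf")

def normalize_to_target(samples, target_dbfs=-23.0):
    """Scale a clip so its overall RMS loudness hits the target."""
    gain_db = target_dbfs - rms_dbfs(samples)
    gain = 10.0 ** (gain_db / 20.0)  # dB difference -> linear gain factor
    return [s * gain for s in samples]
```

Batch-running something like this over every dialogue file is what buys the horizontal consistency described above; as the author notes, it still does not mean the dialogue is "mixed".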

Vertical context is the context of a particular moment, mission, or particular cutscene which may have different requirements that override the horizontal context. It is the mix decision that considers the context of the dialogue against that of the music and the sound effects that may be occurring at the same time. There can be many horizontal mixes, depending on how many different threads you wish to deem important, but thankfully, there is only one vertical mix, in which every sound is considered in the context(s) in which it plays.

There is an optimal time to approach the mix of each of these at different points in the production cycle.

  • Horizontal consistency is the goal of pre-production, production and pre-mixing (or "mastering").
  • Vertical consistency is the goal of the final mix, where missions and individual events should be tweaked and heard differently depending on what else is going on in those specific scenarios.
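One way to picture the two axes is as data. This is purely a hypothetical sketch (the bus names, dB values, and snapshot mechanism are invented for illustration, not a description of Radical's actual tools): horizontal consistency lives in the per-bus baselines, while vertical consistency lives in per-mission overrides.

```python
# Horizontal: each pre-mix group is held to a consistent baseline level.
PRE_MIX_DB = {"dialogue": 0.0, "music": -6.0, "sfx": -3.0, "cutscene": 0.0}

# Vertical: a specific mission or scene overrides bus levels in context.
MISSION_SNAPSHOTS = {
    "stealth_intro": {"music": -12.0, "sfx": -6.0},  # make room for quiet VO
}

def bus_level_db(bus, mission=None):
    """Final bus gain: a vertical override wins where one exists;
    otherwise the horizontal baseline applies."""
    override = MISSION_SNAPSHOTS.get(mission, {})
    return override.get(bus, PRE_MIX_DB[bus])
```

The final mix, in this picture, is largely the work of authoring and auditioning those per-scenario overrides.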

I will discuss each of these in terms of their lifespan through production, and trajectory towards the final mix, and then summarize by talking a little about how the final mix process itself was managed.



Peter Hasselstrom
I'm glad you decided to go with the most dynamic "Home Theater" setting by default. Dynamic range controls have started to show up in many games, and there is no consistency across titles to make it easy for players to understand what they do. The default is what most players will use, so defaulting to a lowest-common-denominator setting is something I personally don't like.

Jesse Lyon
Hi Rob, thanks for the article; it's interesting to hear how other studios approach mixing. One question: you mention dropping the speech in directly from the recording session and then mixing the game around those natural dynamics, but you later go on to say all the dialogue was mastered at -23 RMS. Do you mean that it was compressed/limited to -23 at the record stage (mastered on the fly)? Or was there some speech mastering done afterwards?

Jesse Lyon
Capcom Vancouver

Rob Bridgett
Hi Jesse,
When the dialogue was recorded, everything that came out of the preamps (Manley Slam) was, generally speaking, around the -23 range. Average -23, with some of the yelling stuff around -18 to -11. They all mix very well together dynamically though, so yes, 'mastered' at the recording stage.
Later on, when we measure the output from the console, we usually have a master dialogue bus that we can solo to check the overall output levels from the console itself (non-positional mission dialogue). We can simply move that 'master dialogue' bus fader to get us to that average of -23.
Hope that makes sense...

Roger Haagensen
"felt like it needed to be pushed a little louder to match competitive games in our genre" ouch, ever hear of the loudness war?

I assume you meant that it felt like Prototype 2 needed less headroom or dynamics.

After all, if I can simply change the volume knob and turn the volume of your game down (because it's too "loud") and the volume of others up (because they are too "quiet") then your "volume" adjustments are wasted.

Also, you did not mention whether -23dB RMS is meant to mean -23dBFS, and whether it is sinusoidal or not.

The movie standard is -20dBFS RMS (A-weighted) sinusoidal. And assuming your -19 value ("we eventually landed somewhere in the region of -19") was from Dolby, then you are pretty darn close to that standard, which is commendable. -20dBFS is also exactly the same as 0.1 in floating point, so it's very easy for a programmer to remember: "if (rms > 0.1) then tooHot = true" etc.
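The arithmetic behind that 0.1 shorthand is easy to verify: a dBFS level converts to linear amplitude as 10^(dB/20), so -20 dBFS is exactly 10^(-1) = 0.1. A small sketch (`too_hot` is just the commenter's pseudocode made concrete, not a real API):

```python
import math

def dbfs_to_linear(db):
    """Convert a dBFS level to a linear full-scale amplitude: 10^(dB/20)."""
    return 10.0 ** (db / 20.0)

def linear_to_dbfs(x):
    """Convert a linear amplitude back to dBFS: 20*log10(x)."""
    return 20.0 * math.log10(x)

def too_hot(rms_linear, ceiling=0.1):
    """The commenter's check: 0.1 linear RMS corresponds to -20 dBFS."""
    return rms_linear > ceiling
```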

I myself ensure that all audiowork is -20dBFS RMS (Z-weighted/flat/no loudness curve) sinusoidal, 50ms window, channels summed (per window) as having equal power.
This is the best way to measure fully objectively. Any weighting/loudness curves always tend to miss outliers. EBU R128 and ReplayGain and A-weighting and other methods always manage to let some slip past, usually causing bass heavy material to be amplified even more.
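The measurement described here (unweighted/Z RMS over 50 ms windows, with channel powers summed) can be sketched as follows; this is an illustrative reading of the commenter's recipe, not any standard library's API:

```python
import math

def windowed_rms_dbfs(channels, sample_rate=48000, window_ms=50):
    """Unweighted (Z/flat) RMS per 50 ms window, channels summed as equal power.

    `channels` is a list of per-channel sample lists (floats, full scale 1.0).
    Returns one dBFS value per complete window.
    """
    win = int(sample_rate * window_ms / 1000)
    n = min(len(ch) for ch in channels)
    levels = []
    for start in range(0, n - win + 1, win):
        # Sum mean-square power across all channels for this window.
        power = sum(
            sum(s * s for s in ch[start:start + win]) / win for ch in channels
        )
        levels.append(10.0 * math.log10(power) if power > 0 else float("-inf"))
    return levels
```

A constant mono signal at 0.1 linear amplitude, for example, measures -20 dBFS in every window under this meter.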

One of the reasons behind weighting or loudness curves is that they give more "weight" to the common human hearing range/frequencies.
For me this is not an issue, as frequencies outside the human hearing range are unwanted anyway; they "steal" bandwidth from the music I make.

The more dynamics a work has, the more likely it is that the consumer/user/player will actually turn the volume up instead of down.

But when is too much dynamics too much? This again is the territory of subjectivity.
There is no way to objectively measure loudness, and there never will be, as nobody has the same ears, and the average of human hearing ranges or preferences is not adequate either (loudness curves).

Unweighted RMS (sinusoidal) is as objective as one can get, mathematically speaking. Even if better loudness curves appear in the future, RMS will always remain RMS; in that sense it's future-proof.

But what is too much dynamics? Unfortunately I cannot recall the link/site, but there is not much point going much beyond 40dB. Humans prefer things not too loud and not too quiet.

A good rule of thumb, until you stumble upon that preferred human dynamic range info like I did, is to look at it this way: if the RMS target is -20dBFS RMS (Z) sinusoidal, then the dynamic range (where the intended audio content lives, as opposed to wherever there happens to be sound) should be from 0dBFS to -40dBFS.

This effectively gives you 20dB headroom for loud stuff and 20dB um.. legroom (?) for quiet stuff.
Most audio details below -40dBFS get drowned out by room noise (a CPU cooling fan is often in the -30 to -40 dB loudness range, RMS A-weighted).

The movie standard is 83dB SPL (BTW, this is the same as -20dBFS RMS-A).
But few home theatre setups/rooms have a noise floor of 0dB SPL.
If you look at typical SPL examples, you will see that 83dB SPL is damn loud, and even half that (40dB SPL) is very loud.

If room noise is 20dB SPL and the dynamic range of the audio is 40dB, then the consumer would need to set the volume such that the loudest part (0dBFS RMS) sits at 60dB SPL to ensure they could hear the quietest part (-40dBFS RMS in this case).

It is amusing to note that a TV sits between a washing machine and a hand mixer, both usually annoyingly loud IMO, but it's all in the context though.

Also note that the EPA max SPL is 70dB. And if you take 70dB SPL and subtract a 30dB SPL noise floor, you get 40dB SPL.
And as mentioned above, -20dBFS RMS (Z/sine) with a dynamic range of 0dBFS to -40dBFS can be reproduced "comfortably" in a room with a typical noise floor.

Audiophile rooms and studio rooms and closed headphones are the exception obviously.

PS! Keep in mind the different types of headphones. Semi-open or half-open (circumaural with a half-open back) headphones are not the same as closed/circumaural ones. Open headphones probably fall in the same category as a quiet room, while half-open are probably the same as a very quiet room, but both are susceptible to room noise. Only closed headphones approach studio-room sound, but closed headphones suffer from high pressure during bass, causing earlier listening fatigue.

Yeah. Sound ain't easy. And if it was easy, it wouldn't be fun anymore either. *laughs*