Gamasutra: The Art & Business of Making Games
Surviving Audio Localization

February 14, 2007

“We didn’t need dialogue. We had faces!” claims Norma Desmond (Gloria Swanson) in Billy Wilder’s Sunset Blvd. (1950). Except that now we do, and in video games we have a lot of it.

You snipe a guard in a murky alley: there goes the AI-triggered cue. You’ve just started a brand new adventure game: there goes the pre-rendered exposition scene. You’ve defeated a level boss: there goes the scripted event hinting at what to do next. In terms of audio, a big game in the past couple of years ran to over 10,000 recorded lines. With next-gen platforms and publishers banking on massive content, that figure is likely to grow by a factor of 3, if not 5 or 10, in the medium or long run (and we know the long run isn’t very long in this industry), depending on how effectively AI manages audiobases[1] and cue variations.

Localizing in-game and video dialogue is a hot topic: it is complicated and expensive, and it usually takes place at a time when you have a million other fish to fry (the first of which is the game you need to finish). The difficulty originates from our games’ high level of sophistication, but also from more mundane causes: the intricacy of level design, programming, animation, script writing (and many other tasks yet to be described) results in what one could describe as a real house of cards. Add pressure from publishers for last-minute changes and your production calendar is offset by four weeks while a cold wind of anxiety blows over your already sleep-deprived teams.

Finally, localizing dialogue means dealing with external resources that will carry out translation, casting, recordings and linguistic testing, all tasks that also need to fit into your schedule while your submission date isn’t likely to change.

Localizing audio and video is high maintenance. To paraphrase Harry Burns (Billy Crystal) in “When Harry Met Sally”[2], it’s the worst kind: it’s high maintenance that people misconstrue as low maintenance. This has become even more true in the age of “sim ship”[3]: more and more games, at least the AAA titles that need international markets to break even and start generating revenue faster, are localized while the original game is still in its final (or not so final) stage. That is exactly like trying to build a house without the final blueprint.

The purpose of this article is to shed some light on original and localized version entanglement and hopefully offer some advice to prevent things from escalating to DEFCON 1.

1. Scheduling & Budgeting

1.1 Let’s first take a look at a (very) simplified audio production process:

Figure 1: audio process

  • Your design is locked, your character brief is ready and you have a “final” recording script for the “original” version (we’ll assume it’s American English). Simultaneously, animation is being created, sometimes based on temporary voices. A voice director is hired and briefed on the game mechanics and “feel”.

  • U.S. casting begins, goes through in-house and/or licensor / publisher approval (if applicable).

  • U.S. recordings begin (you can do this in-house if you have the facilities, or contract a studio if you don’t). Cues are delivered in raw sessions and need to be cut, cleaned (of clicks, scratches and mouth noises) and named[4]. Keepers[5] need to be selected. Lines that will be integrated in cinematic scenes are delivered to the animation department for editing. That’s it: you have a U.S. audiobase that is your reference for localization recording, and here’s why you should definitely wait for U.S. recordings to be over:

U.S. lines provide time references (see 1.3). Animation cannot adapt to five languages[6]. Even if it could, your disc space is limited. If you record “blind”, each foreign file will have a different length. You need the actual U.S. recorded lines so that their localized equivalents match.

Each language’s “sound databank” will also be the same size, which of course facilitates data and memory optimization. Before local actors deliver their lines, they listen to the U.S. version, then record their own while checking the waveform on screen so they can adapt.

Figure 2: audio waves
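The length-matching requirement can also be checked automatically once files come back. The following is a minimal sketch, assuming the cues are delivered as PCM WAV files; the function names and the 10% tolerance are illustrative assumptions, not figures from the process described here.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_reference(reference_seconds, localized_seconds, tolerance=0.10):
    """True if the localized take is within `tolerance` of the U.S. length.

    `tolerance` is a fraction of the reference duration; 0.10 (10%) is an
    arbitrary placeholder, to be replaced by whatever animation can absorb.
    """
    allowed_drift = tolerance * reference_seconds
    return abs(localized_seconds - reference_seconds) <= allowed_drift
```

In practice one would call `wav_duration_seconds` on the U.S. file and on its localized counterpart, then flag every pair for which `fits_reference` returns `False` as a retake candidate.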

You don’t always have the time to select keepers (a lengthy process). With the full U.S. sessions in hand, each country can “mirror” all alternative takes in its own language so that you can pick the right ones later (which frequently happens when animation is not final).

Original recordings provide additional artistic direction and help reduce localized recording time. U.S. recording sessions are by and large longer than localized ones.

You get back mirrored localized audiobases that are consistent: same folder architecture, identical file naming, matching number of files. This comes in very handy when it’s time for post production and integration.
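Consistency between mirrored audiobases is easy to verify with a script. The sketch below assumes each audiobase is a folder tree on disk and that a localized file carries exactly the same relative path as its U.S. reference; the helper names and the `.wav` extension in the usage note are assumptions made for illustration.

```python
from pathlib import Path


def relative_files(root):
    """Return the set of file paths under `root`, relative to it."""
    root = Path(root)
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def compare_audiobases(reference_files, localized_files):
    """Compare two collections of relative paths.

    Returns (missing, unexpected): files present in the reference but not
    in the localized audiobase, and files present only in the localized one.
    """
    reference, localized = set(reference_files), set(localized_files)
    missing = sorted(reference - localized)
    unexpected = sorted(localized - reference)
    return missing, unexpected
```

For example, `compare_audiobases(relative_files("audio/US"), relative_files("audio/FR"))` (hypothetical folder names) returns two empty lists when the French audiobase mirrors the U.S. one exactly: same folder architecture, same file names, same number of files.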

If U.S. recordings are done, chances are retakes[7] (if deemed necessary) won’t be too numerous. Remember: all retake costs (studio, engineer, actors and voice director) are multiplied by the number of languages.
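The multiplication is worth spelling out when budgeting. This is a rough sketch that assumes, for simplicity, that a retake session costs the same in every language; the function name and the figures in the usage note are invented for illustration.

```python
def retake_cost(studio, engineer, actors, voice_director, languages):
    """Total cost of one retake session repeated in every language.

    Simplifying assumption: each cost item is identical across languages.
    """
    per_language = studio + engineer + actors + voice_director
    return per_language * languages
```

With made-up figures of 1,000 for the studio, 400 for the engineer, 1,500 for the actors and 600 for the voice director, a single retake session costs 3,500 per language, so `retake_cost(1000, 400, 1500, 600, 5)` gives 17,500 across five languages.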

  • Localization begins: casting, then script translation. Once the localization agencies[8] have everything they need (translated script and U.S. audiobase), they start recording.

  • Localized audiobases are delivered back to the developer, where they are checked for integrity and quality (everything is there and meets the standards). Files are post-produced the same way the U.S. files were. You can then move on to integration.

  • QA and debug can start. Note: if post production is significant and requires a lot of special effects on voices, you can integrate the dry files[9] and proceed with linguistic testing to save some time. The dry files will eventually be swapped with the final ones (voices with effects, final videos), which of course you can’t afford not to check in actual gaming conditions.

1. Audiobase: ensemble of all lines recorded for the game.

2. He was of course referring to women.

3. Simultaneous shipping: original and localized versions ship at the same time.

4. The raw session is cut into single files that need naming, e.g. CHAR1_Intro_001

5. When you have alternative takes, you then need to select the keeper (the one that meets your demands).

6. FIGS (French, Italian, German, Spanish) + US. Those are generally the 5 languages you find on PAL discs. Of course the number of languages can go up.

7. Additional recording session (lines were overlooked, quality was poor, lines were added in the meantime, etc.)

8. Local service providers that will translate, cast and record the game in each language.

9. Special effects are not done yet.
