Gamasutra: The Art & Business of Making Games
Surviving Audio Localization

February 14, 2007 (Page 2 of 6)

1.2 Figures and audio categories

Localizing audio is expensive. It takes the lion’s share of your localization budget, typically 50 to 70% (depending on the size of the game and the number of platforms), whereas translation accounts for around 10 to 15%.

Linguistic QA is not easily quantifiable: it depends on the game size and sophistication and on the number of platforms (but let’s settle for an estimated 10% of the main platform localization budget). The rest of the localization budget is allotted to in-house resources (project management, audio and video integration and debugging, additional programming etc.).

Note: Of course, you translate, record and do post-production once (unless you have platform-specific lines) while you need to test all platforms.
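As a rough worked example of the shares quoted above, the sketch below splits a hypothetical total budget into ranges. The function name, the 200,000 figure and the simplification of applying the 10% linguistic QA estimate to the whole budget (the text ties it to the main platform's budget) are assumptions made for illustration only.

```python
# Percentage ranges quoted in the text, as (low, high).
SHARES_PCT = {
    "audio": (50, 70),
    "translation": (10, 15),
    "linguistic_qa": (10, 10),  # assumption: applied to the whole budget
}


def budget_ranges(total):
    """Return {category: (low, high)} amounts for a total localization budget."""
    ranges = {
        name: (total * low / 100, total * high / 100)
        for name, (low, high) in SHARES_PCT.items()
    }
    committed_low = sum(low for low, _ in SHARES_PCT.values())
    committed_high = sum(high for _, high in SHARES_PCT.values())
    # Whatever is left goes to in-house resources. The remainder is smallest
    # when the other categories sit at the top of their ranges.
    ranges["in_house"] = (
        total * (100 - committed_high) / 100,
        total * (100 - committed_low) / 100,
    )
    return ranges


if __name__ == "__main__":
    # Hypothetical total budget, in any currency.
    for name, (low, high) in budget_ranges(200_000).items():
        print(f"{name:14s} {low:>10,.0f} to {high:>10,.0f}")
```

Note how wide the in-house range is: with these figures it runs from 5% to 30% of the total, which is why the audio estimate needs to be pinned down early.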

Let’s take a look at the factors that make localizing audio pricey. Recording sessions involve specialized people – actors, voice directors, sound engineers, recording managers etc. – and require very sophisticated facilities and equipment. In addition, those people and facilities (studios) are usually on the other side of the world, and you cannot bypass labour legislation or trade unions if you want to work with professionals.

You absolutely need native, skilled dubbing actors, a voice director who understands video game mechanics, and a support / tech team that will come up with a solid schedule and, of course, the best quality for the best price. Cheap acting and poor-quality recording will reflect badly on your game no matter how many special effects and how much music you layer over the dialogue. There really is a threshold below which you cannot go (your sound team and publisher should be able to give you standards).

Agencies need to be local: they have bigger talent pools and a good grip on the market. You don’t want your German actors to sound like they’ve been living in Orange County for the past 25 years. Last but not least, localization recording is expensive because you ask vendors to work very fast. U.S. recordings usually take place three to four months before first submission (if all goes well). Depending on how long and how late those sessions run, and on potential retake sessions, the post-production workload and the number of audio bugs, little time is left, so localization recordings have to be wrapped up very quickly. U.S. recordings commonly take three times longer than localizations.

To draft a schedule and a budget you need to provide your vendors with figures so they can send you documented quotes. Each audio category has specificities: lip synch recording is more expensive and requires a specific setup; recording AI cues is very tiring for actors; lines with a strict time constraint will take longer to record. Here’s a quick breakdown of essential information to help your vendors assess needs and costs.

Please note that cost variations depend on countries’ labour costs and fixed union fees.

- Total Number of Lines and Total Words to Record (all categories)

This information helps to create a general timeframe. Averages are never very reliable but here are some (rough) references: recording 2,500 files (15,000 words) is a six-day job for a professional studio including two days for preparation and formatting before the loc audiobase is sent back to you.

Daily studio costs (facilities and staff) vary from 450 € to 900 € ($585 to $1,170) for recording, while harmonization (cut & clean, naming) ranges from 250 € to 600 € per day ($325 to $780). Some studios offer per-file rates, but that is rarely to your advantage (it usually means they are using a subcontractor on their end).
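To see how these daily rates combine with the six-day reference job above, here is a minimal sketch. It assumes the four recording days are billed at the recording rate and the two preparation and formatting days at the harmonization rate, and it uses the 1.30 dollar-per-euro rate implied by the conversions in the text. The function names are hypothetical, and actual quotes will vary by vendor and country.

```python
RECORDING_EUR_PER_DAY = (450, 900)      # facilities and staff, (low, high)
HARMONIZATION_EUR_PER_DAY = (250, 600)  # cut & clean, naming, (low, high)
USD_PER_EUR_X100 = 130                  # implied by 450 EUR = $585 in the text


def studio_cost_eur(recording_days, harmonization_days):
    """Return the (low, high) studio cost in euros for the given day counts."""
    low = (recording_days * RECORDING_EUR_PER_DAY[0]
           + harmonization_days * HARMONIZATION_EUR_PER_DAY[0])
    high = (recording_days * RECORDING_EUR_PER_DAY[1]
            + harmonization_days * HARMONIZATION_EUR_PER_DAY[1])
    return low, high


def to_usd(eur):
    """Convert euros to dollars at the article's implied rate."""
    return eur * USD_PER_EUR_X100 / 100


if __name__ == "__main__":
    # Assumed split of the six-day job: 4 recording days + 2 harmonization days.
    low, high = studio_cost_eur(recording_days=4, harmonization_days=2)
    print(f"{low} to {high} EUR (${to_usd(low):,.0f} to ${to_usd(high):,.0f})")
```

These figures cover studio time only; actors' fees, casting and project management come on top.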

- Breakdown of Lines and Words Per Category

You need more specific figures because they affect the schedule and the budget.

AI cues (AI triggered as opposed to scripted or pre-rendered dialogues) include:

“Barks” or “organics” - These are non-verbal lines, onomatopoeias if you will: screams, surprise, pain reactions etc. They do not require translation per se (in rare cases some will need adaptation because they don’t work in every language, particularly Asian languages).

In a perfect world, you wouldn’t need to record them (i.e. you would reuse the U.S. recordings in all localized versions), but annoying voice discrepancies happen when the U.S. barks don’t match the localized actor’s voice. If there are a lot of barks and you are behind schedule and running out of money, you can skip recording these: sound designers are eager to offer players diversity, but it’s not uncommon for only three out of ten recorded variations to be used. So if you cut, you want to cut where there’s meat.

Note: They are also extremely tedious and tiring to record. Actors can’t record barks for more than one or two hours in a row without putting their voice at risk (some actors will hold back if they are performing on stage or have other recording work scheduled for the same day).

One-liners - Short lines covering the reactions of non-player and player characters in predetermined AI conditions. They can be taunts used in boss fights, warnings from your SWAT commander, etc. They are used extensively in multiplayer games. They need translation and recording in all languages.

Intruder! I repeat intruder in HQ!

Send back-up!

Friendly fire!

Fire in the hole!

Is this the best you can do?

Note: AI cues are almost never subtitled: they usually play while gamers are focusing on the action rather than reading, and several lines can play simultaneously, which would make subtitles overlap.

Ambient Dialogue

This type of dialogue brings lively background to environments: two mercenaries discuss how low their wage is or whine about their superior, merchants and customers bargain in a market, etc. Usually they are location triggered: the player reaches a zone and the dialogue starts – the closer the player gets to the scene the louder it is. This isn’t generally subtitled.

Scripted Events

Scripted events are in-game scenes where the camera and point of view are usually blocked or semi-blocked, so that players are forced to watch the whole scene because it is essential to story development. The scene and its dialogue are programmed to play under certain conditions. If your game is subtitled, this type of dialogue (also called “critical path”) is your number one priority.

Forms: a scripted event can be a single file (the dialogue has been edited onto one track); if so, you want to make sure it has been edited correctly in all languages. That is why each individual file should have the same name in every language, so that an editor who doesn’t understand Japanese or Italian can edit dialogue efficiently and with little risk. Another solution is to use separate tracks that play one after the other; in that case, errors are easier to fix (you don’t have to edit the whole dialogue again) but also more likely to happen: a file doesn’t play, lags, plays in the wrong language, etc.
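One way to enforce identical file names is a quick automated comparison before editing starts. The sketch below is only an illustration: the function name, language codes and file names are hypothetical, and it compares plain sets of names rather than reading any particular audio tool's project files.

```python
def compare_file_names(reference, localized):
    """Compare each language's file names against the reference set.

    reference: set of file names in the source-language delivery
    localized: dict mapping a language code to its set of file names
    Returns only the languages that have missing or unexpected files.
    """
    problems = {}
    for language, names in localized.items():
        missing = sorted(reference - names)  # expected but not delivered
        extra = sorted(names - reference)    # delivered under an unknown name
        if missing or extra:
            problems[language] = {"missing": missing, "extra": extra}
    return problems


if __name__ == "__main__":
    # Hypothetical deliveries: the Japanese set has one misnamed file.
    us = {"sc01_guard_001.wav", "sc01_guard_002.wav", "sc01_hero_001.wav"}
    deliveries = {
        "de": set(us),
        "ja": {"sc01_guard_001.wav", "sc01_guard_002.wav", "sc01_hero_01.wav"},
    }
    print(compare_file_names(us, deliveries))
```

A misnamed file shows up twice, once as missing and once as extra, which makes typos in vendor deliveries easy to spot without understanding the language.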

High Resolution Cinematics (AKA cut scenes)

The Rolls Royce of audio / video production: they generally intro and outro levels, and you want the quality to be top notch. Scenes feature animated close-ups and sometimes require lip synch. If they do, you need the picture in order to record, meaning the animation has to be locked (though you can record with temporary textures and visual special effects).

The main problem is that you get to that stage pretty late, which forces you, more often than you’d want, to hold a separate recording session for cinematics. It’s a bit more expensive – you need to book studios, voice directors and actors again – and a bit stressful – your post-production team will have less time to finish the final mix – but it’s a good bet.

Producing cinematics is very expensive, so they are usually short and don’t have a lot of dialogue; if you need to hold a late recording session, it hopefully won’t take long. Once scenes are final, it’s almost a “can’t go wrong” scenario aside from format and subtitle issues.

Figure 3: Breakdown for a medium sized FPS

- Number of Characters to Cast

Depending on what you want, you either ask your vendors for already recorded samples (the price of which is usually included in the project management fee) or for a live casting.

Live casting is expensive because it ties up actors and a voice director in studio facilities while the actors come in and read audition lines. Casting costs vary greatly depending on actors’ minimum fees. Casting 10 parts can reach 3,000 € ($3,900) in countries where labor is expensive. But it’s very practical when you want to ensure chemistry between actors, and it comes in handy when you need to speed up approval from a licensor or publisher who lacks imagination.

- Number of Characters

Unless you’ve robbed a bank or have Steven Spielberg’s budget, it’s unrealistic to hire one actor per role. Studios will cast one actor for several roles (usually a 1 to 4 ratio).
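As a back-of-the-envelope illustration of that ratio, the sketch below estimates how many actors a studio might book for a given number of roles. The function name and the default of four roles per actor are assumptions based on the "usually" above; real casting also depends on voice range, age, gender and which characters share scenes.

```python
import math


def actors_needed(role_count, roles_per_actor=4):
    """Estimate the cast size when each actor covers several roles."""
    if role_count < 0 or roles_per_actor < 1:
        raise ValueError("role_count must be >= 0 and roles_per_actor >= 1")
    # Round up: a partial group of roles still needs one more actor.
    return math.ceil(role_count / roles_per_actor)


if __name__ == "__main__":
    for roles in (10, 25, 40):
        print(f"{roles} roles: about {actors_needed(roles)} actors")
```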

Legal fees and labor legislation differ from one country to another, but unless your only dialogue is a single voice-over, actors’ salaries are always the largest single item in your localization audio budget (around 30%).
