Gamasutra: The Art & Business of Making Games
Charging at the Challenges of Kinect

June 25, 2012 (page 3 of 3)

Are there any challenges in Kinect engineering that you haven't had a chance to tackle yet, or is there something that you're looking forward to tackling?

DQ: I think the big one coming up is speech. We pushed speech pretty hard in Sports 2. There was speech in the first round of launch titles; Kinectimals obviously had speech. But from day one the entire UI was gonna be speech-driven. Every game event had to have speech incorporated into it.

But it was also a very say-what-you-see approach; in golf, you change club [by saying] "four iron," kind of thing. What I'd like to see and what we're investigating now is a more natural conversation way of talking to the Kinect, so you can say, "Hey, caddy, give me a five iron," or "Hey, caddy, what should I use now?"

We're looking at that now, improving the speech system, so I think that would probably be the one that I'm personally the most interested in, mainly because I did so much work with speech in Sports 2.
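To illustrate the difference described here, below is a minimal Python sketch. It is not Rare's implementation and does not use the Kinect speech API. The `CLUBS` table and the `exact_match` and `spot_command` helpers are hypothetical names invented for this example, and plain strings stand in for recognizer output. It contrasts a say-what-you-see lookup with simple keyword spotting inside a longer phrase.

```python
import re
from typing import Optional

# Hypothetical command table: spoken phrase -> game command id (assumed names).
CLUBS = {
    "four iron": "IRON_4",
    "five iron": "IRON_5",
    "driver": "DRIVER",
}


def exact_match(utterance: str) -> Optional[str]:
    """Say-what-you-see: the whole utterance must be a known phrase."""
    return CLUBS.get(utterance.strip().lower())


def spot_command(utterance: str) -> Optional[str]:
    """Keyword spotting: find a known phrase anywhere inside a longer sentence."""
    # Lowercase, replace punctuation with spaces, and collapse whitespace.
    text = re.sub(r"[^a-z ]", " ", utterance.lower())
    text = " ".join(text.split())
    for phrase, command in CLUBS.items():
        if re.search(rf"\b{re.escape(phrase)}\b", text):
            return command
    return None


print(exact_match("four iron"))
print(exact_match("Hey, caddy, give me a five iron"))
print(spot_command("Hey, caddy, give me a five iron"))
print(spot_command("Hey, caddy, what should I use now?"))
```

Keyword spotting copes with "Hey, caddy, give me a five iron" but returns nothing for the open question "Hey, caddy, what should I use now?", which is the part that would need a genuinely conversational system.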

Do you think that at this point with most of the visual input, whether it's 3D data or skeleton data, you've now encountered enough situations where you have a good toolbox to solve any of those problems?

DQ: Yeah, I think so. It's interesting now, as we look at new ideas, how quickly the engineers who've worked with Kinect a lot can pick out what the challenges will be. "If we do this event or this style of game, these are the things that we're going to have to deal with." That's just because we have so much experience with it now.

Our 13 sports now have been so varied -- as we said before, the gestures vary from sport to sport, so we have a good cross-section of what we've been doing and how we've solved problems in the past. As new ideas come in, we can all think, "This will be a challenge," or "Yes, we could do that pretty easily; we can copy what we did in track and field."

I think the only place that you might have new frontiers is if you go to a totally new genre like an adventure game. We recently did an article with Blitz Games for Puss in Boots, and one thing discussed was that if the developers did one-to-one tracking with the character, the character didn't look heroic on-screen anymore, because people have an exaggerated assumption of how cool they look when they're doing things -- which isn't exactly a problem you have with Sports.

DQ: Yeah. If you look at Star Wars, what they've done there is some really interesting stuff, blending in that one-to-one with extra animation into that so you use both at once. That means you get your power moment.

I've played it a couple of times, and it's interesting. When you stand there and realize that the character on screen is really puffing up their chest and getting ready for a swing, you find yourself mimicking that, and start doing it yourself, because you're getting into the thing. We call it "augmenteering" at Rare, joining in with avateering, which is that one-to-one mapping of animation. We did a little bit of augmenteering in Sports, but most of the time we were trying to get the one-to-one -- the player in the game -- as much as we could.
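The blend of one-to-one tracking with authored animation can be sketched roughly as follows. This is an illustrative assumption, not the technique used in Star Wars or at Rare: it linearly interpolates joint positions, whereas a production system would more likely blend joint rotations. The names `blend_pose`, `lerp`, and the joint labels are hypothetical.

```python
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]
Pose = Dict[str, Vec3]  # joint name -> position (assumed representation)


def lerp(a: Vec3, b: Vec3, w: float) -> Vec3:
    """Linear interpolation from a (w = 0) to b (w = 1)."""
    return tuple(x + (y - x) * w for x, y in zip(a, b))


def blend_pose(tracked: Pose, authored: Pose, weight: float) -> Pose:
    """Blend a tracked (one-to-one) pose toward an authored animation pose.

    weight = 0.0 gives pure avateering; weight = 1.0 gives pure animation.
    Joints missing from the authored pose keep their tracked position.
    """
    w = min(1.0, max(0.0, weight))  # clamp to [0, 1]
    return {
        joint: lerp(pos, authored.get(joint, pos), w)
        for joint, pos in tracked.items()
    }


tracked = {"hand_right": (0.0, 1.0, 0.0), "head": (0.0, 1.7, 0.0)}
authored = {"hand_right": (1.0, 2.0, 0.0)}  # exaggerated wind-up for the swing
print(blend_pose(tracked, authored, 0.5))
```

Raising the weight during a "power moment" and dropping it back afterwards would give the exaggerated pose while keeping the player's own movement visible the rest of the time.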

When it comes to speech, how much of a problem do you have with accents?

DQ: The speech system at the moment has what we call acoustic models. I'm Australian but actually I run the UK English because I think I've been in England long enough that I've lost my twang. Say we have execs come across from the States; if we leave the kits in U.S. mode, [recognition] does go down for the UK people speaking. So the acoustic models are quite tailored to the [regions]. The UK model contains Scottish, Irish, the thick, pommy accents, whereas the U.S. mode has the Southern, and all of the American ones.

The reason those models exist and are different is that they have to include those accents for the regions. Our biggest challenge -- we have a Scottish guy at work, and he has the thickest thick accent. He actually interviewed me, and I could hardly understand what he was saying. If we know it works for him, we know it works. He's our test case, basically. "Good. It works for him." (laughs)

Whenever you do speech, they always recommend getting native speakers in front of the game, so we were sending people out to Japan and to Germany and everywhere to get the native speakers talking and testing in front of the game.

Basically, what we're doing is lowering a number so it's as low as it needs to be to detect the speech, but still high enough to reject false accepts. It's just a tuning; we just dial it back and forth. We always have that infamous week at Rare where I turn it down too low and the game's just jumping around on any noise because it's just accepting everything. My name is mud for a week, and then we just turn it up again. It's really iterative, just trying to find that special spot -- and that special spot's different for each acoustic model, so the U.S. number is different from the UK number. It's just a tuning process.
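The tuning loop described above can be sketched as a search for the lowest confidence threshold that still rejects background noise. This is a minimal illustration under stated assumptions, not Rare's tooling: the `Utterance` record, the 0-to-1 confidence scale, the sample sessions, and the `evaluate` and `tune_threshold` helpers are all invented for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Utterance:
    confidence: float  # recognizer confidence, assumed to lie in [0, 1]
    is_command: bool   # ground truth from a labelled test session


def evaluate(threshold: float, utterances: List[Utterance]) -> Tuple[int, int]:
    """Return (false_accepts, false_rejects) at the given threshold."""
    false_accepts = sum(
        1 for u in utterances if u.confidence >= threshold and not u.is_command
    )
    false_rejects = sum(
        1 for u in utterances if u.confidence < threshold and u.is_command
    )
    return false_accepts, false_rejects


def tune_threshold(
    utterances: List[Utterance], max_false_accepts: int = 0, steps: int = 20
) -> float:
    """Lowest threshold on an evenly spaced grid that keeps false accepts in budget."""
    for i in range(steps + 1):
        threshold = i / steps
        false_accepts, _ = evaluate(threshold, utterances)
        if false_accepts <= max_false_accepts:
            return threshold
    return 1.0  # nothing on the grid met the budget; be as strict as possible


# Made-up test sessions, one per acoustic model.
SESSIONS = {
    "en-GB": [Utterance(0.91, True), Utterance(0.63, True),
              Utterance(0.42, False), Utterance(0.23, False)],
    "en-US": [Utterance(0.88, True), Utterance(0.63, True),
              Utterance(0.58, False), Utterance(0.31, False)],
}

# Each acoustic model gets its own tuned number.
THRESHOLDS = {model: tune_threshold(data) for model, data in SESSIONS.items()}
print(THRESHOLDS)
```

With this sample data the two models settle on different thresholds, and calling `evaluate(0.1, SESSIONS["en-GB"])` shows what "turned down too low" looks like: every noise sample is accepted.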

Michael DeFazio
Dear Kinect:
At some point, I'd love to have "semi-intelligent conversations" with NPCs by using my voice (as opposed to the hackneyed dialog wheel/tree where none of the proposed choices presented are options I would ever do/say).

Dynamically interacting with NPCs has been around since the old King's Quest and Ultima games (where you could type in keywords and have players respond to queries), and now that the technology is here, can't we improve upon it?

I'm not asking for a completely revolutionary artificially intelligent avatar system (a la Milo); I'd just like to be able to interact in a way that is less "mechanical" (static dialog trees) and more natural (in a way that resembles a "conversation").

Wouldn't it be cool to play an open world detective game (a la LA Noire) where one component of the game would be interviewing people (witnesses, suspects) to find clues using your voice, and NPC may respond to specific queries/keywords (i.e. "Where were you Friday Night?", "What do you know about Fredrick Pierce?")
...Or have a way of "bartering" with NPCs over price in an RPG? ("I'll give you $500... How about $580...$520 is my final offer")

Seems to me the voice aspect of Kinect has the most potential and is the one most criminally underutilized.

Robert Green
I think that kind of thing is still many years off. Right now, the closest we have are systems like Siri, which aren't especially reliable, need to be tuned for each language/accent and, most importantly, have to send a result to the cloud for processing, which rules out most gaming scenarios. Imagine your LA Noire example with a few seconds delay while the result of each question is processed online, occasionally coming back with "can you rephrase that?" and suddenly choosing questions from a menu doesn't sound so bad.

On a different note, it's interesting that articles like this one and the one a few days ago from Harmonix are coming out just as Steel Battalion is released to terrible reviews that are calling it flat broken and a black mark on Kinect itself.

Addison Siemko
I'm there with you. A man can dream...

Michael DeFazio
You might be right (I sure hope not). Google does have a developer API which dictates as you talk:
(but similar to Siri, it does require internet access). The utilization of language on Kinect I've seen seems half-baked most of the time. (I still laugh at the memory of "Lightsaber...ON!" as presented at E3.) That being said, I've been very pleased with Google's recognition accuracy (even without a grammar to choose from "options" like Kinect does)

I'm not sure if the previous Kinect voice-enabled games (i.e. Mass Effect 3) suffer from a "limited grammar" due to technical reasons (i.e. the recognition accuracy is just not up to par for anything advanced) or lack of "inspiration" (Bioware didn't want to invest much time adapting the experience for Kinect and in effect making 2 separate games).

I'd just like to see something from Microsoft moving toward the Natal/Kinect vision they sold us 3/4 years ago. (I'm not talking full-fledged "Milo" here; I'm just asking for some rudimentary non-critical character interactions.) It's one of those things where if you can show us something compelling (and provide us the tools) we'd jump at the chance to offer new interesting experiences with the tech.

TC Weidner
I think Kinect's future may be tied to things like John Carmack's virtual reality, which I think was by far the best thing at E3.

Joshua Darlington
I didn't understand why natural language speech was such a hard problem - until I started checking these linguistics lectures.

On top of context, there's a chaotic pattern of cadence, speed, tonal sweeps, etc. that humans use to understand each other.

If you listen to isolated words from natural language speech it's freakin hilarious.

That said, if Siri was a DARPA project licensed to private industry, I wonder what the military is using right now to monitor phone conversations? I wonder when they will allow private industry to license that? Did IBM's Watson use a speech recognition system or did they fake it? I wonder if they are reaching out to the game developer community?

Colin Sanders
It's interesting to hear about the effort they put into their technology, but I can't help but worry about this path. What happens when the novelty of the Kinect wears off on audiences? Look what happened to the Wii. Once this happens, I fear they may meet the same fate as Microsoft subsidiary Ensemble. In my conversations with both former and current employees, Rare had a fantastic development culture that was all but destroyed under Microsoft's ownership. As a result, they've met with disappointment far more than they have success. Banjo-Kazooie: Nuts and Bolts (an excellent reinvention, I might add) and Grabbed by the Ghoulies are two of the most sobering examples of the studio's fall from chart-topping winner to pioneer of bargain bins across the world.

Gaming studios have been disappearing over the course of this generation: it's quite startling, and I can't help but fear that Rare is next. In an ideal world, I feel Rare should go the way of Bungie. With Scott Henson at the helm, I doubt it's possible at this point, but in my eyes, it's probably their best bet to survive.