[Continuing our series of interviews with intriguing or offbeat developers, here's a talk with Montreal developer North Side, which is trying to make natural language-parsing AI the core part of its upcoming game, Bot Colony.]
There’s never a lack of news about the way games look, sound, and play, but it’s rare to hear of a developer focusing on a game’s ability to talk with you.
That’s exactly what Montreal-based developer North Side has been doing for the last five years, and the fruits of those labors are apparent in Bot Colony, a sandbox-style adventure game set on a tropical island full of robots, all of them capable of conversing with players on a ground-breaking level.
There’s a mystery and a tale of industrial espionage to be unraveled on Bot Colony’s island stage, but the game’s true attractions are the many different robots who guide the player. In contrast to the typical adventure game’s lump of pre-recorded dialogue, the Bot Colony automatons get their voices from a complex AI program that actively parses the player’s questions, whether those questions are spoken or text-based.
North Side plans to roll out a prototype version of Bot Colony at the Game Developers Conference in March, and we caught up with North Side owner Eugene Joseph to hear about just how Bot Colony might change the way we understand games and the way they understand us:
How did Bot Colony first take shape? What made you want to develop a game with such a focus on conversations and voice recognition?
Eugene Joseph: I wanted to develop a game that weaves certain elements to tell a good story: a link to history (the roots are in WWII), contemporary issues (trade wars, exploitation of workers in SEA), robot cognition, planetary exploration, and of course, espionage.
Eventually, I reached the conclusion that the best way to put together the game and get a feel for it and for its atmosphere would be to write a book. I’m still refining some details in Bot Colony (the book), but most of it is already written.
What sort of mystery is presented to the player on the island of robots?
Initially, the player is recruited by Nakagawa Corp. to help it locate some advanced sensors that have disappeared. When the player gets to the island, the mission is changed: the highest priority becomes finding an industrial spy who’s infiltrated Bot Colony. As the mission advances, a robot disappears and the plot thickens.
What was it like developing the natural language-parsing AI for the game over the last 5 years? What was the hardest part of refining it?
Precise parsing and disambiguation are huge hurdles when you do NLP. Reasoning in real-time is another big problem. Finally, dialogue management and understanding the intent of the player (speaker) are monumental challenges.
When it comes to the player's input, how complex a sentence can the game's robots understand?
We'd have to define complexity first. A first-cut definition could be the number of different words, their frequency in a corpus, but especially the way that they combine.
When we say Bot Colony will feature unrestricted natural language understanding, we mean it. There are no limits on the words that you can use, and as long as your English is correct, we should be able to deal with it (but that's already asking a lot!). However, remember there will always be pathological sentences. The classical example in NLP papers is "She saw the man with the telescope."
While this is not a complex sentence, the sentence is a good example of what's called the PP-attachment problem. PP-attachment means prepositional phrase attachment. In principle, "with the telescope" can attach either to the verb "see," meaning she used a telescope to see the man, or to "the man," meaning the man was carrying a telescope. The fact is that both interpretations are possible, and there's not very much someone can do about it. There are other cases of PP-attachment where it is possible to break the tie.
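The two readings of the telescope sentence can be written out as bracketed parse trees. The sketch below is only an illustration and not North Side's parser; the tuple encoding and the names `VERB_ATTACHMENT`, `NOUN_ATTACHMENT`, `leaves`, and `pp_attaches_to` are assumptions made for this example.

```python
# Illustrative only: two hand-written parse trees for the same sentence.
# Each tree is (label, child, child, ...); leaves are plain strings.

PP = ("PP", "with", ("NP", "the", "telescope"))

# Reading 1: the PP modifies the verb (she used the telescope to see him).
VERB_ATTACHMENT = (
    "S",
    ("NP", "She"),
    ("VP", ("V", "saw"), ("NP", "the", "man"), PP),
)

# Reading 2: the PP modifies the noun (the man was carrying the telescope).
NOUN_ATTACHMENT = (
    "S",
    ("NP", "She"),
    ("VP", ("V", "saw"), ("NP", "the", "man", PP)),
)


def leaves(tree):
    """Return the words of a tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


def pp_attaches_to(tree):
    """Return the label of the node that directly contains the PP."""
    if isinstance(tree, str):
        return None
    for child in tree[1:]:
        if isinstance(child, tuple) and child[0] == "PP":
            return tree[0]
        found = pp_attaches_to(child)
        if found:
            return found
    return None


# Same words, different structure: that is the whole ambiguity.
print(leaves(VERB_ATTACHMENT) == leaves(NOUN_ATTACHMENT))
print(pp_attaches_to(VERB_ATTACHMENT), pp_attaches_to(NOUN_ATTACHMENT))
```

Both trees yield exactly the same word sequence, so nothing in the sentence itself tells a parser which one to pick.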
You can also build some very complex sentences, with many subordinate clauses, in the style "She was chasing the man who was chasing the dog who was chasing the cat who was chasing the mouse who was chasing the cheese" and so on. But I think that we can deal with them. We can also deal with idiomatic English. So no, there shouldn't be many restrictions from that point of view.
The tough problem is acquiring World Knowledge. Computers need a lot of common sense knowledge to understand language. We take for granted the following: If you lose an item, you can't use it; When you arrive at a place, you're there; When you leave a place, you're no longer there; If you never learnt a language, you can't speak it (or if you never learnt how to play an instrument, you can't play it); You can't run if you're dead; Water freezes; Ice cream is softer (whatever that means) than stone; The sun rises every day; If you make a hole through the wall, you can see through it; and a million (literally) other things of this nature. Basically, a machine knows nothing.
To summarize, sentence complexity (i.e., the lexicon used, and the syntax) can be challenging, but it can be dealt with, at various levels of precision. I don't want to trivialize this, since building good parsers is a huge problem.
However, acquiring all the world knowledge required to understand a sentence is the bigger problem. We humans come to this world with perception (we see, hear, touch, smell, taste), but a computer “comes to the world” with nothing. Only a big empty memory and an ability to process information fast. How can a computer learn the things we learn through our senses?
To follow up an earlier question, how did you handle the real-time reasoning that the robots use to understand the player? Is it simply a matter of programming the game to recognize and distinguish every possible use and combination of words?
No. That would be impossible. There are some 200,000 meanings in English, when you consider single words and phrases. Sentences can have practically unlimited length.
While not all the permutations have a valid syntax, there can be a lot of them. Imagine sorting all the sentences of all the publications EVER published (all the books ever published, all the newspapers ever published, magazines, movie and radio show dialogue/monologue, recorded conversations) by length. This would be a small subset of what could exist. You try to do the math.
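For a rough sense of that math, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that each of the 200,000 meanings can fill any position in a sentence and that syntax is ignored, so it gives a loose upper bound rather than a count of valid sentences. The function name `count_sequences` is invented for this example.

```python
def count_sequences(vocabulary_size: int, length: int) -> int:
    """Number of distinct sequences of `length` items drawn from the vocabulary."""
    return vocabulary_size ** length


MEANINGS = 200_000  # figure quoted in the interview

for n in (5, 10, 20):
    total = count_sequences(MEANINGS, n)
    # len(str(total)) - 1 is the power of ten of the leading digit.
    print(f"{n:>2} items: on the order of 10^{len(str(total)) - 1} sequences")
```

Even if only a tiny fraction of these sequences is grammatical, enumerating them in advance is clearly out of reach.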
Can you give a brief example of what a typical conversation with a Bot Colony robot might entail? For instance, how would one of the robots parse a question like "Can you take me someplace where I can get a taxi so I can go to a hotel and eat?"
Sure. The robot will hand you a device that looks like an iPhone and say: “Here is a PDA. On the PDA, there is a map of the island. You click the place where you want to go on the map. A hovercraft will show up. You board the hovercraft. The hovercraft will fly to the place that bears a name identical to the one that was displayed on the map [a long way of saying 'on which you have clicked']. The hovercraft will land in that place. The canopy will open. You get off the hovercraft and you are in that place."
In particular, you can click on Old Nakagawa, where there is a restaurant.
How would you compare Bot Colony's underlying gameplay to an older, dialogue-driven adventure game such as The Last Express?
I’m still trying to get The Last Express on eBay, to play it and understand how they handled dialogue. I would guess it is some form of dialogue trees, where the player has to make a choice. I don’t believe they had natural language understanding.
Natural language understanding coupled with truly interactive reasoning and generation has not yet been deployed in videogames in a significant way. In this respect, Bot Colony will define an important technological milestone. However, natural language interfaces (without reasoning and interactive generation) were deployed before. Douglas Adams’ Starship Titanic is perhaps the first game where the player chats with the game’s characters (also robots).
However, the level of "understanding" of the Spookitalk engine powering the game is very different from what we target in this proposal. To quote Wikipedia, “Spookitalk had the ability to converse with the player in an almost lifelike manner, partially because it incorporated over 10,000 different phrases, pre-recorded by a group of talented voice actors. The recorded phrases would take over 14 hours to play back-to-back.”
In our game, the response will not be pre-recorded, but rather the result of parsing, reasoning on a fact base, and generation (which means turning a logic formula back into English).
What's the single most demanding step in the game's language processing, in terms of how much power it needs from the game's processors and servers?
It depends how you define processing. We do dialogue, so it's the full pipeline, from parsing to reasoning to dialogue management and generation. In dialogue management, we can ask for clarifications, or simply say "Did you mean X?". Parsing and reasoning are the most demanding.
How will the Bot Colony robots be able to learn information the player tells them? Will they just repeat things they've been told, or will they process the information and react to it in new ways?
It's definitely the latter. Any AI system works with rules. So if, for example, a robot knows that a person needs to eat in order to survive, and we give it the fact "Todd is a person" and then ask "Why does Todd eat?", we'll get "Todd eats in order to survive." We'll also know that Todd sleeps, needs to protect himself from the elements, and so on.
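The Todd example can be sketched as a tiny rule base. This is a minimal illustration under stated assumptions, not the game's reasoning engine: the data layout, the names `RULES`, `why`, and `known_actions`, and the purpose attached to sleeping are all invented for this example.

```python
# Hypothetical rule base: anything of category `kind` does `action`
# in order to `purpose`. Only the "eats" rule comes from the interview;
# the purpose given for "sleeps" is made up for illustration.
RULES = [
    ("person", "eats", "survive"),
    ("person", "sleeps", "rest"),
]

# Facts the player has told the robot: "Todd is a person".
facts = {("Todd", "person")}


def why(subject, action, facts, rules):
    """Answer 'Why does <subject> <action>?' by applying category rules."""
    for kind, rule_action, purpose in rules:
        if rule_action == action and (subject, kind) in facts:
            return f"{subject} {action} in order to {purpose}."
    return f"I don't know why {subject} {action}."


def known_actions(subject, facts, rules):
    """List everything the rules let us conclude the subject does."""
    return [action for kind, action, _ in rules if (subject, kind) in facts]


print(why("Todd", "eats", facts, RULES))
print(known_actions("Todd", facts, RULES))
```

The point is that a single new fact ("Todd is a person") unlocks every rule already known about people, which is why the robot can say more than it was told.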
How does the game avoid repetition? Will it give different answers if the player asks the same question over and over?
If the player asks the same question, a robot will politely point out that he’s already answered it. He may offer to word the answer differently, in case the player didn’t understand it the first time around…
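One simple way to notice a repeated question is sketched below. It is an assumption-laden illustration: it compares normalized text, whereas a system like the one described would presumably compare parsed meaning. The class name `RepeatTracker` and the wording of the reply are hypothetical.

```python
class RepeatTracker:
    """Remember which questions have already been answered (hypothetical helper)."""

    def __init__(self):
        self._answered = {}

    @staticmethod
    def _normalize(question):
        # Lowercase and drop punctuation and extra spaces so that
        # trivial variations of the same question still match.
        kept = "".join(ch for ch in question.lower() if ch.isalnum() or ch.isspace())
        return " ".join(kept.split())

    def respond(self, question, answer):
        """Return the answer the first time, and a polite reminder afterwards."""
        key = self._normalize(question)
        if key in self._answered:
            return "I have already answered that. Would you like me to rephrase it?"
        self._answered[key] = answer
        return answer


tracker = RepeatTracker()
print(tracker.respond("Where is the restaurant?", "It is in Old Nakagawa."))
print(tracker.respond("where is the restaurant", "It is in Old Nakagawa."))
```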
How many different robots will be in the game? Will each one have a unique personality or a different system for recognizing speech?
There are many kinds of robots. Kiosk robots that welcome you at the airport or at the hotel, android robots that work in the restaurant or in the bar and serve you food or drink, manufacturing robots, diving robots, Mech-soldiers, mining robots, personal trainer robots, horti-robots, camera-bots, and so on.
Personality is a huge area. While it’s easier to change the appearance or even the voice of a robot, the more subtle aspects like way of speaking, vocabulary, body language are much more difficult, and we’ll be working on them in the years to come.
What were the visual inspirations behind Bot Colony's robots? Are you leaning more toward human-like androids or a mixture of various machines, like Star Wars droids?
The rule is that robot design is dictated by robot function. Robots that interface with people are androids, so that people would be more at ease. Robots that work in mines, oil rigs, manufacturing, defense or agriculture have a different shape.
Are there any human characters in the game, or is the cast all robots? Did you decide to make them robots for any specific reason?
There is an elusive spy, and the player is trying to catch him. However, the player will interface only with robots. Yes, there is a very specific reason for having the player only interface with robots – and that’s suspension of disbelief. While it’s OK for a robot not to understand what the player is saying, this is not acceptable coming from a human. Getting through to a robot is part of the fun in playing Bot Colony.
The game will allow both voice-based and text-based interfaces, correct? If so, is the text-based interface more elaborate than the voice-based one?
Elaborate for who? Maybe for the player. First of all, speech-based or typing-based both produce the same thing, meaning ASCII text. For us, a typing interface will of course be easier, since there are no “Ahs” or “Ohs,” and fewer expletives, redundant words ("like" inserted everywhere), and so on. Also, when the player speaks, we have to insert punctuation (she or he won't say "question mark" at the end of a sentence, which would actually be really convenient for us).
Is Bot Colony designed to give the player a "Game Over" at any point, or is it like an adventure game where players are stymied only when they can't figure out how to solve a puzzle?
In Bot Colony, the clock keeps running and time advances (so from that point of view, it is similar to The Last Express that you mentioned). If the player keeps exploring the island and chatting with robots (who will be delighted to oblige), and does nothing proactive to accomplish the mission, I guess the spy will eventually succeed in destroying the island, or the player will eventually get fired by Nakagawa Corp.
The player's score will, of course, go up based on levels successfully played. However, in Bot Colony there is another very important metric: successful verbal exchanges with robots, and their difficulty. If a player manages to get a robot to comprehend something new, she or he gets rewarded.
How long do you estimate the final game to be in terms of the player solving the mystery? Since it's a "sandbox" game, how many different environments will the game's island include?
Gameplay could last several hours. There’s a lot of fun to be had just chatting with the robots of Bot Colony. In the released version, there should be more than 20 playable levels.
How many players will the final game be able to support simultaneously?
This is an issue of economics. Bot Colony (actually, understanding language in real-time) requires huge computing resources. We’re talking about several quad-core processors and many gigabytes of memory to support a few hundred players.
You mentioned that you'll be working on the game "in the years to come." Any idea on when a retail version might be ready, or does that depend on what happens in the beta-testing phase?
Bot Colony will be released in stages. There will never be a "perfect" Bot Colony, since there is no end to the sophistication that can be achieved in a conversation. We hope to start a restricted beta in the summer, and perhaps release an initial version by December 2009.