Cutting Corners: Networking Design in Journey
By Nick Clark on 03/12/14 11:32:00 am
Laziness is one of the chief virtues of a developer, according to Perl inventor Larry Wall. He playfully defines laziness as going to great effort to reduce overall energy expenditure. True to this virtue, thatgamecompany was laser-focused on laziness when we began our first multiplayer project, Journey. For a small team, creating an anonymous, seamless online game was an ambitious goal. Players begin a solitary adventure and are automatically paired together, creating the opportunity for a shared experience.
Standard multiplayer features such as lobbies and chat boxes were already absent from our initial designs, but we would still have to streamline the networking implementation to stand a chance at finishing the project. We knew that latency and non-determinism would compound problems as the usual single-player design hacks unravel when unpredictable remote players are involved. With this in mind, we outlined five goals for our networking strategy:
• online play indistinguishable from offline,
• smoothness trumps accuracy,
• minimize griefing,
• connect as many players as possible, and
• unfortunately, designers are also responsible for networking.
My role as one of our designer-engineer hybrids afforded me a unique view of our networking development process. I hope sharing a few of our decisions and discoveries in this technically-focused article will be useful.
I. Online play indistinguishable from offline
We began with confidence when tackling our first networking challenge, avatar synchronization. The player must not be aware of the moment the game shifts from a solitary offline mode to a connected online mode as they are paired with another journeyer. There should be no detectable change in the responsiveness of the controls or in the way the environment reacts. We went for the sleekest, simplest implementation we could imagine.
Seamless switching between single and multiplayer modes proved to be straightforward with a peer to peer shared authority model of control. Each player has final say over what their avatar is doing, rather than having an authoritative server orchestrating player movement. Every net update step, each player simply informs the other of their avatar state at that moment. As local control code never changes between solo and networked play, the only remaining task is to make the remote avatar appear as though they are moving smoothly. We heaped interpolation onto various aspects of the remote avatar and sent controller input to perform a small amount of predictive movement.
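To make the interpolation idea concrete, here is a minimal sketch in Python. It is not Journey's code: the `AvatarState` fields, `smooth_remote`, and the `lead_time` and `blend_rate` values are all assumptions chosen for illustration. Each frame, the displayed remote avatar blends toward the last reported state, nudged ahead slightly using the velocity implied by the sender's controller input.

```python
from dataclasses import dataclass


@dataclass
class AvatarState:
    """Hypothetical snapshot a peer sends every net update step."""
    x: float
    y: float
    vx: float  # velocity implied by the sender's controller input
    vy: float


def smooth_remote(shown, reported, dt, lead_time=0.1, blend_rate=8.0):
    """Return the new displayed (x, y) for the remote avatar.

    shown: the (x, y) currently drawn on this machine.
    reported: the latest AvatarState received from the other player.
    """
    # Small amount of prediction: aim slightly ahead of the reported spot.
    target_x = reported.x + reported.vx * lead_time
    target_y = reported.y + reported.vy * lead_time
    # Interpolation: cover only a fraction of the remaining gap per frame.
    alpha = min(1.0, blend_rate * dt)
    return (shown[0] + (target_x - shown[0]) * alpha,
            shown[1] + (target_y - shown[1]) * alpha)
```

The local avatar never passes through this function, which is the point of the model: only the remote copy is smoothed, so local controls feel the same online and offline.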
In addition to the avatars, we extended the shared authority model to AI-controlled creatures in the environment. In various areas of the game, each player is authoritative over the simulation for creatures capable of affecting their own avatar. We often resorted to tricks such as spawning pairs of creatures, one designated to interact with each player. The creature interested in your avatar is also simulated locally leading to responsiveness identical to offline play. During solo play the remote-designated creature adopts the same behavior as the local creature. Troubling edge cases arose during disconnection from the remote player; remote-controlled creatures in the middle of a complex maneuver often became irreparably bugged.
II. Smoothness trumps accuracy
Spurred by our initial progress, engineering and design forged ahead building visually appealing simulations of the remote avatar and creatures based on the shared authority system. However, our idealism faded as we began to discover the practical tradeoffs inherent in our simplistic model. Perhaps constructing the online component wouldn’t be as easy as we hoped.
Smearing inputs and outputs lacks artistry from an engineering perspective, yet is often an effective method of achieving smoothness. Unfortunately, because of our heavy-handed interpolation, when the other player deftly maneuvers around a pillar or other obstacle, locally it often appears as though they are kind of drunkenly rubbing along it. In certain rare situations, the simulated remote avatar would become stuck in a corner since interpolation badly approximates acute movements. Our eventual solution involved literally cutting corners. As a final fail-safe, the remote avatar will teleport to prevent a game-breaking situation if their reported position becomes too dissimilar from the interpolated position. Thankfully, there have yet to be any Journey stories highlighting this phenomenon.
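The teleport fail-safe can be sketched in a few lines of Python. The function name and the `snap_distance` threshold are hypothetical; the article does not say what distance counted as "too dissimilar".

```python
import math


def resolve_remote_position(shown, reported, snap_distance=5.0):
    """Return (position_to_draw, teleported).

    shown: the interpolated (x, y) currently displayed.
    reported: the (x, y) the remote player says they occupy.
    snap_distance: assumed threshold; beyond it we stop trusting interpolation.
    """
    error = math.dist(shown, reported)
    if error > snap_distance:
        # Too far apart to catch up smoothly: jump straight to the report.
        return reported, True
    # Close enough: keep the smooth, interpolated position.
    return shown, False
```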
Our unscientific method of synchronizing the player amounted to sending a bevy of properties and essentially feeding them into a copy of the avatar. We ran into numerous bugs in which avatar behaviors stopped functioning because one key piece of data was not sent properly. The most frequent (and funny) bugs arose from the animation, such as the remote character being stuck in a seated position while flying. We debugged these synchronization issues through time and persistence, though the primitive manual state dump worked in the designers’ favor as it was easy to prototype new bug-ridden mechanics by modifying the avatar net structure and sneaking in another 2, 4, (Oh why not?) 16 bytes of data.
Without a central authority or structured time-stepping of avatar state, we discovered that mechanics requiring explicit coordination between players became impossible. We prototyped several leap-frogging mechanics in which one player runs behind the other and "drafts" to build up speed, eventually flying past and becoming the new leader for a few seconds until the process repeats itself. At low latencies this worked fine. At average latencies, however, combined with the liberal interpolation, both players rapidly arrive at a situation where they each believe they are the leader and all drafting effects stop.
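A toy Python model shows why both players can believe they lead. Everything here is an illustrative assumption (one-dimensional positions, equal constant speeds, invented latency values), and it ignores interpolation, which would only add to the delay.

```python
def believes_it_leads(own_pos, remote_pos_now, remote_speed, latency):
    """True if this player thinks it is ahead of the other.

    The remote position seen locally is stale by `latency` seconds, so the
    other player appears further back than they really are.
    """
    seen_remote = remote_pos_now - remote_speed * latency
    return own_pos > seen_remote


# Player A is truly 1 unit ahead of player B; both run at 6 units/second.
a_pos, b_pos, speed = 10.0, 9.0, 6.0

# Low latency: the players agree that A leads.
low = (believes_it_leads(a_pos, b_pos, speed, 0.02),
       believes_it_leads(b_pos, a_pos, speed, 0.02))

# Higher latency: the staleness exceeds the real gap, so both claim the lead.
high = (believes_it_leads(a_pos, b_pos, speed, 0.25),
        believes_it_leads(b_pos, a_pos, speed, 0.25))
```

Once `remote_speed * latency` exceeds the true gap between the avatars, neither machine sees itself as the follower, so neither applies the drafting effect.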
After numerous attempts to fix drafting, we gave up and settled on a much simpler mechanic of granting movement energy whenever the remote player is within some general radius of the local avatar. Though leap-frogging was fun when it worked, the radial glow proved superior as it is more flexible and expressive by not forcing players to move in a line for maximum speed.
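By contrast, the radius-based mechanic needs no agreement between the two machines. A rough Python sketch, with the `radius` and `rate` values invented for illustration:

```python
import math


def energy_gain(local_pos, remote_pos, dt, radius=10.0, rate=1.0):
    """Movement energy granted to the local avatar this frame.

    Each player evaluates this independently from their own view of the
    world, so small disagreements about position only matter at the edge
    of the radius.
    """
    if math.dist(local_pos, remote_pos) <= radius:
        return rate * dt
    return 0.0
```

Because the test is symmetric and has no notion of "leader", latency only blurs the boundary of the effect rather than switching it off.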
III. Minimize griefing
Tension built as we hit dead-ends in our multiplayer progress. Not only did our early movement mechanics break, but more fundamental design issues arose with our initial resource-based mechanics as they proved incompatible with our approach to shared authority networking. We had to transform ourselves into griefers to understand the problems in our design. Our internal playtests shifted from having fun to figuring out how to ruin each other’s experiences.
Griefing is a real problem with online games and reducing it was central to our vision for how Journey could stand apart. Unfortunately, trusting the other player's reported state and being unable to modify the local player's experience leads to an inability to fairly resolve conflict. For Journey, grabbing a power-up or resource at the same moment as the other player shouldn’t result in disappointment (unlike competing for quad damage in Quake: "DENIED!" followed by a free teleport).
Our preliminary designs involved shreds of cloth that could be captured to grant flight energy; we found that players would compete to hoard and steal these assets from one another. To fix these potential race-condition letdowns, we reworked early design concepts and removed non-shared resources. Networked resources in the game are purely additive, meaning that if a player gets one, the effect applies for both players; for example, lighting the tombstones near the end of each level or touching sections of cloth to activate a puzzle piece.
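One way to picture "purely additive" is as a set that only ever grows. This is a sketch under my own assumptions (the resource names and the set representation are hypothetical, not how Journey stores its state): applying the same pickups in any order, or more than once, gives the same result, so there is no race to lose.

```python
def apply_pickup(activated, resource_id):
    """Return the shared set of activated resources after one pickup.

    The input set is never mutated and nothing is ever removed, so the
    order in which the two peers process pickups does not matter.
    """
    return activated | {resource_id}


# Both players touch things at about the same time; each peer may see
# the two events in a different order.
peer_a = apply_pickup(apply_pickup(frozenset(), "tombstone_1"), "cloth_3")
peer_b = apply_pickup(apply_pickup(frozenset(), "cloth_3"), "tombstone_1")
```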
Avatar collision detection is another form of conflict as one player can block the other’s progress. Preventing the other player from landing on a key ledge became a badge of griefing honor during our internal playtest sessions. Sticking with our rule to never modify the local player's experience, we first tried to fix the issue by disabling inter-avatar collision entirely. This almost worked, but looked inappropriate when players stood still on top of one another. Eventually, we decided to locally push the remote player’s avatar away in cases of extreme overlap.
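The local push can be sketched as follows. The `min_gap` value and the function name are assumptions. Note that only the displayed remote avatar moves, which keeps the rule of never modifying the local player's experience.

```python
import math


def separate_remote(local_pos, remote_pos, min_gap=0.5):
    """Return where to draw the remote avatar so it does not sit on ours.

    Avatars further apart than `min_gap` are left alone, so ordinary
    overlap is still allowed; only extreme overlap is corrected.
    """
    dx = remote_pos[0] - local_pos[0]
    dy = remote_pos[1] - local_pos[1]
    dist = math.hypot(dx, dy)
    if dist >= min_gap:
        return remote_pos
    if dist == 0.0:
        # Exactly stacked: pick an arbitrary direction to push along.
        dx, dy, dist = 1.0, 0.0, 1.0
    scale = min_gap / dist
    return (local_pos[0] + dx * scale, local_pos[1] + dy * scale)
```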
IV. Connect as many players as possible
After months spent overhauling our original leap-frogging movement mechanics and resource-based designs, our relentless iteration began to pay off as we learned how to effectively utilize our lazy networking approach. It was easy to set out on a path of minimalistic multiplayer but difficult to appreciate the extent of the impact on our design. As we solidified avatar interactions, our focus shifted towards the higher-level problem of ensuring that players are able to reliably connect to one another.
World state synchronization is a concern when joining two players' games together. For simplicity, we chose to avoid performing active reconciliation and instead only connect players who already agree on a view of the world. To accomplish this, we represented world state as a per-level bit field that updates when the player activates a major event. Designers determined which gameplay sequences deserved a bit flag.
Our informal rule was that any event which modified the collision of the level needed to be a break in world state. Most levels ended up with only three or four flags. I found it a continual struggle to resist the allure of adding more world-altering events. There was no shortage of awesome ideas, but each one created additional fragmentation within the connection pool.
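A per-level bit field of this kind might look like the Python sketch below. The event names are invented for illustration and are not Journey's actual flags.

```python
from enum import IntFlag


class LevelEvent(IntFlag):
    """Hypothetical major events for one level, one bit each."""
    NONE = 0
    BRIDGE_BUILT = 1
    GATE_OPENED = 2
    TOWER_RAISED = 4


def activate(state, event):
    """Set the bit for a major event; bits are never cleared."""
    return state | event


def worlds_agree(a, b):
    """Two players may be paired only if every flag matches."""
    return a == b


ahead = activate(LevelEvent.NONE, LevelEvent.BRIDGE_BUILT)
behind = LevelEvent.NONE
```

The cost of each extra flag is easy to see in this form: a level with n independent flags can split its players into up to 2^n mutually incompatible groups.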
These bit flags worked well for major level changes; however, dynamic entities such as AI creatures aren't necessarily on or off and still required state resolution. We wrote a variety of special-case logic to merge the creatures’ state after a new connection. This often involved each creature prioritizing moving quickly towards the agreed-upon position of where it should be located. For certain exceptional cases where basic movement proved inadequate for resolving a creature's state, a world state flag was set and the change in that creature considered a major event. Merging creature states proved difficult and took a significant amount of work and debugging.
I believe this world state resolution problem had a large influence on the overall gameplay tone of Journey. Over time, our strong aversion to world-changing events led us to focus our efforts on reversible effects such as movement modifiers and ambient interactive elements. In short, it forced us to concentrate on the traveling aspect of gameplay.
In addition to reducing dependency on world state flags, designers integrated choke points into levels to increase the connection pool size. Along with world state, the Journey matchmaking system also checks physical proximity before connecting to a new player. Choke points create areas where the matchmaker has ample time to establish a new connection, as well as providing an opportunity for slower players to catch up with faster ones and avoid being disconnected for being too far away.
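Putting the two matchmaking checks together, a pairing test could be sketched like this. The `Candidate` fields and the `max_distance` value are assumptions; the article only says that world state and physical proximity are both checked.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical summary of a player looking for a partner."""
    level: str
    world_flags: int  # per-level bit field of major events
    pos: tuple        # (x, y) position within the level


def can_connect(a, b, max_distance=50.0):
    """True if two players agree on the world and are near each other."""
    if a.level != b.level or a.world_flags != b.world_flags:
        return False
    return math.dist(a.pos, b.pos) <= max_distance
```

A choke point helps both conditions at once: players linger in one place, and they tend to arrive with the same set of events already triggered.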
For legacy tool reasons, our levels are rectangular height-maps with the starting and ending positions generally on opposite sides of the map. This linearity worked in our favor as it increases the likelihood of players staying together versus wandering apart and disconnecting. Though we saw positive results when playtesting and even created tools to generate traversal heatmaps, we were never able to organize any solid data on exactly how effective our level design techniques were for enabling connections. Whether the game could have benefited from more or fewer choke points is a mystery to me, and much of our decision making in this area came primarily from instinct.
V. Designers are also responsible for networking
In consideration of our small team size, we decided that designers must have a large role in the implementation of online features. In addition to assisting with the workload, the design team would gain a stronger understanding of multiplayer advantages and limitations. To share networking responsibilities, designers were given direct networking support in our level editor with what we called a "branch" event. Our level editor lives in Maya and has a node-based scripting system that can chain together script processes to form complex sequences of events. Branch allows events to be activated locally, remotely, or for both players; it is a networked control flow mechanism.
Branch proved to be incredibly powerful and was used extensively throughout our level scripting as it allowed designers to prototype synchronized effects without diving into code. A recurrent pattern we used for world state modifying events was to assign a region to the branch. Upon local triggering of that event, if the remote player exists within this region they will also see the event unfold and their world will change in lockstep. If not inside the region, they will be disconnected. This let us hand-tune distance constraints to prevent one player from running far ahead and "stealing gameplay" by activating puzzle pieces in a way that would be confusing for the trailing player.
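The region pattern can be sketched as a small decision function. This is an illustration only: the real branch event lived in a node-based Maya editor, and the names `Target` and `fire_branch`, along with the circular region, are my own assumptions.

```python
import math
from enum import Enum


class Target(Enum):
    """Who should run the event: this player, the other, or both."""
    LOCAL = "local"
    REMOTE = "remote"
    BOTH = "both"


def fire_branch(target, remote_pos=None, region=None):
    """Return (run_locally, run_remotely, disconnect) for one trigger.

    remote_pos: (x, y) of the connected player, or None when playing solo.
    region: optional ((x, y), radius) circle assigned to the branch.
    """
    run_local = target in (Target.LOCAL, Target.BOTH)
    run_remote = (remote_pos is not None
                  and target in (Target.REMOTE, Target.BOTH))
    disconnect = False
    if region is not None and remote_pos is not None:
        center, radius = region
        if math.dist(center, remote_pos) > radius:
            # Too far away to share the event: drop the connection so
            # the two worlds never disagree.
            run_remote = False
            disconnect = True
    return run_local, run_remote, disconnect
```

For example, `fire_branch(Target.BOTH, remote_pos=(90.0, 0.0), region=((0.0, 0.0), 25.0))` runs the event locally and flags a disconnect, matching the rule that a trailing player outside the region is dropped rather than left with a mismatched world.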
Unfortunately, our designer-friendly technology presented us with an unexpected number of problems throughout alpha and beta. Though we made rapid early progress implementing synchronized sequences in the editor, the number of ad-hoc branch events scattered throughout the game began to catch up with us. Networked event sequences often had subtle race conditions or other infrequent bugs that were only solved through hours of exhaustive testing. Many of the complex sequences I created had evolved into a truly horrendous spaghetti of control flow, particularly in the underground level. Because so much logic was implemented directly in each level, there were no convenient global or reusable solutions; every scenario in every level required individual debugging and repair.
Journey's limited scope helped us test most event activation timings; however, we did ship with a few race conditions discovered during playtests that we weren't able to resolve because they occurred so infrequently. Fortunately, nothing I witnessed was game breaking and perhaps at worst resulted in temporary bizarre behavior of AI creatures or cutscenes. Apologies if this turned out not to be the case for any unlucky journeyers.
Wall’s virtue of laziness for me means to be calm; to resist the urge to over-architect and instead solve the problem at hand in a direct, straightforward fashion, if at all. Laziness became the laser that allowed us to cut through complexity and find the core of our designs. Being lazy wasn’t easy and required constant introspection as to whether we were actually traversing the shortest distance between two points. Simplifying the engineering generated positive design limitations but came at the cost of technical stability and predictability. During development the constant fear that the entire online experience could collapse at any moment was mostly overcome by another of Wall’s virtues, hubris. The process was both scary and empowering. Is laziness a universal virtue, or do sophisticated frameworks pay their dividends in the end?
As a final note, we are searching for new engineering talent to join us in designing and developing networking tools to expand the world of our upcoming game. Please contact us at [email protected] for more information. One of our Feel Engineers, John Nesky, will be presenting a talk at GDC next week on cameras; stop by and say hello!