There’s a moment early in Enslaved: Odyssey to the West when it’s easy to forget that Monkey and Trip are digital characters in a videogame.
It’s after the crash of the slave ship, when Monkey wakes to discover that he’s wearing a slave headband. As he stirs, Trip, the young technical genius, sits a short distance away, thin arms wrapped around skinny knees. She watches him with wide eyes. She’s not sure if the headband she’s hacked will work the way she expects, and she’s equally unsure how the lithe and powerful Monkey will respond to being under her control.
Trip doesn’t say any of this. There’s no dialogue to tell us what’s on her mind. We know what she’s thinking because we can read it on her face and can see it in her body language. The same way we can sense Monkey’s confusion by how his mouth twists, and his suspicion by the way his eyes narrow when he looks at Trip.
“So much of our communication is non-verbal,” explains Tameem Antoniades, chief creative ninja at Ninja Theory and the designer of Enslaved. “We can tell whether you’re lying or whether I’m telling the truth just by looking at each other’s eyes and faces.”
Good storytelling in a visual medium like film or videogames, Antoniades suggests, occurs when characters say one thing, but the audience knows they mean something else. It’s the “show, don’t tell” storytelling mantra that makes the intent behind the words most important. Meaning and emotion are conveyed not through dialogue, but through facial and body movements. A curved lip suggests passion; a furrowed brow, fear; a raised arm, anger.
We’ve come to expect the bodies of our videogame characters to move the way real people move, but as we’re seeing in games like Enslaved, the faces of these characters – and therefore their emotions – are also now being rendered with incredible realism. And it’s all thanks to a bunch of white dots and some glue – the key components of motion capture.
Motion capture has been in use since the late 1970s, when it was largely confined to kinesiology departments on university campuses. It’s something of an oversimplification, but motion capture essentially involves cameras recording the body in motion, with computer software translating those little white dots into the angles, speed, and trajectory of the body’s movement. That information can then be used to animate a digital skeleton on a computer. Put skin over that skeleton, and you’ve got a digital character.
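To get a feel for what that translation involves, here’s a minimal sketch in Python – with hypothetical marker positions, not anything from a real studio’s pipeline – of how software might turn three tracked dots into a single joint angle for that digital skeleton:

```python
import numpy as np

def joint_angle(marker_a, marker_b, marker_c):
    """Return the angle (in degrees) at marker_b formed by three tracked dots.

    For example, shoulder, elbow and wrist markers give the elbow's bend.
    Each marker is an (x, y, z) position reported by the capture cameras.
    """
    a, b, c = map(np.asarray, (marker_a, marker_b, marker_c))
    u, v = a - b, c - b  # the two "bones" meeting at the joint, as vectors
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

# One captured frame: hypothetical positions for three markers, in metres.
shoulder, elbow, wrist = (0.0, 1.4, 0.0), (0.3, 1.1, 0.0), (0.5, 1.3, 0.1)
print(f"elbow bend: {joint_angle(shoulder, elbow, wrist):.1f} degrees")
```

Repeat that calculation for every joint, dozens of times a second, and the changing angles give you the speed and trajectory of the movement – the raw material the animators then refine.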
Within a few years, the technology of being able to translate real movement into a digital environment was being used in the entertainment industry – movies and videogames – to create more realistic computer-generated (CG) characters. The amount of movement being tracked then was fairly basic. Improving technology meant that more dots could be placed on bodies, leading to even more realistic animations. The next step was putting dots on the faces of the actors, to record the fine movements of things like lips, eyebrows and cheek muscles.
Capturing facial movements was one of the things that made Andy Serkis’ Gollum – in Peter Jackson’s 2002 film The Lord of the Rings: The Two Towers – so memorable. When Serkis first became Gollum, he acted the role on set to establish Gollum’s body movements, and again in a small booth, to capture the facial expressions. Digital artists then matched the digital face to the digital body.
Antoniades tells me that when he saw Gollum, something clicked. “I thought if you can make a genuine performance in a CG character in a movie, then it should be possible to do it in a game.” It so happened that Antoniades’ brother was working on Serkis’ mortgage and introduced the actor to the game designer. Serkis then introduced Antoniades to Weta Digital, the visual effects company in New Zealand that helped create Gollum. “We went to Weta with a statement of intent,” says Antoniades, “‘How do we create characters that are believable, in real time?'”
Ninja Theory’s technical artists spent a year in Wellington adapting the movie processes for game development, and when the studio started work on Heavenly Sword, Antoniades decided that the motion capture would take place at Weta. Serkis was an easy choice to star in the game, alongside Anna Torv (Fringe), and he also directed the motion capture sequences.
While facial expressions had traditionally been captured separately from body movements and voice acting, Antoniades wanted to capture them all at the same time. “We wanted to do the entire shoot that way where we were not just capturing each element separately,” he says, “we were capturing it all together with multiple actors.”
It was the first time that all elements of an actor’s performance had been captured simultaneously for a videogame. Antoniades admits that at the time, they weren’t sure how it would end up because the final animations that use the motion capture take up to a year to create. “We were totally working blind,” says Antoniades, “but I think the results were better than we expected. It proved to us that you can have subtlety and emotion and you can engage players emotionally through the expressions of CG characters.”
Capturing body, face and voice at the same time has since become standard practice in film and game design and is referred to as “performance capture.” James Cameron used performance capture in creating Avatar. At Comic-Con in 2009, the director said that the technology empowers actors because it frees them from their physical body. “Actors don’t do motion,” he said, “they do emotion.” And performance capture, according to him, preserves every moment of an actor’s creation on set.
Peter Jackson, on stage with Cameron at Comic-Con, added: “It’s an extension of the makeup process… It’s not hugely different than what Lon Chaney did.”
Movie makers and videogame creators are using the exact same method to tell stories, says Antoniades. Cameron, Jackson and Steven Spielberg all visited the Heavenly Sword and Enslaved sets, just as Antoniades visited the Tintin and Rise of the Planet of the Apes sets. They compared notes on how they were doing performance capture and what they were doing with the data afterwards. “Everyone’s going about it in their own particular way,” says Antoniades. “Filmmakers go about it in a way that is much more similar to traditional filmmaking. We don’t know anything about filmmaking so we do it in a totally different way, but it’s still the same result. It’s trying to tell stories in a virtual way.”
Performance capture has become a way for videogames, like film, to show a story instead of telling it. Antoniades says that the words in the script don’t mean much because the dramatic intent of the scene is what’s important. Midway through Enslaved, Pigsy tries to find out the nature of the relationship between Monkey and Trip. “I always sort of fancied my chances myself with Trip,” says Pigsy. “If you want to take your chances, go ahead,” Monkey replies. Trip says nothing.
The dialogue is straightforward, but the dramatic intent of the scene is to create tension between the three characters. Monkey is sarcastic, Pigsy is awkward and embarrassed, and Trip is embarrassed for Pigsy and furious with Monkey. “That whole scene… is about eye movements and expressions,” says Antoniades. “That’s not something you can do without performance capture.”
For EA’s Fight Night Champion, performance capture made it possible to inject a movie-like narrative component into what used to be a pure sports sim.
Brian Hayes is a creative director and lead gameplay producer on the game, which was just released for PS3 and Xbox 360. Champion Mode, he explains, is a story that puts players in the shoes of Andre Bishop, a young boxer struggling to become a champion fighter. Former SlamBall athlete LaMonica Garrett plays Bishop, while Eliza Dushku (Buffy the Vampire Slayer, Angel, Dollhouse) takes a turn as the daughter of a shady fight promoter.
Hayes, who was responsible for overseeing the script and cinematics for Champion Mode, says that performance capture was essential to getting the level of quality they wanted in the animations. There’s no point in creating interesting characters and environments if those characters are just going to stand there like puppets with mouths wagging, he explains. Performance capture helps deliver compelling scenes: “It makes it seem much more like you are engrossed in a real movie.”
Real people don’t just talk with their mouths, though; they use their faces, their hands, and their bodies. In creating a cinematic scene with dialogue and character interaction, Hayes says, being able to record body and facial movements together is essential. “It means that when we recreate the scene in the game, everything looks human.”
Enslaved is a particularly good example of just that. The sequence after the crash of the slave ship is only the first of many interactions between the characters in which meaning and emotion can be understood from facial expressions alone. “Now we have the confidence to say that anything you can do in a movie you can do in a videogame in terms of acting performance and subtlety,” says Antoniades.
The next step for performance capture, says Antoniades, is to find ways to introduce more interactive, branching narratives. Enslaved told a linear, straightforward story, but combine the realistic characters that performance capture makes possible with more complex plots, and players will be able to empathize more completely and become even more immersed in the games they play.
Blaine Kyllo writes on videogames, technology and culture. He lives in Vancouver.