The Avatar Effect

Interest in performance capture is skyrocketing thanks to the record-breaking film. So which workflow works best for your production?

Everyone knows Avatar was a game changer for 3D production and exhibition, but the real story behind why those 10-ft blue Na'vi broke box office records is still being written. Avatar's ground-breaking use of performance capture allowed filmmaker James Cameron to create a workflow in which he could see the planet Pandora, its inhabitants and all of his actors in the motion capture volume at once, in a startling blend of creativity and digital technology. The film's success, and the technology behind it, has pushed interest in performance capture to an all-time high, spurring on competing and complementary methodologies, some still in development and others being refined for use in a newly streamlined production paradigm that could shift character animation away from the post-heavy focus it has always required.

So which of these various workflows works best? Facial markers, which delineate an actor's features and are recorded as a cloud of data by cameras, have been the methodology of choice for filmmaker Robert Zemeckis, who co-founded the San Rafael-based ImageMovers Digital (The Walt Disney Company recently announced it will shutter IMD by January 2011). Zemeckis used the approach on last year's 3D mo cap feature A Christmas Carol, as well as on his early efforts with the technology, dating all the way back to 2003, when work began on the first-ever all-motion-capture feature, The Polar Express. Other popular workflows include motion capture pioneer Vicon's optical tracking systems and marker-free video-based approaches, while Mova, which figured prominently in The Curious Case of Benjamin Button, captures the actual topology of a subject's face with its Contour system, which requires that the performer's visage be covered in phosphorescent paint. Gross body capture has been accomplished via markers or with sensor-equipped suits that obviate the need for optical tracking. The performer-based data generated in these capture sessions forms the basis for the character animation, which, thanks to recent technological innovations, can now be displayed in real time immediately after capture, much as a live-action take can be studied on playback – the so-called "Avatar effect."

Software maker Autodesk, whose 3D character animation software MotionBuilder helped drive Avatar's groundbreaking workflow, has become a major player in the mo cap juggernaut, but it didn't start out that way. Autodesk product marketing manager Maurice Patel says, "MotionBuilder wasn't initially envisioned as a real-time animation engine, probably because its earlier iterations were more limited. But now, with multicore architecture, advances in GPU processing and operating systems that let you load large datasets into memory, 3D scenes can be populated with somewhat credible characters and environs, featuring very specific lighting effect choices." Patel says that large amounts of data can now be displayed on devices configured like cameras, as was the case with the SimCam on Avatar, which looks like a camera and carries a real-time 3D feed, giving filmmakers enough of the scene to visualize it as they would through a monitor or an eyepiece. He cites real-time interaction with modern graphics systems, particle processes and 64-bit memory space as key innovations in the development of a real-time mo cap workflow such as Avatar employed, and compares performance capture to where the Digital Intermediate process was a decade ago.

“[Initially] DI was daunting in that it offered so many options down the line, but you still had to think about color and lighting on set,” Patel explains. “The DI allowed cinematographers and directors to think about these matters differently, and for many production professionals, it has ended up enhancing the filmmaking process.”

MotionBuilder

Much the same thing appears to be going on in the performance capture medium. The ability for directors to get an immediate look at the scene with animated enhancements radically alters the longstanding model for production’s input on visual effects. “Post used to involve the filmmakers at something of a remove,” Patel continues. “And input was limited primarily to reviews and approvals. The art of filmmaking is the art of interaction, and the typical CG process for blockbusters, up till now, has been divorced from this long tradition of interaction, because it was all post, and more and more of a film was being determined downstream from production. This real-time [mo cap] tech brings it back to where the directors and actors are living, which lets them drive the CG content, rather than the reverse. In this way, building a VFX blockbuster can now be even more about leveraging the live talent you have. Hopefully this will become accepted as a step in the process towards building a seamless iterative workflow that encompasses all aspects of the production model.”

The team at Xsens Technologies is thinking equally outside the box, offering a motion capture solution, MVN [formerly known as Moven], based on suit-mounted MEMS [micro electromechanical systems] inertial sensors. The process uses no markers, so occlusion and misidentification are not factors, minimizing the need for data cleanup. “One way our process differs from most is that capture can happen anywhere and isn’t limited to taking place within a volume,” Xsens product manager Hein Beute states. “Our system works inside an automobile or while jumping from a plane, so maintaining critical lighting conditions isn’t an issue either.”
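
To make the idea concrete, the sketch below shows the basic principle behind inertial capture: each suit-mounted sensor reports angular velocity, and a body segment's orientation is recovered by integrating that signal over time. This is a minimal illustration only, not Xsens' actual algorithm – the sample rate and example motion are assumptions, and a real system also fuses accelerometer and magnetometer data to correct drift.

```python
# Minimal sketch of inertial orientation tracking (not Xsens' actual fusion
# algorithm). A suit sensor reports angular velocity; integrating it yields
# the segment's orientation with no cameras or markers involved.
# Sample rate and example values below are assumptions for illustration.
import numpy as np

def quat_multiply(a, b):
    """Hamilton product of two quaternions (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def integrate_gyro(q, omega, dt):
    """Advance orientation q by body-frame angular velocity omega (rad/s) over dt seconds."""
    dq = 0.5 * quat_multiply(q, np.array([0.0, *omega]))  # dq/dt = 0.5 * q * (0, omega)
    q_new = q + dq * dt
    return q_new / np.linalg.norm(q_new)                   # keep unit length

# Example: a forearm sensor rotating 90 deg/s about its x-axis, sampled at 120 Hz.
q = np.array([1.0, 0.0, 0.0, 0.0])       # identity orientation
omega = np.radians([90.0, 0.0, 0.0])     # angular velocity in rad/s
for _ in range(120):                     # one second of samples
    q = integrate_gyro(q, omega, 1.0 / 120)
print(q)  # ~[0.707, 0.707, 0, 0]: a 90-degree rotation about x
```

Drift is the trade-off for working without a camera volume, which is why commercial suits layer additional sensor fusion on top of this basic integration.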

MotionBuilder

Setup with the MVN workflow takes less than a quarter-hour, and captured data (from up to 23 body segments) can be exported to the .BVH and .FBX motion capture formats. Real-time integration (via MotionBuilder) is another option, so on-set review is also possible, while Xsens' recent partnership with facial capture vets Image Metrics – which utilizes machine vision tech for its proprietary marker-free system – has further expanded what can be accomplished in a single session of full performance capture. "Image Metrics' video-based head camera system allows performers to move about freely during facial capture," Beute elaborates, "so that fits nicely with our own approach to body capture." London-based VFX house Double Negative is already using the companies' paired tech for performance capture on several upcoming blockbusters.
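
For readers curious what those exports actually contain, here is a minimal sketch of reading the MOTION block of a .BVH file – a skeleton hierarchy followed by a frame time and one line of joint-channel values per frame. The file name is hypothetical, the file is assumed to be well formed, and this is not Xsens' own tooling.

```python
# Minimal sketch of reading the MOTION block of a .BVH export (assumes a
# well-formed file; the file name below is hypothetical).
def read_bvh_motion(path):
    """Return (frame_time, frames) from a BVH file's MOTION section."""
    with open(path) as f:
        lines = f.read().splitlines()

    start = lines.index("MOTION")                     # skeleton hierarchy ends here
    n_frames = int(lines[start + 1].split()[-1])      # e.g. "Frames: 1200"
    frame_time = float(lines[start + 2].split()[-1])  # e.g. "Frame Time: 0.008333"

    # Each remaining line is one frame: root translation plus joint rotation channels.
    frames = [[float(v) for v in line.split()]
              for line in lines[start + 3 : start + 3 + n_frames]]
    return frame_time, frames

frame_time, frames = read_bvh_motion("mvn_take_03.bvh")
print(f"{len(frames)} frames at {1.0 / frame_time:.0f} fps, "
      f"{len(frames[0])} channels per frame")
```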

Xsens MVN

As in commercial advertising, the world of computer gaming often seeks out the most eye-catching imagery imaginable. In developing L.A. Noire for Rockstar Games, Team Bondi needed a capture system that could allow for the speedy production of 2,200 script pages while also delivering a cinematic look to the interactive adventure. They chose Sydney, Australia-based Depth Analysis, which employed its MotionScan system to capture as much as fifty minutes of facial footage daily. "The system uses 32 HD cameras," explains Oliver Bao, DA's Head of Research, "and captures mannerisms and facial nuances as 3D models at rates up to 30 frames per second. The major difference between us and most other systems is that we don't use markers or phosphorescent paint on the actors and, more importantly, there is no cleanup of data or need to animate details by hand. We've been building on the core technology since 2007, so we've had time to make an efficient core pipeline."

Depth Analysis does not even employ character artists or animators. "The CPUs are doing the processing," Bao admits, "so it's a different way of creating animation. We can process up to 20 minutes of facial animation automatically per day – a figure limited strictly by the CPU at this point." DA's capture volume is roughly seven meters wide and can be relocated as needed.
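
Some back-of-envelope arithmetic suggests why processing, rather than capture, sets the pace. The resolution and bit-depth assumptions below are illustrative guesses, not Depth Analysis figures; only the camera count, frame rate and fifty-minute daily capture figure come from the article.

```python
# Rough data-rate estimate for a 32-camera HD rig (assumed 1920x1080, 8-bit RGB,
# uncompressed). These assumptions are illustrative, not Depth Analysis specs.
cameras = 32
width, height = 1920, 1080
bytes_per_pixel = 3          # 8-bit RGB, assumed
fps = 30
minutes_captured = 50        # daily capture figure quoted above

bytes_per_second = cameras * width * height * bytes_per_pixel * fps
total_bytes = bytes_per_second * minutes_captured * 60

print(f"{bytes_per_second / 1e9:.1f} GB/s of raw video")    # ~6.0 GB/s
print(f"{total_bytes / 1e12:.1f} TB over a 50-minute day")  # ~17.9 TB
```

At that scale it is easy to see how the reconstruction machines, not the stage, become the limiting factor in the pipeline.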

Depth Analysis Scan Room

Bao says the MotionScan workflow allows filmmakers to view performances from any angle and re-light freely without resorting to multiple setups. "The director can make his selects on the spot, or we can use our secondary approach, set up in parallel, where video editors can be doing pre-selects – whichever makes the most efficient use of time, so the unit can always keep shooting. When Mad Men director Michael Uppendahl came here, he was amazed by how fast and easy it is for actors to deliver so much script."

For L.A. Noire, the Depth Analysis facial capture data was handed off to Team Bondi's character artists for inclusion with their body work. "Once initialized via code, they can do a lineup of our head and lock that in," concludes Bao, who adds that a larger volume for full body capture at higher resolution is also in the works.

By Kevin H. Martin