When Nintendo's Wii game console debuted in November 2006, its motion-sensing handheld "Wiimotes" got players off the couch and onto their feet. Now Microsoft is trying to outdo its competitor by eliminating the controller altogether: It has revealed details of how it developed Project Natal, which gives Xbox 360 players the ability to manipulate on-screen characters via natural body movements.
The machine-learning technology will enable players to do things such as kick a digital soccer ball or swat a handball in their living rooms simply by mimicking the motion. "Instead of a controller, your body becomes the game input," says Alex Kipman, Microsoft's director of incubation for Xbox 360.
Microsoft introduced its ambitious Xbox upgrade in June 2009 and expects to ship the technology in time for the year-end 2010 holiday season. Natal will consist of a depth sensor that uses infrared signals to create a digital 3-D model of a player's body as it moves, a video camera that can pick up fine details such as facial expressions, and a microphone that can identify and locate individual voices.
Programming a game system to discern the human body's almost limitless combinations of joint positions is a fearsome computational problem. "Every single motion of the body is an input, so you'd need to program near infinite reactions to actions," Kipman says.
Instead of trying to preprogram actions, Microsoft decided to teach its gaming technology to recognize gestures in real time just as a human does: by extrapolating from experience. Jamie Shotton, a researcher at Microsoft Research Cambridge in England, devised a machine-learning algorithm for that purpose. The algorithm also recognizes poses and renders them in the game space on-screen at 30 frames per second, a rate that conveys smooth movement. Essentially, Natal-enhanced Xboxes will do motion capture on the fly, without the need for the mirror-studded spandex suit of conventional motion-capture approaches.
Training Natal for this task required Microsoft to amass a large amount of biometric data. The firm sent observers to homes around the globe, where they videotaped basic motions such as turning a steering wheel or catching a ball, Kipman says. Microsoft researchers later laboriously selected key frames within this footage and marked each joint on each person's body. Kipman and his team also went into a Hollywood motion-capture studio to gather data on more acrobatic movements.
"During training, we need to provide the algorithm with two things: realistic-looking images that are synthesized and, for each pixel, the corresponding part of the body," Shotton says. The algorithm processes the data and changes the values of different elements to achieve the best performance.
To keep the amount of data manageable, the team needed to figure out which elements were most relevant for training. For example, the system doesn't need to recognize the entire body mass, but only the spacing of skeletal joints. After whittling down the data to the essential motions, the researchers mapped each unique pose to 12 models representing different ages, genders and body types.
The end result was a huge database consisting of frames of video with people's joints marked. Twenty percent of the data was used to train the system's brain to recognize movements. Engineers kept the rest in a "ground truth" database used to test Natal's accuracy.
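The 20/80 division described above can be illustrated with a short Python sketch. It is only an assumption about how such a split might be made, not Microsoft's actual procedure: the function name `split_frames`, the integer frame IDs and the random shuffle are all hypothetical.

```python
import random


def split_frames(frame_ids, train_fraction=0.2, seed=0):
    """Shuffle frame IDs and split them into a training set and a held-out set.

    Hypothetical helper: the article gives only the 20 percent figure,
    not how the frames were actually chosen.
    """
    if not 0.0 <= train_fraction <= 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(frame_ids)
    # A seeded generator makes the split reproducible from run to run.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Example: 1,000 labeled frames, identified here simply by number.
train_ids, ground_truth_ids = split_frames(range(1000))
print(len(train_ids), len(ground_truth_ids))
```

Keeping the two sets disjoint matters because a system tested on frames it has already seen during training would look more accurate than it really is.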
Choosing the best algorithm and sifting out the essential data are central to the art of machine learning. To test Natal's ability to recognize poses, engineers show it an image from the ground truth database and then generate a digital pixel map in which brightness stands for certainty: the more certain the computer is that a pixel is correctly placed on the body, the brighter that pixel is. Engineers test hypotheses about how to improve the performance, trying to fine-tune weak areas without regressing strong ones. The more accurately the system can recognize gestures, the more fun it will be to play the game.
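A pixel map of this kind can be sketched in a few lines of Python. The sketch assumes the system reports a certainty between 0 and 1 for every pixel and that the map is drawn in 8-bit grayscale; both are assumptions made for illustration, and `confidence_to_brightness` is a hypothetical name, not part of Natal.

```python
def confidence_to_brightness(confidence_map):
    """Convert per-pixel certainties (0.0 to 1.0) into gray levels (0 to 255).

    Higher certainty yields a brighter pixel. Values outside the
    0-to-1 range are clamped before conversion.
    """
    image = []
    for row in confidence_map:
        out_row = []
        for certainty in row:
            clamped = min(1.0, max(0.0, certainty))
            out_row.append(round(clamped * 255))
        image.append(out_row)
    return image


# A tiny 2-by-2 example: certainties of 0, 20, 80 and 100 percent.
example = [[0.0, 0.2], [0.8, 1.0]]
brightness = confidence_to_brightness(example)
print(brightness)  # [[0, 51], [204, 255]]
```

Dark patches in such a map point engineers to the body regions where recognition is weakest.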
Of course, Microsoft is not the only gaming company exploring gestural interfaces. Rival console-maker Sony in May demonstrated a prototype Interactive Communication Unit (ICU) with stereo video cameras and depth sensors at the Vision 2009 trade fair in Stuttgart, Germany, according to New Scientist. Sony developed ICU with the help of Atracsys, LLC, a Swiss firm that specializes in optical-tracking technology. Although Sony makes the popular PlayStation game console, the company says it is planning to promote its new technology only within the advertising industry rather than in the gaming market at this time. Rival Nintendo has not revealed any plans to allow its Wii system to function without a Wiimote.
Still, the controller should not disappear altogether, says Hiroshi Ishii, head of the Tangible Media Group at the Massachusetts Institute of Technology Media Lab. "I'm a strong believer of having something tangible in your hand," he says. Wiimote devices, moreover, provide haptic feedback, such as vibration or resistance, that makes the action more realistic. Even for activities like Natal's soccerlike Ricochet game demo, he points out, a player might miss the simulated feeling of connecting with a physical object that a controller provides.
But Peter Molyneux, creative director of Microsoft Game Studios Europe, looks forward to a new breed of computer entertainment, because eliminating game controllers opens up more creative possibilities. "Natal is forcing me as a designer to think of this as a relationship between the player and a piece of technology," he says. "We're trying to make something that feels as if it's alive."