When Nintendo's Wii game console debuted in November 2006, its motion-sensing handheld "Wiimotes" got players off the couch and onto their feet. Now Microsoft is trying to outdo its competitor by eliminating the controller altogether: It has revealed details of how it developed Project Natal, which gives Xbox 360 players the ability to manipulate on-screen characters via natural body movements.
The machine-learning technology will enable players to do things such as kick a digital soccer ball or swat a handball in their living rooms simply by mimicking the motion. "Instead of a controller, your body becomes the game input," says Alex Kipman, Microsoft's director of incubation for Xbox 360.
Microsoft introduced its ambitious Xbox upgrade in June 2009 and expects to ship the technology in time for the year-end 2010 holiday season. Natal will consist of a depth sensor that uses infrared signals to create a digital 3-D model of a player's body as it moves, a video camera that can pick up fine details such as facial expressions, and a microphone that can identify and locate individual voices.
Programming a game system to discern the human body's almost limitless combinations of joint positions is a fearsome computational problem. "Every single motion of the body is an input, so you'd need to program near infinite reactions to actions," Kipman says.
Instead of trying to preprogram actions, Microsoft decided to teach its gaming technology to recognize gestures in real time just like a human does: by extrapolating from experience. Jamie Shotton, a researcher at Microsoft Research Cambridge in England, devised a machine learning algorithm for that purpose. It also recognizes poses and renders them in the game space on-screen at 30 frames per second, a rate that conveys smooth movement. Essentially, Natal-enhanced Xboxes will do motion capture on the fly, without the need for the mirror-studded spandex suit of conventional motion-capture approaches.
Training Natal for this task required Microsoft to amass a large amount of biometric data. The firm sent observers to homes around the globe, where they videotaped basic motions such as turning a steering wheel or catching a ball, Kipman says. Microsoft researchers later laboriously selected key frames within this footage and marked each joint on each person's body. Kipman and his team also went into a Hollywood motion-capture studio to gather data on more acrobatic movements.
"During training, we need to provide the algorithm with two things: realistic-looking images that are synthesized and, for each pixel, the corresponding part of the body," Shotton says. The algorithm processes the data and changes the values of different elements to achieve the best performance.
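As a rough illustration of what "training on labeled pixels" means, the following Python sketch learns a single parameter from synthetic data. It is not Microsoft's algorithm: the one-number "pixels," the two body-part labels, the depth ranges and the function names (`make_synthetic_pixels`, `accuracy`, `train_threshold`) are all assumptions made up for this example.

```python
import random

# Hypothetical toy data: each "pixel" is one depth value in metres, labelled
# with the body part it belongs to. Real training data would be full
# synthesized depth images with a label for every pixel.
def make_synthetic_pixels(n, seed=0):
    rng = random.Random(seed)
    pixels = []
    for _ in range(n):
        if rng.random() < 0.5:
            pixels.append((rng.uniform(1.0, 1.9), "hand"))   # hand held out toward the sensor
        else:
            pixels.append((rng.uniform(2.1, 3.0), "torso"))  # torso farther back
    return pixels

def accuracy(threshold, pixels):
    # Fraction of pixels whose predicted label matches the true label.
    correct = sum(
        1 for depth, label in pixels
        if ("hand" if depth < threshold else "torso") == label
    )
    return correct / len(pixels)

def train_threshold(pixels, candidates):
    # "Training" here means trying each candidate value of the single
    # parameter and keeping the one that scores best on the labelled data.
    return max(candidates, key=lambda t: accuracy(t, pixels))

pixels = make_synthetic_pixels(200)
candidates = [1.0 + 0.1 * i for i in range(21)]  # 1.0 m to 3.0 m in 0.1 m steps
best = train_threshold(pixels, candidates)
print(f"learned threshold: {best:.1f} m, accuracy: {accuracy(best, pixels):.2f}")
```

A real system would tune many parameters over far richer image features, but the loop is the same in spirit: adjust values, score them against the labeled examples, and keep what performs best.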
To keep the amount of data manageable, the team needed to figure out which elements were most relevant for training. For example, the system doesn't need to recognize the entire body mass, but only the spacing of skeletal joints. After whittling down the data to the essential motions, the researchers mapped each unique pose to 12 models representing different ages, genders and body types.
9 Comments
I just realised the foolproof way of getting rid of creationist and anti-climate-change junk comments: I won't bother to read any comments at all in future!
This may be a pity for those who post useful comments, but just reading the articles is good enough for me.
As with speech recognition software, the issue is that in some areas of life people require VAST freedom and (quite naturally) have low tolerance for error. Speech and, I suspect, "movement" recognition will be such matters, and I don't think current technology can deliver as needed.
This tech also has implications in the computing field. People are already working on tech that will allow us to operate computers merely with our own gestures. If the system can recognize fingers and facial expressions and not just limbs, it could be used to work on computers that project images on glasses and are manipulated purely by our own gestures. Imagine the computer in Minority Report being projected onto the inside of your glasses. That's the future.
Nathaniel, this doesn't have much to do with gestural interfaces. Using sonic transducers or various wavelengths of light seems to be far more precise where our fingers and hands are concerned. The need to model the rest of the body is just wasteful in terms of R&D (see the amount of biometric data collected) and implementation. Let's not forget that a crucial part of this whole equation lies in the ability of the computer (360 or otherwise) to process this data rapidly enough to satisfy consumers. I would refer again to tolerance for error (a lack thereof) in certain arenas as a major stumbling block. There is also the matter that this is for GAMING, and there is simply no way that this will be a true 1:1 motion experience once it's integrated into gameplay. The calculations that would need to be made on the fly for collision detection alone are massive, and playing with balls is probably already stretching the limits. You're not going to see authentic "sim fencing" as a result of this tech.
Non-1:1 motion control is just a gimmick, and separate from the need for near 100% precise gestural interfaces.
Nothing new here. I remember visiting a VR expo in the mid-'80s, where in one exhibit you could interact and play with a virtual ball using your body, arms and legs to punch and kick the ball. The system used a camera to include your real image in the virtual game. Position detection used infrared devices. I suspect Microsoft is using brute force on a problem solved for years. Well, if they can make it cheap, it could be interesting.
Pet2001 is an idiot. If this tech was available since the '80s then it would have already been integrated into a gaming product. You clearly have no concept of what it takes to create, develop and produce a product in the real world.
ref.: joebob99
My mistake, I think the VR expo was in the late '80s or early '90s. And no, no one tried seriously to integrate the tech into gaming products, for a number of reasons. It was too expensive at the time, electronics were far less advanced, and these techs were for rich VR freaks and the army, not exactly gamers. Albeit there was some sort of consumer VR products wave (VR helmets with position sensors, 3D shutter glasses, 3D LCD screen glasses, haptic gloves...) that followed and failed.
I'm estimating failure and disappointment just like most Microsoft innovations. They just don't seem to be able to bring practical innovation to the market like Apple. The company is littered with too many academics and not enough practical thinkers. Too bad...
Dear Mr Fish,
Just out of curiosity, exactly what are these amazingly great innovations that Apple have produced? We all know they ripped off Xerox for their user interface. They even stole their name from the Beatles.
And, funny enough, the 'multitouch' aspects of the iPhone were first shown by that non-innovative company Microsoft in their 'Surface' product four years ago.
I assume you mean the exploding batteries. Nobody else has thought of that yet and Apple are still the leaders in that market.