Visionary Research: Teaching Computers to See Like a Human

M.I.T. researchers are harnessing computer models of human vision to improve image recognition software


For all their sophistication, computers still cannot compete with nature's gift: a brain that sorts objects quickly and accurately enough for people and other primates to interpret what they see in real time. Despite decades of development, computer vision systems still get bogged down by the massive amounts of data needed just to identify the most basic images. Move the same object into a different setting or change the lighting, and artificial intelligence is even less of a match for good old gray matter.

These shortcomings become more pressing as demand grows for security systems that can recognize a known terrorist's face in a crowded airport and car safety mechanisms such as a sensor that can hit the brakes when it detects a pedestrian or another vehicle in the car's path. Seeking the way forward, Massachusetts Institute of Technology researchers are looking to advances in neuroscience for ways to improve artificial intelligence, and vice versa. The school's leading minds in both neural and computer sciences are pooling their research, mixing complex computational models of the brain with their work on image processing.

This cross-disciplinary approach began to bear fruit a year ago, when a group of researchers led by Tomaso Poggio, an investigator at M.I.T.'s McGovern Institute for Brain Research and a professor in the school's Department of Brain and Cognitive Sciences, used a brain-inspired computer model to interpret a series of photographs. Although the neurological model had been developed as a theoretical analysis of how certain visual pathways in the brain work, it turned out to be as good as, or even better than, the best existing computer vision systems at rapidly recognizing some complex scenes. Previously, when a computer was shown pictures of a horse standing in a forest alongside other animals and asked to identify the equine each time, it was swamped by all the data it had to sift through to distinguish the horse from the other animals or the trees.




It was the first time, Poggio says, that a computer model reproduced human behavior on that kind of task, and it brought the researchers closer to understanding how the visual cortex recognizes objects and scenes.

Some car companies have for years been trying to develop computer systems that allow their vehicles to identify pedestrians and other vehicles amidst a crowded background and provide drivers with a warning if they get too close. This type of recognition is very easy for humans, Poggio says, but "we're not conscious of what goes on in our head[s] when we do this."

When a person is shown a picture, even for just a fraction of a second, the brain's visual cortex recognizes what it sees immediately. The visual cortex is a large part of the brain's processing system and one of its most complex; Poggio says that understanding how it works could be a significant step toward knowing how the whole brain operates. "Vision is just a proxy for intelligence," he says. By contrast, people are far more conscious of how they solve complex problems such as playing chess or working through algebra equations, which is why computer programmers have had so much more success building machines that emulate that type of activity.

Thus far, Poggio's research has modeled "feedforward" vision, which occurs when an image is first presented to the eye. He and his colleagues are now looking to develop new models that help them better understand how the brain works once the eye begins to scan the scene portrayed in an image and interpret spatial relationships among objects in the scene. The hope is that this will ultimately lead to computer software that can do the same thing and eventually explain not only rapid cognition by humans but also other aspects of our visual intelligence. Keep your eyes peeled.
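The article does not spell out the model's internals, but feedforward models of the visual cortex in this tradition are typically built from alternating stages: "simple cell" layers that match local templates (such as oriented edges) and "complex cell" layers that pool responses over position to gain tolerance to small shifts. The following is a minimal NumPy sketch of that two-stage idea; the filters, image, and layer sizes are toy illustrations of my own, not the researchers' actual model:

```python
import numpy as np

def simple_layer(image, filters):
    """Template matching: each filter responds to a local pattern,
    analogous to 'simple cells' tuned to oriented edges."""
    h, w = image.shape
    fh, fw = filters.shape[1:]
    out = np.zeros((len(filters), h - fh + 1, w - fw + 1))
    for k, f in enumerate(filters):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[k, i, j] = np.sum(image[i:i + fh, j:j + fw] * f)
    return out

def complex_layer(maps, pool=2):
    """Max pooling over position: keeps the strongest local response,
    giving tolerance to small shifts, analogous to 'complex cells'."""
    k, h, w = maps.shape
    out = np.zeros((k, h // pool, w // pool))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            out[:, i, j] = maps[:, i * pool:(i + 1) * pool,
                                   j * pool:(j + 1) * pool].max(axis=(1, 2))
    return out

# Toy input: an 8x8 image with a vertical light/dark boundary.
image = np.zeros((8, 8))
image[:, 4:] = 1.0

# Two hand-made 2x2 filters, tuned to vertical and horizontal edges.
vertical = np.array([[-1.0, 1.0], [-1.0, 1.0]])
horizontal = np.array([[-1.0, -1.0], [1.0, 1.0]])

s1 = simple_layer(image, np.stack([vertical, horizontal]))
c1 = complex_layer(s1)

# The vertical-edge channel fires along the boundary; the
# horizontal channel stays silent, since every row is identical.
print(c1[0].max(), c1[1].max())  # 2.0 0.0
```

Deeper models stack several such pairs, so later stages respond to increasingly complex patterns over increasingly large regions of the image, which is the hierarchy the brain-inspired model exploits.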
