Deep-Learning Networks Rival Human Vision

AI now matches or exceeds the ability of experts in medicine and other fields to interpret what they see


For most of the past 30 years, computer vision technologies have struggled to help humans with visual tasks, even those as mundane as accurately recognizing faces in photographs. Recently, though, breakthroughs in deep learning, an emerging field of artificial intelligence, have finally enabled computers to interpret many kinds of images as successfully as, or better than, people do. Companies are already selling products that exploit the technology, which is likely to take over or assist in a wide range of tasks that people now perform, from driving trucks to reading scans for diagnosing medical disorders.

Recent progress in a deep-learning approach known as a convolutional neural network (CNN) is key to the latest strides. To give a simple example of its prowess, consider images of animals. Whereas humans can easily distinguish between a cat and a dog, CNNs allow machines to categorize specific breeds more successfully than people can. They excel because they are better able to learn, and draw inferences from, subtle, telling patterns in the images.

Convolutional neural networks do not need to be programmed to recognize specific features in images—for example, the shape and size of an animal’s ears. Instead they learn to spot features such as these on their own, through training. To train a CNN to separate an English springer spaniel from a Welsh one, for instance, you start with thousands of images of animals, including examples of either breed. Like most deep-learning networks, CNNs are organized in layers. In the lower layers, they learn simple shapes and edges from the images. In the higher layers, they learn complex and abstract concepts—in this case, features of ears, tails, tongues, fur textures, and so on. Once trained, a CNN can easily decide whether a new image of an animal shows a breed of interest.
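The building block behind this layered feature learning is the convolution operation. The sketch below, written with NumPy, shows how a single convolutional filter responds to a pattern in an image. In a trained network the filter weights would be learned from thousands of examples; here the weights are hard-coded as a vertical-edge detector purely for illustration, so the filter values, the toy image, and the function name `convolve2d` are all assumptions for this example rather than any real network's internals.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image and record the weighted sum at
    each position ("valid" cross-correlation, as used in CNN layers)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A tiny image: dark on the left, bright on the right -- a vertical edge.
image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# A hand-written vertical-edge filter. A low layer in a trained CNN
# typically learns filters resembling this one on its own.
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

response = convolve2d(image, kernel)
print(response)  # uniformly strong response: the edge spans every window
```

Higher layers apply the same operation to the outputs of lower layers, which is how simple edge detectors compose into detectors for ears, tails and fur textures without anyone programming those features explicitly.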




CNNs were made possible by the tremendous progress in graphics processing units and parallel processing in the past decade. But the Internet has made a profound difference as well by feeding CNNs’ insatiable appetite for digitized images.

Computer-vision systems powered by deep learning are being developed for a range of applications. The technology is making self-driving cars safer by enhancing their ability to recognize pedestrians. Insurers are starting to apply deep-learning tools to assess damage to cars. In the security camera industry, CNNs are making it possible to understand crowd behavior, which will make public places and airports safer. In agriculture, deep-learning applications can be used to predict crop yields, monitor water levels and help detect crop diseases before they spread.

Deep learning for visual tasks is making some of its broadest inroads in medicine, where it can speed experts’ interpretation of scans and pathology slides and provide critical information in places that lack professionals trained to read the images—be it for screening, diagnosis, or monitoring of disease progression or response to therapy. This year, for instance, the U.S. Food and Drug Administration approved a deep-learning approach from the start-up Arterys for visualizing blood flow in the heart; the purpose is to help diagnose heart disease. Also this year, Sebastian Thrun of Stanford University and his colleagues described a system in Nature that classified skin cancer as well as dermatologists did. The researchers noted that such a program installed on smartphones, which are ubiquitous around the world, could provide “low-cost universal access to vital diagnostic care.” Systems are also being developed to assess diabetic retinopathy (a cause of blindness), stroke, bone fractures, Alzheimer’s disease and other maladies.

Apurv Mishra, an inventor and TED fellow, is chief technology officer at doc.ai, an artificial-intelligence company focused on health care. Previously he was COO of Datawallet, founder of Glavio Wearable Computing and vice president of Hypios. He has served on the World Economic Forum's Global Agenda Council on Emerging Technologies. He holds a master's degree in technology policy from the University of Cambridge.

