For most of the past 30 years, computer vision technologies have struggled to help humans with visual tasks, even those as mundane as accurately recognizing faces in photographs. Recently, though, breakthroughs in deep learning, an emerging field of artificial intelligence, have finally enabled computers to interpret many kinds of images as successfully as, or better than, people do. Companies are already selling products that exploit the technology, which is likely to take over or assist in a wide range of tasks that people now perform, from driving trucks to reading scans for diagnosing medical disorders.
Recent progress in a deep-learning approach known as a convolutional neural network (CNN) is key to the latest strides. To give a simple example of its prowess, consider images of animals. Whereas humans can easily distinguish between a cat and a dog, CNNs allow machines to categorize specific breeds more successfully than people can. They excel because they are better able to learn, and draw inferences from, subtle, telling patterns in the images.
Convolutional neural networks do not need to be programmed to recognize specific features in images—for example, the shape and size of an animal’s ears. Instead they learn to spot features such as these on their own, through training. To train a CNN to separate an English springer spaniel from a Welsh one, for instance, you start with thousands of images of animals, including examples of both breeds. Like most deep-learning networks, CNNs are organized in layers. In the lower layers, they learn simple shapes and edges from the images. In the higher layers, they learn complex and abstract concepts—in this case, features of ears, tails, tongues, fur textures, and so on. Once trained, a CNN can easily decide whether a new image of an animal shows a breed of interest.
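To make the idea of a "lower layer" concrete, here is a minimal sketch, in plain Python, of the convolution operation those layers perform: sliding a small filter across an image so that it responds to simple local patterns such as edges. In a trained CNN the filter weights are learned from data; in this illustration the filter is hand-picked (a hypothetical vertical-edge detector, not taken from any real system) so the mechanism is visible.

```python
# Sketch of the convolution at the heart of a CNN's lower layers.
# The filter below is hand-picked for illustration; in a real CNN
# these weights are learned during training.

def convolve2d(image, kernel):
    """Valid-mode 2-D convolution (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            out[i][j] = total
    return out

# Toy 4x4 "image": dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Vertical-edge filter: fires where brightness jumps from left
# to right, stays silent on uniform regions.
kernel = [
    [-1, 1],
    [-1, 1],
]

response = convolve2d(image, kernel)
# Each output row is [0.0, 2.0, 0.0]: the filter responds only at
# the dark-to-bright boundary in the middle of the image.
```

A real CNN stacks many such filters in each layer and feeds the responses of one layer into the next, which is how simple edge detectors in the lower layers combine into detectors for ears, tails, and fur textures higher up.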
CNNs were made possible by the tremendous progress in graphics processing units and parallel processing in the past decade. But the Internet has made a profound difference as well by feeding CNNs’ insatiable appetite for digitized images.
Computer-vision systems powered by deep learning are being developed for a range of applications. The technology is making self-driving cars safer by enhancing their ability to recognize pedestrians. Insurers are starting to apply deep-learning tools to assess damage to cars. In the security camera industry, CNNs are making it possible to understand crowd behavior, which will make public places such as airports safer. In agriculture, deep-learning applications can be used to predict crop yields, monitor water levels and help detect crop diseases before they spread.
Deep learning for visual tasks is making some of its broadest inroads in medicine, where it can speed experts’ interpretation of scans and pathology slides and provide critical information in places that lack professionals trained to read the images—be it for screening, diagnosis, or monitoring of disease progression or response to therapy. This year, for instance, the U.S. Food and Drug Administration approved a deep-learning approach from the start-up Arterys for visualizing blood flow in the heart; the purpose is to help diagnose heart disease. Also this year, Sebastian Thrun of Stanford University and his colleagues described a system in Nature that classified skin cancer as well as dermatologists did. The researchers noted that such a program installed on smartphones, which are ubiquitous around the world, could provide “low-cost universal access to vital diagnostic care.” Systems are also being developed to assess diabetic retinopathy (a cause of blindness), stroke, bone fractures, Alzheimer’s disease and other maladies.