When Jason Matheny joined the U.S. Intelligence Advanced Research Projects Activity (IARPA) as a program manager in 2009, he made a habit of chatting to the organization’s research analysts. “What do you need?” he would ask, and the answer was always the same: a way to make more accurate predictions. “What if we made you an artificially intelligent computer model that forecasts real-world events such as political instability, weapons tests and disease outbreaks?” Matheny would ask. “Would you use it?”

The analysts’ response was enthusiastic, except for one crucial caveat. “It came down to whether they could explain the model to a decision maker—like the secretary of Defense,” says Matheny, who is now IARPA’s director. What if the artificial intelligence (AI) model told defense analysts that North Korea was getting ready to bomb Alaska? “They don’t want to be in the position of thinking the system could be wrong but not being sure why or how,” he adds.

Therein lies today’s AI conundrum: The most capable technologies—namely, deep neural networks—are notoriously opaque, offering few clues as to how they arrive at their conclusions. But if consumers are to, say, entrust their safety to AI-driven vehicles or their health to AI-assisted medical care, they will want to know how these systems make critical decisions. “[Deep neural nets] can be really good but they can also fail in mysterious ways,” says Anders Sandberg, a senior research fellow at the University of Oxford’s Future of Humanity Institute. “People are starting to wake up to the realization that you just can’t trust this software completely.”

A growing number of researchers are taking steps to address this concern as AI begins to transform entire industries, including transportation, medicine, manufacturing and defense. A full-fledged fix is still years away but a number of promising plans are emerging. Some researchers test AI systems like scientists test lab rats, tinkering with the inputs to see how they affect behavior in hopes of illuminating the decision-making process. Others attempt to probe the networks’ behavior with additional nets or invent new programming languages to better control how these systems learn. The approaches may vary but their goal is the same: to ensure that our machines do not evolve too far beyond our ability to understand them.


What makes today’s deep neural nets at once powerful and capricious is their ability to find patterns in huge amounts of data. Loosely modeled on the human brain, these complex computing systems are the not-so-secret sauce of the current AI boom. They are why digital assistants like Apple’s Siri and Amazon’s Alexa have gotten very good at recognizing speech, and why Google translations are finally comprehensible. They also enable machines to identify images, predict diseases and beat humans at the television quiz show Jeopardy! and at go, a game arguably more sophisticated than chess.

Neural nets process information by passing it through a hierarchy of interconnected layers, somewhat akin to the brain’s biological circuitry. The first layer of digital “neurons”—called nodes—receives raw inputs (such as pixels in a photograph of a cat), mixes and scores these inputs according to simple mathematical rules, and then passes the outputs to the next layer of nodes. “Deep” nets contain anywhere from three to hundreds of layers, the last of which distills all of this neural activity into a singular prediction: This is a picture of a cat, for example.
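To make the layered picture concrete, here is a minimal sketch in plain Python of one pass through a very small network. The layer sizes, the hand-picked weights and the helper names (`layer`, `forward`, `NETWORK`) are illustrative assumptions, not details of any system described in this article. A real network learns its weights from data and has millions of them.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One layer: each node mixes its inputs as a weighted sum, adds a bias and scores it."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]

def forward(pixels, network):
    """Pass the raw inputs through every layer in turn and return the last layer's output."""
    activations = pixels
    for weights, biases in network:
        activations = layer(activations, weights, biases)
    return activations

# Hypothetical weights: 4 input "pixels" -> 3 nodes -> 2 nodes -> 1 output node.
NETWORK = [
    ([[0.5, -0.2, 0.1, 0.8], [-0.3, 0.9, 0.4, -0.5], [0.7, 0.1, -0.6, 0.2]], [0.0, 0.1, -0.1]),
    ([[1.2, -0.7, 0.3], [-0.4, 0.6, 0.9]], [0.05, -0.05]),
    ([[1.5, -1.1]], [0.0]),
]

pixels = [0.9, 0.1, 0.4, 0.7]
cat_score = forward(pixels, NETWORK)[0]
print("cat" if cat_score > 0.5 else "not a cat", round(cat_score, 3))
```

The last layer has a single node, so everything that happened in the earlier layers is distilled into one number that is read as the prediction.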

If that prediction is wrong, a neural net will then tweak the links between nodes, steering the system closer to the right result. Yann LeCun, director of AI research at Facebook, likens this web of numerical connections to a box with millions of knobs. By tuning the knobs to satisfy millions of examples, the neural net creates a structured set of relationships—a model—that can classify new images or perform actions under conditions it has never encountered before.
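The knob-tuning idea can be sketched with a single node and two knobs instead of millions. The example below is an assumption-laden toy, not a description of how any production system is trained: the task, the learning rate and the names `predict`, `train` and `EXAMPLES` are all invented for illustration. After each guess, every knob is nudged slightly in the direction that would have made the error smaller.

```python
import math
import random

def predict(weights, bias, inputs):
    """Single node: weighted sum of the inputs, squashed into (0, 1)."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-total))

def train(examples, steps=2000, learning_rate=0.5, seed=0):
    """Show the node one random example per step and nudge each knob to shrink the error."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.5, 0.5) for _ in examples[0][0]]
    bias = 0.0
    for _ in range(steps):
        inputs, target = rng.choice(examples)
        error = predict(weights, bias, inputs) - target  # positive means the guess was too high
        # For a sigmoid node with cross-entropy loss, the gradient for each weight is error * input.
        weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
        bias -= learning_rate * error
    return weights, bias

# Hypothetical toy task: answer 1 when either input feature is present.
EXAMPLES = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

weights, bias = train(EXAMPLES)
for inputs, target in EXAMPLES:
    print(inputs, "target:", target, "prediction:", round(predict(weights, bias, inputs), 2))
```

Nothing in the loop says what rule to learn. The rule ends up stored only in the final knob settings, which is a small-scale version of why the finished model is hard to read.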

That process, known as deep learning, allows neural nets to create AI models that are too complicated or too tedious to code by hand. These models can be mind-bogglingly complex, with the largest nearing one trillion parameters (knobs). “What’s cool about deep learning is you don’t have to tell the system what to look for,” says Joel Dudley, director of Biomedical Informatics at Icahn School of Medicine at Mount Sinai in New York City. “It’s just, ‘Here’s a few million pictures of cats. You figure out what a cat looks like.’”

This flexibility allows neural nets to outperform other forms of machine learning—which are limited by their relative simplicity—and sometimes even humans. For instance, an experimental neural net at Mount Sinai called Deep Patient can forecast whether a patient will receive a particular diagnosis within the next year, months before a doctor would make the call. Dudley and his colleagues trained the system by feeding it 12 years’ worth of electronic health records, including test results and hospital visits, from 700,000 patients. On its own Deep Patient then discerned hidden harbingers of illness. “We showed it could predict 90 different diseases, ranging from schizophrenia to cancer to diabetes, with very high accuracy—without ever having talked to an expert,” Dudley says.

Digital Subconscious

Because neural nets essentially program themselves, however, they often learn enigmatic rules that no human can fully understand. “It’s very difficult to find out why [a neural net] made a particular decision,” says Alan Winfield, a robot ethicist at the University of the West of England Bristol. When Google’s AlphaGo neural net played go champion Lee Sedol last year in Seoul, it made a move that flummoxed everyone watching, even Sedol. “We still can’t explain it,” Winfield says. Sure, you could, in theory, look under the hood and review every position of every knob—that is, every parameter—in AlphaGo’s artificial brain, but even a programmer would not glean much from these numbers because their “meaning” (what drives a neural net to make a decision) is encoded in the billions of diffuse connections between nodes.

Many experts find this opacity worrisome. “It doesn’t matter in the game of go, but imagine the autopilot of a driverless car,” Winfield says. “If there’s a serious accident, it’s simply not acceptable to say to an investigator or a judge, ‘We just don’t understand why the car did that.’” It is not hard to picture other problematic scenarios: An autonomous drone strikes a school; a loan-evaluation program disproportionately denies applications from minorities; a system like Deep Patient makes a specious diagnosis. “Getting a very complex system you don’t fully understand to behave itself is a profound problem,” Sandberg says.

Unmasking AI

Researchers are investigating a number of options in their search for solutions. One approach—called model induction, or the “observer approach”—treats an AI system like a black box. “You experiment with it and try to infer its behavior,” says David Gunning, who manages the Explainable Artificial Intelligence program at the Defense Advanced Research Projects Agency (DARPA). For example, by carving up an image of a cat and feeding a neural net the pieces one at a time, a programmer can get a good idea of which parts—tail, paws, fur patterns or something unexpected—lead the computer to make a correct classification.
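The carve-up experiment can be sketched in a few lines. In the toy below, `black_box_cat_score` stands in for an opaque model. Its internal rule is made up for the demonstration, and the experimenter's code never looks inside it. The 4x4 "image" and the helper `piece_scores` are likewise assumptions. A real test would call a trained network instead.

```python
def black_box_cat_score(image):
    """Stand-in for an opaque model. The experimenter is assumed not to see this logic.

    Hypothetical hidden rule: the score is the mean brightness of the top-left 2x2 corner.
    """
    corner = [image[r][c] for r in range(2) for c in range(2)]
    return sum(corner) / len(corner)

def piece_scores(model, image, patch=2, blank=0.0):
    """Show the model one piece of the image at a time, with the rest blanked out."""
    rows, cols = len(image), len(image[0])
    scores = {}
    for top in range(0, rows, patch):
        for left in range(0, cols, patch):
            piece = [[blank] * cols for _ in range(rows)]  # start from an empty canvas
            for r in range(top, min(top + patch, rows)):
                for c in range(left, min(left + patch, cols)):
                    piece[r][c] = image[r][c]  # copy in just this patch
            scores[(top, left)] = model(piece)
    return scores

IMAGE = [
    [0.9, 0.8, 0.1, 0.2],
    [0.7, 0.9, 0.3, 0.1],
    [0.2, 0.1, 0.5, 0.4],
    [0.3, 0.2, 0.6, 0.5],
]

scores = piece_scores(black_box_cat_score, IMAGE)
most_influential = max(scores, key=scores.get)
print("most influential piece starts at", most_influential)
```

Only the top-left piece earns a non-zero score here, so the experiment recovers which region the model relies on without opening the box.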

Then there’s the surgical approach, “which lets us actually look into the brain of the AI system,” says Alan Fern, a professor of electrical engineering and computer science at Oregon State University, who is leading one of 12 projects funded under Gunning’s program. The trick is getting what he sees to make some kind of sense. “An honest-to-goodness explanation would trace every single firing of every node in the network,” creating a long, convoluted audit trail that is “completely uninterpretable to a human,” Fern says. To extract a more meaningful—if less exacting—explanation, Fern’s team proposes probing a neural net with a second neural net. This “explanation net” would learn which neural activity in the original model is most important in making a particular decision.
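Fern's team proposes a second, learned network for this job. The sketch below does not implement that. It swaps in a much simpler technique, silencing one internal node at a time, to illustrate the underlying question of which neural activity matters most for a given decision. The tiny network, its weights and the function `ablation_importance` are hypothetical.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical tiny network: 3 inputs -> 3 hidden nodes -> 1 output node.
HIDDEN_WEIGHTS = [[2.0, -1.0, 0.5], [0.1, 0.1, 0.1], [-1.5, 2.0, 1.0]]
OUTPUT_WEIGHTS = [3.0, 0.2, -2.0]

def hidden_activations(inputs):
    """Activity of each hidden node for one input."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in HIDDEN_WEIGHTS]

def output(hidden):
    """Final score computed from the hidden activity."""
    return sigmoid(sum(w * h for w, h in zip(OUTPUT_WEIGHTS, hidden)))

def ablation_importance(inputs):
    """Silence each hidden node in turn and measure how far the final score moves."""
    hidden = hidden_activations(inputs)
    baseline = output(hidden)
    importance = []
    for i in range(len(hidden)):
        silenced = hidden[:i] + [0.0] + hidden[i + 1:]
        importance.append(abs(baseline - output(silenced)))
    return importance

importance = ablation_importance([1.0, 0.0, 0.5])
for node, change in enumerate(importance):
    print(f"hidden node {node}: score moves by {change:.3f} when silenced")
```

The result is a ranking of internal nodes rather than a full audit trail, which is the kind of trade between exactness and readability that Fern describes.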

The hope is to create a prototypical neural net that will help build transparency into autonomous systems like drones or unmanned vehicles. “I don’t think we can ever have perfect confidence in any software,” Fern says, “but we can do a lot more to make sure the system is doing the right thing for the right reasons and isn’t doing something crazy in the background.”

That is also the goal of Bonsai, a start-up developing a new programming language called Inkling to help businesses train their own deep-learning systems to solve organizational problems such as city planning and supply chain logistics. “A lot of our customers have reservations about turning over decisions to a black box,” says co-founder and CEO Mark Hammond. Bonsai seeks to open the box by changing the way neural nets learn. Most of us, Hammond points out, do not learn—as today’s neural nets do—simply through trial and error. We are also taught—by parents, teachers, coaches and YouTube videos. We learn to play baseball, for instance, not by flailing a bat at fastballs until we hit one but via education: We are shown how to knock Wiffle balls off a tee and then to swing at easy lofts until we’re ready for a real pitch. All the while we are picking up the language of the game—language that lets us explain what we are doing and why.

Dudley, too, is experimenting with ways to explain Deep Patient’s predictions. He is less concerned about the system’s black-box nature, so long as the models it generates can be shown to be safe during clinical trials. (After all, he says, “a huge number of drugs we give are black boxes.”) Still, he is curious about Deep Patient’s rationale because he thinks it might help doctors better understand and treat disease. Although he cannot know why Deep Patient made a particular diagnosis, he can look for clusters of patients with the same diagnosis and calculate their similarities. This exercise has already turned up some surprising discoveries, including connections between diabetes and Alzheimer’s disease that predate Deep Patient. The diabetes drug metformin, for example, seems to protect certain types of patients against Alzheimer’s—a correlation that is not obvious using standard statistics.
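One simple way to "calculate similarities" within a group is to ask which record features are more common among patients who share a predicted diagnosis than among everyone else. The sketch below does that on invented toy records. It is not Dudley's actual analysis, and the feature names, diagnosis labels and the function `distinguishing_features` are assumptions made up for illustration.

```python
from collections import defaultdict

# Invented toy records for illustration only: (predicted diagnosis, features in the record).
RECORDS = [
    ("condition_A", {"drug_x", "high_glucose", "age_over_60"}),
    ("condition_A", {"high_glucose", "age_over_60"}),
    ("condition_A", {"high_glucose", "smoker"}),
    ("condition_B", {"drug_x", "smoker"}),
    ("condition_B", {"age_over_60"}),
    ("condition_B", {"smoker"}),
]

def feature_rates(feature_sets):
    """Fraction of the given records in which each feature appears."""
    counts = defaultdict(int)
    for features in feature_sets:
        for feature in features:
            counts[feature] += 1
    return {feature: n / len(feature_sets) for feature, n in counts.items()}

def distinguishing_features(records, diagnosis):
    """Rank features by how much more common they are inside the group than outside it."""
    inside = [features for label, features in records if label == diagnosis]
    outside = [features for label, features in records if label != diagnosis]
    in_rates, out_rates = feature_rates(inside), feature_rates(outside)
    gaps = {
        feature: in_rates.get(feature, 0.0) - out_rates.get(feature, 0.0)
        for feature in set(in_rates) | set(out_rates)
    }
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)

ranking = distinguishing_features(RECORDS, "condition_A")
for feature, gap in ranking:
    print(f"{feature}: {gap:+.2f}")
```

A ranking like this says what the grouped patients have in common, not why the model grouped them, which matches the limit Dudley acknowledges.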

Show and Tell

Some experts, however, worry that such piecemeal efforts are not enough to ensure public trust in intelligent systems. In 2016 the European Union adopted new data-protection rules—which come into effect next year—that include a legal right to an explanation of decisions made by algorithms. Meanwhile a working group of the IEEE (Institute of Electrical and Electronics Engineers) is developing industry standards that could help define what “explanation” actually means. “Transparency is not one thing,” says Winfield, who is heading the working group. “What an elderly person needs to understand about her care robot is different from what a safety tester needs—or an accident investigator or a lawyer or the general public.”

Oxford’s Sandberg likes to joke that he knows the secret to why the U.S. became a superpower. “It’s because they have ‘show-and-tell’ in school,” he says. The quip may be tongue in cheek, but its moral is sincere. “Being able to explain things gives you a kind of power,” he adds.