Artificial intelligence (AI) is at an inflection point in health care. A 50-year span of algorithm and software development has produced some powerful approaches to extracting patterns from big data. For example, deep-learning neural networks have been shown to be effective for image analysis, resulting in the first FDA-approved AI-aided diagnosis of an eye disease called diabetic retinopathy, using only photos of a patient’s eye.

However, the application of AI in the health care domain has also revealed many of its weaknesses, outlined in a recent guidance document from the World Health Organization (WHO). The document covers a lengthy list of topics, each of which is just as important as the last: responsible, accountable, inclusive, equitable, ethical, unbiased, responsive, sustainable, transparent, trustworthy and explainable AI. These are all vital to the delivery of health care and consistent with how we approach medicine when the best interest of the patient is paramount.

It is not hard to understand how an algorithm might be biased, exclusive, inequitable or unethical. Its developers may not have given it the ability to discern good data from bad, or they may not have been aware of data problems, because these often arise from discriminatory human behavior. One example is unconsciously triaging emergency room patients differently based on the color of their skin. Algorithms are good at exploiting these kinds of biases, and making algorithms aware of them can be challenging. As the WHO guidance document suggests, we must weigh the risks of AI carefully against the potential benefits.

But what is more difficult to understand is why AI algorithms may not be transparent, trustworthy and explainable. Transparency means that it is easy to understand the AI algorithm, how it works and the computer code doing the work behind the scenes. This kind of transparency, in addition to rigorous validation, builds trust in the software, which is vital for patient care. Unfortunately, most AI software used in the health care industry comes from commercial entities that need to protect intellectual property and thus are not willing to divulge their algorithms and code. This likely results in a lack of trust in the AI and its work.

Trust and transparency are, of course, worthy goals. But what about explanation? One of the best ways to understand AI, or what AI aspires to be, is to think about how humans solve health care challenges and make decisions. When faced with challenging patients, clinicians commonly consult their colleagues. This taps into the colleagues’ knowledge and experience base. One of the advantages of consulting a human is that we can follow up on an answer with the question of why.

“Why do you think this treatment is the best course of action?”

“Why do you recommend this procedure?”

A good clinician consultant should be able to explain why they arrived at a particular recommendation. Unfortunately, modern AI algorithms are rarely able to provide an answer to the question of why they think an answer is a good one. The ability to explain is yet another dimension of trust; it can help address the issues of bias and ethics, and it can help the clinician learn from the AI, too, just as they would from a human consultant.

How do we get to explainable AI? Interestingly, one of the earliest successful AI algorithms in health care was the MYCIN program developed by physician and computer scientist Edward Shortliffe in the early 1970s for prescribing antibiotics to patients in the intensive care unit. MYCIN was a type of AI called an expert system, which could answer the “Why?” question by backtracking through its probability calculations to tell the user how it arrived at an answer.

This was an important advance in AI, which we seem to have lost in the search for better-performing algorithms. Explainable AI should be possible if the developer of the algorithm truly understands how it works. It’s simply a matter of putting the time and effort into keeping track of the algorithm as it iterates and presenting the path it took to an answer for the user in a human-understandable form.

In other words, this should just be a matter of priority for the developer. Any AI algorithm that is so complex the developer cannot understand how it works is likely not a good candidate for health care.
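To make the idea of keeping track of the path to an answer concrete, here is a minimal sketch in Python of a toy rule-based system that records which rule produced each conclusion and can then answer “Why?” by walking back through that record. It is not MYCIN's actual implementation, and it omits any probability or certainty calculations. The rule names, the facts and the functions `infer` and `explain` are all hypothetical, invented for illustration, and the rules are not clinical guidance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    conditions: frozenset  # facts that must all be known for the rule to fire
    conclusion: str        # fact added when the rule fires


def infer(facts, rules):
    """Forward-chain over the rules, recording which rule produced each new fact."""
    known = set(facts)
    trace = {}  # maps each derived fact to the rule that derived it
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.conclusion not in known and rule.conditions <= known:
                known.add(rule.conclusion)
                trace[rule.conclusion] = rule
                changed = True
    return known, trace


def explain(fact, trace, depth=0):
    """Answer "Why?" by walking back through the recorded rules."""
    indent = "  " * depth
    rule = trace.get(fact)
    if rule is None:
        return [f"{indent}{fact}: given as an input"]
    lines = [f"{indent}{fact}: concluded by {rule.name}"]
    for condition in sorted(rule.conditions):
        lines.extend(explain(condition, trace, depth + 1))
    return lines


# Hypothetical rules for illustration only; not clinical guidance.
RULES = [
    Rule("rule 1", frozenset({"fever", "positive culture"}), "infection suspected"),
    Rule("rule 2", frozenset({"infection suspected", "no known allergy"}), "recommend treatment A"),
]

known, trace = infer({"fever", "positive culture", "no known allergy"}, RULES)
explanation = explain("recommend treatment A", trace)
print("\n".join(explanation))
```

The printed explanation lists the recommendation first, then, indented beneath it, the rule and inputs that led to it. The only extra work beyond reaching the answer is storing the `trace` dictionary, which is the kind of bookkeeping the argument above has in mind. Whether comparable bookkeeping produces a useful explanation for far more complex algorithms is a separate question this sketch does not address.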

We have made tremendous strides in the AI domain. We are all genuinely excited about how AI can help patients. We are also humbled by the failures of AI, such as those documented in the recent study showing that AI results for diagnosing COVID-19 are unreliable as a result of biases in the data. We must keep the clinician in mind as we develop and evaluate AI algorithms for use in the clinic. We must constantly be thinking about what is good for the patient and how to garner the trust of the clinician using the software. The ability of AI to explain itself will be key.

This is an opinion and analysis article; the views expressed by the author or authors are not necessarily those of Scientific American.