As organizations increasingly replace human decision-making with algorithms, they may assume these computer programs lack our biases. But algorithms still reflect the real world, which means they can unintentionally perpetuate existing inequality. A study published Thursday in Science found that a health care risk-prediction algorithm, one widely used example of the tools applied to more than 200 million people in the U.S., demonstrated racial bias because it relied on a faulty metric for determining need.

This particular algorithm helps hospitals and insurance companies identify which patients will benefit from “high-risk care management” programs, which provide chronically ill people with access to specially trained nursing staff and allocate extra primary-care visits for closer monitoring. By singling out sicker patients for more organized and specific attention, these programs aim to preemptively stave off serious complications, reducing costs and increasing patient satisfaction.

To compute who should qualify for this extra care, the algorithm’s designers used previous patients’ health care spending as a proxy for medical needs—a common benchmark. “Cost is a very efficient way to summarize how many health care needs someone has. It’s available in many data sets, and you don’t need to do any cleaning [of the data],” says Ziad Obermeyer, an assistant professor of health policy and management at the University of California, Berkeley, and lead author on the new study.
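The design choice Obermeyer describes can be made concrete with a small sketch: the model's training label is prior spending rather than health need itself. This is purely illustrative Python with invented field names; the actual algorithm's data model was not published.

```python
# Hypothetical sketch of the design choice described above: the model's
# training label is last year's cost, used as a proxy for medical need.
# All field names are invented for illustration.

def make_training_example(record):
    """Turn a (hypothetical) patient record into features and a label.

    Using prior spending as the label is exactly what encodes the bias:
    patients with equal need but lower recorded spending get lower targets.
    """
    features = {
        "age": record["age"],
        "n_diagnoses": record["n_diagnoses"],
    }
    label = record["cost_last_year"]  # proxy for "how sick is this patient?"
    return features, label

x, y = make_training_example(
    {"age": 60, "n_diagnoses": 3, "cost_last_year": 1200.0}
)
```

Nothing in the features or the label measures need directly; the model can only learn to predict whatever spending patterns the historical data contain.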

In this case, however, the researchers found this proxy arrangement did not work well because even when black and white patients spent the same amount, they did not have the same level of need: black patients tended to pay for more active interventions such as emergency visits for diabetes or hypertension complications. The paper examined the health care records of almost 50,000 patients—of whom 6,079 self-identified as black and 43,539 self-identified as white—and compared their algorithmic risk scores with their actual health histories. Black patients, the researchers found, tended to receive lower risk scores. For example, among all patients classified as very high-risk, black individuals turned out to have 26.3 percent more chronic illnesses than white ones (despite sharing similar risk scores). Because their recorded health care costs were on par with those of healthier white people, the program was less likely to flag eligible black patients for high-risk care management. When contacted about these results, eight of the top U.S. health insurance companies, as well as two major hospitals and the professional group the Society of Actuaries, declined to comment. The published study results did not name the algorithm or identify its developer.
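The comparison the researchers ran can be sketched as a simple audit: hold the algorithm's risk score fixed and ask whether measured illness burden differs by group. The toy Python below, with invented field names and synthetic numbers, illustrates the idea; it is not the study's code.

```python
# Illustrative bias audit: bucket patients by risk-score decile, then
# compare average chronic-condition counts across groups within each
# bucket. Field names and data are hypothetical.
from collections import defaultdict

def illness_gap_by_risk_decile(patients):
    """patients: dicts with 'risk_score' (0-1), 'group', and
    'n_chronic_conditions'. Returns, per risk decile, the mean
    condition count for each group."""
    buckets = defaultdict(lambda: defaultdict(list))
    for p in patients:
        decile = min(int(p["risk_score"] * 10), 9)
        buckets[decile][p["group"]].append(p["n_chronic_conditions"])
    return {
        decile: {g: sum(v) / len(v) for g, v in groups.items()}
        for decile, groups in buckets.items()
    }

# Toy data: two groups with similar risk scores but different
# underlying illness burden -- the pattern the study reported.
patients = [
    {"risk_score": 0.95, "group": "A", "n_chronic_conditions": 5},
    {"risk_score": 0.92, "group": "A", "n_chronic_conditions": 6},
    {"risk_score": 0.95, "group": "B", "n_chronic_conditions": 4},
    {"risk_score": 0.93, "group": "B", "n_chronic_conditions": 4},
]
print(illness_gap_by_risk_decile(patients))
```

A gap between groups within the same decile, like the 26.3 percent difference the study found among the highest-risk patients, is the signature of a miscalibrated proxy.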

The researchers suggested a few reasons for the cost disparity that caused this problem. For one, race and income are correlated: People of color are more likely to have lower incomes. And poorer patients, even when insured, tend to use medical services less frequently or have reduced access to them because of time and location constraints. Implicit racial bias (nonconscious or automatic behavior that is not obvious to the biased person) also contributes to the health care disparity. Because black patients often experience this kind of bias, they receive lower-quality care and have less trust in the doctors who they feel are exhibiting it: one 2018 working paper from the nonprofit National Bureau of Economic Research showed that black patients have better health outcomes when they have black doctors, because of higher levels of trust between those doctors and their patients. Without a trusting doctor-patient relationship, black individuals are less likely to request extra care and, in turn, to spend money on it.

The new paper itself relied on an uneven number of records, with white patients’ documents outnumbering those of black patients by a ratio of about seven to one. But that disparity “isn’t a problem because it reflects the reality of the data that exists,” says Marzyeh Ghassemi, a Canada Research Chair in the departments of computer science and medicine at the University of Toronto. Ghassemi, who was not involved in the Science study, says additional computational models were used to verify the accuracy of the findings. Still, she points out, “lack of diversity in data is a long-standing and pernicious issue.”

Awareness of possible biases—either in the data that are uploaded to the algorithm to learn from or (as in this case) in a historically skewed proxy used to interpret those data—is important for artificial intelligence developers and users. “I hope that one of the things that come out of this paper is that AI manufacturers and researchers would do a basic set of audits, a parity of predictions, before their product touches a real patient,” Obermeyer says. If the data are the problem, collecting more of them can correct the bias. And if the issue lies in an algorithm’s programmed assumptions, the program itself needs to be audited.
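One form such a pre-deployment audit could take, sketched below in Python with hypothetical names and thresholds: among patients with the same documented level of need, check whether each group is flagged for extra care at the same rate.

```python
# Minimal sketch of a group-parity audit: among comparably sick
# patients, does each group cross the enrollment cutoff at the same
# rate? Names, data, and the 0.9 threshold are hypothetical.

def flag_rate_by_group(patients, threshold=0.9):
    """patients: dicts with 'risk_score' and 'group'. Returns the
    share of each group whose score reaches the enrollment cutoff."""
    counts = {}  # group -> (flagged, total)
    for p in patients:
        flagged, total = counts.get(p["group"], (0, 0))
        counts[p["group"]] = (flagged + (p["risk_score"] >= threshold), total + 1)
    return {g: flagged / total for g, (flagged, total) in counts.items()}

patients = [
    {"group": "A", "risk_score": 0.95, "n_chronic_conditions": 5},
    {"group": "A", "risk_score": 0.85, "n_chronic_conditions": 4},
    {"group": "B", "risk_score": 0.95, "n_chronic_conditions": 4},
    {"group": "B", "risk_score": 0.92, "n_chronic_conditions": 5},
]

# Restrict the audit to patients with comparable illness burden,
# then compare flag rates across groups.
sick = [p for p in patients if p["n_chronic_conditions"] >= 4]
rates = flag_rate_by_group(sick)
print(rates)
```

If the rates diverge for equally sick groups, as they did for the algorithm in the study, the tool should be corrected before it touches real patients.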

Obermeyer’s team is currently working with the developer of the algorithm it assessed to address the situation. After all, he says, the point of the study was not to discourage AI in health care. “We can’t manage millions of health care variables using humans alone—we really do need AI to help us manage these problems,” he adds. “That’s the point behind the paper: to point out where the problem lies and how to fix it.”

“I admire that the researchers reached out to the [algorithm] company,” Ghassemi says, “that they engaged with them, which can create a systematic, positive change in a deployed system.” The widespread use of algorithms in health care settings is still very new, and the industry is still learning how to apply them. “This is not a story of good and evil but about improving the way we do things. We can take into account social and cultural biases and make algorithms that work around those biases so we can have the benefits of AI,” Obermeyer says. “It’s worth remembering that humans are biased, too.”