The Harm That Data Do

Paying attention to how algorithmic systems impact marginalized people worldwide is key to a just and equitable future

In Australia, they called it “robo-debt”: an automated debt-recovery system that generated fear, anxiety, anger and shame among those who rely on, or who have relied on, social support. In 2016 the country’s Department of Human Services introduced a new way of calculating the annual earnings of welfare beneficiaries and began dispatching automated debt-collection letters to those identified as having been overpaid. The new accounting method meant that fortnightly income could be averaged to estimate the income for an entire year, a problem for those with contract, part-time or precarious work, whose earnings vary widely from one fortnight to the next. Reports indicate that the system went from sending out 20,000 debt-collection notices a year to sending up to that many every week.
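To see why averaging is a problem for people with irregular work, consider a deliberately simplified sketch. It is not the department's actual formula: the income figures, the eligibility threshold and the function names below are all hypothetical, chosen only to show how an averaged figure can hide the fortnights in which someone earned nothing.

```python
# Simplified illustration only. The threshold, incomes and function names are
# hypothetical; they do not reproduce any real welfare eligibility rule.

def eligible_fortnights_actual(fortnightly_income, threshold):
    """Count fortnights in which actual income fell below the threshold."""
    return sum(1 for income in fortnightly_income if income < threshold)


def eligible_fortnights_averaged(fortnightly_income, threshold):
    """Assume the same (average) income in every fortnight, then apply the threshold."""
    average = sum(fortnightly_income) / len(fortnightly_income)
    return len(fortnightly_income) if average < threshold else 0


# Hypothetical contract worker: 13 fortnights of work, then 13 with no income.
income = [2000] * 13 + [0] * 13
THRESHOLD = 800  # hypothetical income limit per fortnight

actual = eligible_fortnights_actual(income, THRESHOLD)
averaged = eligible_fortnights_averaged(income, THRESHOLD)

print(f"Fortnights eligible on actual income:   {actual}")
print(f"Fortnights eligible on averaged income: {averaged}")
print(f"Fortnights wrongly treated as overpaid: {actual - averaged}")
```

In this toy case the worker had no income for half the year, yet the averaged figure sits above the limit in every fortnight, so every legitimate payment looks like an overpayment.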

Previously, when the system identified someone who may have been overpaid benefits, a human was tasked with investigating the case. Under the automated system, however, this step was removed; instead it became the responsibility of the recipients to prove that they had not been overpaid. That meant finding out why they were targeted, which often required hours on the phone, and digging up copies of pay slips from as far back as seven years earlier. To make matters worse, many of the debt notices were sent to people already living in difficult situations. Those targeted felt powerless because they had little time and few resources to challenge the system. Newspapers reported at least one suicide. A social service organization eventually reported that a quarter of the debt notices it investigated were wrong, and an Australian senate inquiry concluded that “a fundamental lack of procedural fairness” ran through the entire process. In 2019, after years of activism, civil-society mobilization, and political and legal challenges, a judge ruled the system unlawful, and a class action lawsuit was settled in 2020 for $1.2 billion.

We have entered an “age of datafication” as businesses and governments around the world access new kinds of information, link up their data sets, and make greater use of algorithms and artificial intelligence to gain unprecedented insights and make faster and purportedly more efficient decisions. We do not yet know all the implications. The staggering amount of information available about each of us, combined with new computing power, does, however, mean that we become infinitely knowable—while having limited ability to interrogate and challenge how our data are being used.

At the Data Justice Lab at Cardiff University in Wales, we maintain a Data Harm Record, a running log of problems with automated and algorithmic systems being reported from across the globe. We analyze this record to understand the diverse ways in which such systems are going wrong, how citizens’ groups are dealing with the emerging problems, and how government agencies and legal systems are responding to their challenges. Our studies, we hope, will result in a deeper understanding of how democratic institutions may need to evolve to better protect people—in particular, the marginalized—in the age of big data.

Deepening Inequality

The robo-debt scandal is one of many that demonstrate the power imbalance incorporated into many emerging data systems. To understand what happened, we need answers to such questions as why a system with such high error rates was introduced without adequate due process protections for citizens, why robust impact assessments were not done before it was rolled out, why the needs of those affected were not fully considered in designing the online portal or help line, and why it was deemed permissible to remove human oversight. The problems with it, and with many other data-driven systems, stem in significant part from underlying social and political contexts, specifically long-standing binaries of “deserving” and “undeserving” citizens that influence how people are valued and treated.

The fact is that some amount of error is inevitable in automated systems, as mathematician Simon Williams of the University of Melbourne in Australia pointed out with regard to the robo-debt case: there always will be false positives and false negatives, and such discrepancies should lead to far greater review, as well as investigation of impact and debate before implementation of such programs.

Research by Joy Buolamwini of the M.I.T. Media Lab has been central to influencing corporate and government bodies to rethink their use of face-recognition technologies. In 2016 Buolamwini founded the Algorithmic Justice League. Her Gender Shades research, conducted with Timnit Gebru, finds that face-recognition technologies routinely demonstrate both skin-type and gender biases. The systems studied most accurately identify white men and show higher error rates for darker skin tones, with the highest error rates for women with darker skin. These error rates are a particular problem given that such systems can influence your ability to travel or access government services or can lead to wrongful arrest. Buolamwini argues that system errors happen, in part, because the machine-learning algorithms are trained on data sets containing mainly white faces. Employees at the high-tech firms that designed these systems are, mostly, white—an imbalance that can limit the ability to spot and address bias.

Likewise, an investigation by news organization ProPublica discovered that algorithms predicting the likelihood that someone charged with a crime would reoffend falsely ranked Black defendants as high risk at twice the rate of white defendants. Similar scoring systems are being used across the U.S. and can influence sentencing, bonds and opportunities to access rehabilitation instead of jail. Because the models are proprietary, it is difficult to know why this happens, but it seems to be connected to weights the algorithms assign to factors such as employment, poverty and family history. Data drawn from a world that is unequal will reflect that inequality and all too often end up reinforcing it.
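A disparity of this kind is usually expressed as a difference in false positive rates: among people who did not go on to reoffend, the share who were nonetheless ranked high risk. The sketch below shows that calculation with invented counts. The numbers and group labels are hypothetical and are not taken from the ProPublica analysis.

```python
# Hypothetical counts for illustration only; not data from any real study.

def false_positive_rate(false_positives, true_negatives):
    """Share of people who did NOT reoffend but were still ranked high risk."""
    return false_positives / (false_positives + true_negatives)


# Each group: 100 people who did not reoffend.
rate_group_a = false_positive_rate(false_positives=40, true_negatives=60)
rate_group_b = false_positive_rate(false_positives=20, true_negatives=80)

print(f"Group A false positive rate: {rate_group_a:.0%}")
print(f"Group B false positive rate: {rate_group_b:.0%}")
print(f"Ratio: {rate_group_a / rate_group_b:.1f}x")
```

Two groups can receive scores from the same model and still face very different odds of being wrongly labeled, which is why an audit needs to compare error rates group by group rather than rely on overall accuracy.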

Disturbingly, researchers find that those at the top—the designers and the administrators—routinely fail to appreciate the limitations of the systems they are introducing. So, for example, the underlying data sets might contain errors, or they could have been compiled from other data sets that are not particularly compatible. Often, too, the implementers are unaware of bureaucratic or infrastructural complexities that can cause problems on the ground. They routinely fail to assess the impact of the new systems on marginalized people or to consult with those who do have the necessary experience and knowledge. When algorithms replace human discretion, they eliminate corrective feedback from those affected, thereby compounding the problem.

At other times, harm results from the way big data are used. Our data “exhaust” (the data we emit as we communicate online, travel and make transactions) can be combined with other data sets to construct intimate profiles about us and to sort and target us. People can be identified by religion, sexual preferences, illnesses, financial vulnerability, and much more. For example, World Privacy Forum’s Pam Dixon found data brokers (the companies that aggregate and sell consumer data) offering a range of problematic lists, such as lists of rape victims and of individuals suffering from addictive behavior or dementia, among others. Researchers studying the financial crash of 2008 found that banks had combined offline and online data to categorize and influence customers. In 2012 the U.S. Department of Justice reached a $175-million settlement with Wells Fargo over allegations that it had systematically pushed Black and Hispanic borrowers into more costly loans.

Overall, the kinds of damage that data systems can cause are incredibly diverse. These may include loss of privacy from data breaches; physical injury as workplace surveillance compels people to take on more than they can manage; increased insurance and interest rates; and loss of access to basic essentials such as food, home care and health care. In unequal societies, such systems serve to further embed social and historical discrimination.

Dissent as Necessity

What happens when people try to challenge data harms? To date, we have investigated cases involving governmental use of new data systems in Australia, Canada, the Netherlands, New Zealand, the U.K. and the U.S. Even in these democratic societies, relying on legal systems alone can take years, in the meantime draining precious energy and resources while families are thrown into crisis. Citizens are combining their time and other resources into a collective and multipronged effort that includes all the pillars of democracy.

In the robo-debt case, those affected created a Not My Debt campaign for publishing their stories anonymously, getting help and sharing resources. According to Dan Nicholson, Victoria Legal Aid’s executive director of criminal law, the organization struggled to initiate a federal court challenge, in part because people were reluctant to go public after the Department of Human Services released the private details of one critic to the press. The organization did go on to successfully challenge the system in 2019. One of Nicholson’s biggest concerns is how the government shifted responsibility to individual citizens for proving that no debt is owed, despite its vastly superior ability to compile evidence.

In the Netherlands, individuals and organizations came together to successfully launch a court challenge against the government over Systeem Risico Indicatie (SyRI), which linked citizens’ data to predict who is likely to commit fraud. The litigants argued that the system violates citizens’ rights by treating everyone as guilty until proven innocent. In 2020 the District Court of The Hague ruled that SyRI was in violation of the European Convention on Human Rights. This court case is likely to inspire citizens in other democracies seeking to protect their rights and to expand the definitions of harm.

In the U.K., groups such as defenddigitalme are raising concerns about the psychological and social impact of Web-monitoring software in schools and the ways it can damage students who are wrongly labeled, for instance, as being suicidal or as gang members. In New Zealand, nongovernmental organizations (NGOs) successfully blocked an attempt by the Ministry of Social Development to require all providers of social services to provide data about their clients to receive government funding. The NGOs argued that the requirement could prompt members of already marginalized groups, such as refugees or victims of domestic violence, to avoid help for fear of being identified.

In Little Rock, Ark., an algorithm introduced by the state’s Department of Human Services was blamed for unjustly cutting the home care hours of people with severe disabilities. Earlier, home care nurses determined home care hours. After the change, they helped people fill out a questionnaire and entered the data into a computer system—and the algorithm decided. Government representatives argued that the automated system ensures that assignments of home care hours are fair and objective. Some individuals strongly disagreed, and with the help of Legal Aid of Arkansas, seven of them took the department to court. Six had seen their weekly home care hours cut by more than 30 percent. Court documents make for grim reading, with each plaintiff recounting the impact of the cuts on their life and health.

Examining information about the algorithm extracted via a court order, Legal Aid of Arkansas lawyer Kevin De Liban found numerous problems with it and how it was implemented. In May 2018 a judge ordered the Department of Human Services to stop using it, but the agency refused—whereupon the judge found the department in contempt. The challenge was ultimately successful, with the agency stopping use of the algorithm in 2018.

These cases speak to the importance of collective mobilization in protecting people from injustices committed via data systems. It is difficult for individuals, with relatively limited resources or access to inside information about data systems, to interrogate the systems alone or to seek redress when they are harmed. Beyond collective challenges, broader public discussion is needed about the transparency, accountability and oversight of data systems required for protecting citizens’ rights. Further, how should information about these new systems be communicated so that everyone can understand? What are governments’ obligations to ensure data literacy? And are there no-go areas? Surely maps of where and how governments are introducing data systems and sharing people’s data should, as a first step, be provided as a matter of democratic accountability.

Just as important is ensuring that citizens can meaningfully challenge the systems that affect them. Given that datafied systems will always be error-prone, human feedback becomes essential. Critiques should be welcomed rather than fended off. A fundamental rethink of governance is in order, in particular on questions of how data systems are part of longer histories of systemic discrimination and violence, of how dissent and collaboration are necessary to democratic functioning, and of how both might be better fostered by public bodies and authorities in societies permeated by inequality and data.
