May 4, 2018

In a Big Data World, Scholars Need New Guidelines for Research

User information from Facebook and other social-media sites is invaluable to political and social scientists, but it must be treated with care

By Catherine F. Brooks

This article was published in Scientific American’s former blog network and reflects the views of the author, not necessarily those of Scientific American

Mark Zuckerberg’s recent testimony to Congress was full of discussion about Facebook’s privacy policies, its advertising-driven business model and the issue of protecting consumers around the globe. Equally important, however, but less prominent in the public conversation, are some of the issues around trusting scholars to use people’s personal data from social media sites in an ethical way.

To understand political or social behavior today, scholars need access to private data. But in the case leading up to Zuckerberg’s hearing, a scholar collected data via a “third party application” that he developed, then sold those data to Cambridge Analytica, with unfortunate results. Given the importance of research review processes for institutions and the strict oversight by Institutional Review Boards (IRBs) in the United States in particular, the Facebook case brings the challenges of doing big data internet research into the spotlight.

Certainly, the Zuckerberg hearings centered on important topics (and provided a lot of congressional theater), but the implications for social science research are now in question. Researchers who have asked for access to large datasets in order to know more about our digital lives are concerned about how tech companies will change their policies. Facebook, for example, has hesitated to share data with social scientists who have questions about political opinions, interpersonal behavior, group networks and digital life.

We have known for a long time that digital technologies have seriously impacted all kinds of scientific inquiry, but managing the oversight of data collection in a digital world has become much more complex over time. Historically, social data, in the form of surveys or transcripts of interviews, were collected with informed consent, then stored on paper in a locked file cabinet.

With the onset of online surveys, datasets were stored instead on password-protected or encrypted computers. Now the emergence of cloud computing is changing data management yet again, and we have to trust a cloud provider to protect the data. It is difficult to see how recent shifts in IRB policy take into full account the magnitude of the safeguards needed to protect research participants of all kinds.

In fact, recent changes to the federal policy in the United States were the first revisions in decades, yet they omit guidelines for human protections in data science and actually seem to have relaxed existing standards for protecting research participants. These changes to the federal policy include an expanded list of the kinds of research that are exempt from full IRB review, a broadened reach of consent for secondary data use, and a move to single-IRB review for multisite studies in place of IRB approval at each research site.

IRBs and research ethics committees need to quickly embrace big data management problems, given the scale and speed at which data misuse can harm research participants who have, in many cases, entrusted scholars with their personal data.

To be fair, many have written about studying digital behavior and methods for internet research. New initiatives and ideas are emerging for companies like Facebook to share data in a way that protects users and a company’s proprietary information while making anonymized datasets available for experts to analyze.

As Facebook announced, scholars will soon have the ability to interrogate the impact of social media on electoral processes. In a working paper, scholars from Harvard and Stanford suggest a new model for data management for research purposes, one that protects industry interest while allowing for the kinds of scholarly inquiry that are needed to understand social trends, online behavior, and human psychology relative to digital engagements. Ideas like these may or may not be the right paths to take, and internet researchers have already voiced concerns about centralizing the research agenda via a proposed commission or engaging in solid research without knowing a lot about how data are originally collected, but at least there are new proposals emerging.

Social scientists need to access digital data in a safe way that protects consenting research participants. Some IRBs and other committees overseeing research ethics around the globe have begun the process of envisioning how today’s scholars can safely conduct social media, internet or big data research.

For years there have been calls for educating research review boards about internet research; concerns about scientific contribution when ethical research processes are under fire have also been raised. So research institutions are already late in reviewing their ethical guidelines.

This is urgent, and it is imperative that every research institution review its data protection processes and requirements, especially as researchers need access to big datasets to address a wide variety of scholarly questions. If research institutions around the globe are not nimble in how they provide guidelines for the ethical use of data today and over time, there will be more large Facebook-like data cases that leave ordinary digital citizens vulnerable.
