Data science is not entirely new to Washington, D.C.—nor is DJ Patil, who was recently named as the U.S.’s first chief data scientist. Pres. Barack Obama’s administration launched Data.gov nearly six years ago and required all agencies to publish at least three “high-value” data sets to the publicly accessible Web site. Now it is Patil’s job, at least in part, to ensure that the government continues to release data in a variety of areas while ensuring that the information is not misused.
 
Patil’s top priority on returning to Washington, after several years as a data specialist at a number of tech companies and the venture capital firm Greylock Partners, is the White House’s Precision Medicine Initiative. Obama launched the public health program in January with a $215 million investment in his 2016 budget to help prevent and treat diseases based on information that takes into account differences in individual patients’ genes, environment and lifestyle. The initiative’s ability to speed the development of new cancer treatments depends not only on scientists contributing their latest research data sets to the project but also on patients volunteering their own personal health information. Patil will play an integral part in determining how researchers, health care institutions and patients can share data without sacrificing privacy.
 
Patil, who was also appointed deputy chief technology officer for data policy in the White House’s Office of Science and Technology Policy, first came to the Beltway area about two decades ago as a University of Maryland assistant research scientist. As a doctoral student and faculty member there, he used open data sets published by the National Oceanic and Atmospheric Administration to help improve numerical weather prediction. Patil also briefly directed social network analysis efforts at the Department of Defense to understand the nature of emerging threats to U.S. interests. Scientific American spoke with Patil about his new gig.
 
[An edited transcript of the interview follows.]
 
What is your mission as the nation’s first chief data scientist?
[Pres. Obama] has been advocating for data science throughout his administration—he’s really the country’s chief data scientist. He was the first president to use analytic dashboards to track [information technology] projects, and he signed an executive order in 2013 that called for making government information open and machine-readable. The Data.gov site [which makes federal, state and local data publicly available] was also launched under this president’s watch.
 
How do your marching orders differ from the data science initiatives that the Obama administration has already launched, such as Data.gov?
Data.gov is one component of this. We see three priorities for us. At the top of my list right now is the Precision Medicine Initiative. Science has enabled us to unlock the human genome. Now we want to combine that with the power of data science, which uses new techniques like machine learning as well as the explosion of data now available about individual patients, whether through their phones or other sensors in their environment. The challenge is putting this together to come up with new ways to think about health care and medical treatments.
 
What is your second priority?
My second priority is opening up more data and making it available for people [both the government and general public] to build an ecosystem of research, mobile apps and visualizations on top of that information. One of the classic examples of building on top of open data is what the National Weather Service does. They create 21 terabytes of data per day and leverage a huge amount of science and technology to make a subset of that data available to the public in a way that’s as easy to access as opening an app on your phone. That massively impacts your life, whether you’re planning your daily activity or checking the status of a flight—the world revolves around that.
 
What rounds out your top-three priorities list?
The third main priority is inserting more data capacity into agencies throughout the government. We’re seeing a rise of data scientists and chief data officers at the National Institutes of Health as well as within [the Department of] Health and Human Services. The Commerce Department announced its first chief data officer [Ian Kalin] last week. We have to decide how to use the best of what we see in data science and statistics groups throughout the government to develop new services.
 
Would those new services be strictly for government use or would they also be available to the general public?
Both. Such services would be valuable to scientists and citizens alike because we’re seeing people take an interest in how a variety of factors impact, for example, their health. People are starting to think about climate data and its impact locally—on their allergies or the threat of Lyme disease in their area—as the climate changes over time. Those are data sets from very different organizations but they deliver very powerful information when brought together. Another superpowerful example of bringing data together to deliver a new service would be in response to disasters. One local governmental department might be able to map out the location of resources, such as fuel sites, that another organization, such as [the Federal Emergency Management Agency], could then combine with data about weather or flood plains to improve its response to some sort of crisis.
 
Given the concerns raised about governmental data collection over the past few years, what are your plans to ensure the government is both protecting and respecting the public’s privacy?
The key word in our mission statement is how we responsibly unlock the power of data for everyone. That means using and making data available ethically and with privacy in mind. [For example] one of the key initiatives that came out of the recently released White House Big Data Report is addressing how we think about student data. How do we make sure that data isn’t just being used for marketing purposes? Another initiative that speaks to responsible use of data relates to last year’s [Federal Trade Commission] report on data brokers and following up on its call for that industry to be more transparent and consumer-friendly. Specific to the Precision Medicine Initiative, the president has stated numerous times that this will be a participant-first project and that participants—whether in academia, industry or government—will be at the table equally when determining how the system will work.
 
Can you elaborate more on what you mean by responsible use of data?
A big part of being responsible is figuring out the right level of transparency for people to know how their data is being utilized. Take Precision Medicine, where we have a voluntary system. People who contribute their data should know what’s being done with it and what it means if they want to take their data out of the system.
 
Which of your achievements in the field of data science are you most proud of at this point?
I’m most proud of the work I did in academia and in government last time around. In academia it was: How do we think about weather forecasting in a new way and show that it’s not as chaotic as people had thought? If there’s a data project that impacts every single person’s life, it’s the weather, sometimes with incredibly dire consequences. The ability to have even a small impact on improving weather forecasting reaches such a large number of people. On an Internet scale, we feel superb if we’re able to reach a million, 10 million, 100 million people if you’re crazy lucky. Working on a weather system means I’ve gotten to impact many billions of people throughout their entire lives.
 
Last time I was in the government we got to start something called the Iraqi Virtual Science Library, which became one of the backbones of the educational system in the collegiate layer across Iraq. It was handed back to the Iraqi government four or five years ago. The opportunity to work on something like that and watch the direct impact it’s had on people’s lives and how it’s helped them construct a life is a level of reward that trumps even the best things I’ve been able to build other places. I feel super fortunate to be able to get a chance to do something like that again. Sometimes people forget that the problems of the greatest scale are in public service and that if you have the technical skills and the opportunity to apply them to a problem in this space, there is nothing more rewarding.