One of the enduring mysteries of medicine is how individual genes, environment and lifestyle may combine to spark sickness or protect us from it. Unraveling this puzzle remains essential for scientists hoping to achieve the elusive goal of offering tailored treatments or personalized prevention plans.

That’s why Pres. Barack Obama in 2015 announced an ambitious plan to roll out a precision medicine initiative that would aim to enroll a diverse group of one million people. Participants would volunteer, either via their doctors or by signing up online, to submit their medical records to the National Institutes of Health. They would also fill out online surveys about their lifestyles, furnish blood and urine samples, and have their genomes sequenced. Later they might also offer other biological data or even wear health trackers that may not yet exist. Researchers—and members of the public—could apply for access to the anonymized patient data and track individuals’ health outcomes, hopefully gleaning insights about how our individual differences affect health and disease risk.

Three years after that audacious effort was unveiled (it was rebranded as “All of Us” about a year ago), the program is only now officially getting off the ground. No genomes have yet been sequenced. Instead, federal workers and clinic partners have enrolled “beta testers” in a pilot phase of the project. About 26,000 volunteers have provided blood or urine samples and filled out surveys about their health care.

The official enrollment phase finally begins this week. After the program opens its metaphorical doors on May 6, any U.S. resident over age 18 can sign up either through the JoinAllOfUs Web site or participating health care provider organizations. Children will be allowed to join in the coming years, but the details of their participation are still being ironed out. The massive All of Us venture, one of several public and private projects that seek to harness the power of big data, also promises participants access to their own information and to summarized data from across the program. The NIH has additionally said it plans to notify participants if they have certain genetic variants linked to specific health problems. All of Us comes with a big price tag: The 21st-Century Cures Act, passed in December 2016, authorized a total of $1.5 billion over 10 years for the program. These funds are subject to the annual congressional appropriations process, however, so exact funding levels may vary, and they could be supplemented by other government allocations.

The program director for All of Us, Eric Dishman, previously worked in business and health sciences at Intel and has presented a TED Talk about his own personal experience with cancer and genome sequencing. Scientific American recently spoke with him about some of the controversies, obstacles and opportunities swirling around All of Us.

[An edited transcript of the interview follows.]

Last week news broke that genetic data from a distant relative of the Golden State Killer, found on the DNA and genealogy Web site GEDmatch, was used to help identify him. Naturally, people may be concerned about how their All of Us genome-sequencing information could be accessed by authorities in the future—particularly since this effort is being managed by the federal government. What is your response to such concerns about privacy and law enforcement access?

I just sent a note to all of our principal investigators and clinic sites saying you may get questions about this. Participants are protected by federal law from us sharing this data with other federal agencies [such as law enforcement].

All of our partners are covered under the same certificate of confidentiality that covers all federal research. It says you cannot give this data to other federal agencies. Moreover, even if somehow it was stolen or something like that, that data—because it was given as part of a research study—is inadmissible in court, and participants are immune from any consequences it could try to be used for.

All of our partners, by law, must protect that data—even if criminal proceedings are happening and they are subpoenaed.

Tell me about your beta testers. Who are they, what information did they provide, and have you encountered any hiccups with them?

We wanted the beta experience to be as close to the national experience as possible. The only difference in this stage was you had to have a code to join, and those were distributed by more than 120 participating clinics around the country. Our original goal was to get 10,000 to 15,000 people to go through our system using our real protocol—which is basically consenting to be part of a longitudinal study where you consent to give us your electronic health records, three initial surveys on overall health and a bit about family history. And some people were invited to also do physical measurements like height, weight, blood pressure and hip circumference, and give biospecimens of blood and urine at a clinic. Then those specimens were shipped within 24 hours to Mayo Clinic for storage. That’s the “get started” protocol. As of today there are around 44,000 people who have started somewhere in the process. They may have started the surveys and are waiting to do what’s next. But 26,000 people have completed that whole process.

After the May 6 kickoff date, can enrolled patients change their minds and have their data removed?

Yes. There is a withdrawal procedure. At any point you can withdraw and we would stop pulling your data from your health record. But obviously if someone has used your data as part of a summary statistic in a study, or run analysis on it and it’s already in a publication, we can’t stop that. But it’s already been “de-identified” at that stage and not tied to individuals.

So to be clear, if you pull out, do your biosamples and genome-sequencing data get thrown out?

Everything from their surveys to their samples can’t be used anymore going forward. They do destroy it all. They don’t throw it out back—we wouldn’t leave the samples out in the trash or anything like that—but yes, it is destroyed. Moreover, by the time a sample gets to a biobank it has no information in it that would tie it to an individual unless it was sequenced.

One of the program’s objectives is to get strong representation from underrepresented groups, some of whom have had historically fraught relationships with researchers. How do you plan to do that?

The good news is that we really wanted to get diversity out of this beta stage since it is one of the fundamental goals of the program, and as of [April 26] our numbers continue to be at a range of 70 to 75 percent of the population being underrepresented in biomedical research. We are specifically focused on race, ethnicity, income and age. Also, we’re focused on sexual and gender minorities, educational attainment and geography—for example, rural areas haven’t really been included in biomedical research. We are also looking at access to care and disability.

All our clinic sites have a local participant advisory board and local community advisors that have trust in the community. For many of those underrepresented communities the only familiarity they have are negative cases like the Tuskegee Syphilis Experiment—so a lot of people have asked me, have we taken some big numerical goal for our launch on May 6, and the answer is no. The goal is to start to teach people what precision medicine is, and what the All of Us program is.

I’ve written before about the VA’s veteran genome project and its related goals of getting complete health information and DNA analysis from one million volunteers. Why is it important to have a separate National Institutes of Health project rather than incorporate data from that work?

The more cohorts the better! It’s going to take cooperation and collaboration amongst these large cohort programs to get the science, and hopefully the cures, for rare and very rare conditions. The Million Veteran Program is a partner of ours. In fact they are one of our health provider organizations; in the beta phase they already started recruiting some people into All of Us. Large health providers like Geisinger and Kaiser Permanente also gave us good advice.

I’m not in a competition, but I do think there are some differentiating aspects of our program, too: The first is the diversity focus; no one else is trying to do this. That scientific mission of achieving better coverage of groups underrepresented in biomedical research is unique in our program. The diversity of data types we are trying to collect and do over time is different, too. This is a life-stage study of trying to understand the complexities of what causes one person to be ill and the other not by collecting genetic, behavioral, geographic and environmental data types. The other thing is treating participants as partners, and asking them what data they want back and then giving it back. This data is also going to be open to the public and researchers at the same time. There are different layers of data access based on layers of data de-identification, but this is different than other cohort programs out there.

What genetic data will you give back to participants?

As the whole genome-sequencing and genotyping work starts later this year into next year, we are going to start our “responsible return of information” pilot. Part of that pilot is, how do we get the appropriate educational and counseling resources in place for that group of people? We already know that we will look for the ACMG 59, the clinically accepted list of medically actionable genes recommended for return in clinical genomic sequencing. We want to make sure there is counseling available to them to receive that information, and then general education and understanding of what that data means, and how it’s different than information you might get from an ancestry service you paid for last Christmas.

What do you see as the biggest obstacles to getting this program off the ground?

I think it is that education and expectations management. I had cancer for 23 years, and with a combination of having all my electronic health care data pulled together and my genome sequencing at the 11th hour, it saved my life—which I talk about on my TED Talk. But we know most cancers don’t have a precision cure yet, and we want to talk about the promise of this area without overselling it.

While there will be some early studies that just use the electronic health care records and the surveys, it will take a while before there are any evidence-based studies and breakthroughs from All of Us. So how do you maintain that trust and importance of volunteering, but manage expectations to make clear that this won’t be about overnight miracles?