Four years before Pres. Barack Obama unveiled plans for a $215-million Precision Medicine Initiative designed to better understand genetic variations within disease and develop treatments, veterans were already volunteering to be part of an avant-garde effort to boost such tailor-made medicine. The venture, called the Million Veteran Program (MVP), aimed to get complete health information and DNA analysis from one million volunteers receiving health services through the Veterans Health Administration (commonly called the VA). Now, as other research groups try to scale up their own efforts for the president’s initiative, the VA effort is one of the few guideposts in a field with few landmarks.

The fledgling VA project now boasts almost 400,000 blood samples matched to electronic health care records and specially designed questionnaires. The blood samples are each analyzed for more than 700,000 single nucleotide polymorphisms, or SNPs—common genetic variations that could be associated with various diseases depending on their location or effect on gene function. Thousands of those blood samples have also been more fully sequenced for specific research projects that require scientists to get a more in-depth picture of volunteers’ genetic makeup. And to prepare those blood samples for sequencing, researchers must first isolate white blood cells from the blood and extract the DNA from them.

That is just step one. The combination of complete medical records, genetic information and detailed demographic questionnaires could be the recipe to start unraveling questions about schizophrenia, post-traumatic stress disorder and other ailments including cardiovascular disease. At least that’s the VA’s hope. “This is a new brand of science and we really are inventing the methodology as we go on,” says Michael Gaziano, one of the two principal investigators leading the MVP. Other similar U.S. biobanks—including those run by Vanderbilt University and Kaiser Permanente Northern California—have not yet reached the scale of the VA project. So for now researchers aiming to be part of the Precision Medicine Initiative are eying the VA effort as one of the few available models.

The massive VA project—which links the genetic data to clinical, lifestyle and environmental information—will “inform Pres. Obama’s broader Precision Medicine Initiative through the insights it promises to provide” and “help guide design and implementation of the PMI’s planned million-person cohort,” Jo Handelsman, associate director for science at the White House’s Office of Science and Technology Policy, told Scientific American in a statement.

But the VA is doing more than just collecting information and blood samples from lots of new patients. Some of its specialized research projects include an ongoing genetic analysis that compares more than 9,000 participants who have received a diagnosis of schizophrenia or bipolar disorder with individuals without the disorders, says John Concato, the other lead MVP investigator. And this month the Department of Veterans Affairs announced four more research projects that will draw on the MVP data. They will focus on the genetic contributions to heart disease, kidney disease and substance abuse. Those efforts, according to the White House, will also help inform plans for how the Precision Medicine Initiative should be generally mapped out, including “the types of data that should be included and the design of the data platform.”

Those are formidable hurdles. Genome sequencing data takes up a massive amount of computer space. Characterizing all three billion base pairs (the A, C, T and G letters on the DNA ladder) that make up the human genome takes up far more computer memory than large song or movie files. Even storing the information from the smaller subset of the genome called the exome—which contains the 20,000 or so genes that provide instructions for making proteins—is a massive undertaking. “If one printed out the whole genome of one person it would take 660,000 pages if someone used single space 10-point characters,” says Kirk Wilhelmsen, a geneticist and chief domain scientist for biology at the Renaissance Computing Institute at the University of North Carolina at Chapel Hill. A finished whole genome sequence could take up the equivalent of five CDs—just for one person, he says. A person’s exome alone would take up about 1 to 2 percent of that space. Yet the VA has approximately 28,000 exome sequences and 2,000 whole genome sequences.
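The figures in this paragraph can be checked with some rough arithmetic. The sketch below is only a back-of-the-envelope estimate: it assumes each DNA letter is stored as one byte of plain text and that a CD holds a nominal 700 megabytes, neither of which comes from the article, and its function names are purely illustrative. Real sequencing output, which stores many overlapping reads along with quality scores, is typically much larger than a finished sequence.

```python
import math

# Figures quoted in the article.
BASE_PAIRS = 3_000_000_000   # base pairs in the human genome
PRINTED_PAGES = 660_000      # pages in Wilhelmsen's printout example
EXOME_PERCENT = (1, 2)       # exome as 1 to 2 percent of the genome
N_EXOMES = 28_000            # exome sequences held by the VA
N_GENOMES = 2_000            # whole genome sequences held by the VA

# Assumptions for illustration only (not from the article).
BYTES_PER_BASE = 1           # one plain-text character per DNA letter
CD_BYTES = 700_000_000       # nominal capacity of a 700 MB CD


def genome_bytes():
    """Size of one finished genome stored as plain text."""
    return BASE_PAIRS * BYTES_PER_BASE


def chars_per_page():
    """Letters per printed page implied by the 660,000-page figure."""
    return BASE_PAIRS / PRINTED_PAGES


def cds_per_genome():
    """Whole CDs needed to hold one plain-text genome."""
    return math.ceil(genome_bytes() / CD_BYTES)


def exome_bytes_range():
    """Low and high estimate for one exome, in bytes."""
    return tuple(genome_bytes() * p // 100 for p in EXOME_PERCENT)


def collection_bytes_range():
    """Low and high estimate for all exomes plus all whole genomes."""
    low, high = exome_bytes_range()
    genomes = N_GENOMES * genome_bytes()
    return (N_EXOMES * low + genomes, N_EXOMES * high + genomes)


if __name__ == "__main__":
    print(f"Letters per printed page: {chars_per_page():,.0f}")
    print(f"CDs per genome: {cds_per_genome()}")
    low, high = collection_bytes_range()
    print(f"Whole collection: {low / 1e12:.2f} to {high / 1e12:.2f} TB")
```

Under these assumptions a single genome comes to about three gigabytes, which rounds up to five CDs and matches Wilhelmsen's estimate, and the VA's exomes and whole genomes together come to roughly seven terabytes.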

But space is not the only obstacle for a project like this. Genome-sequencing data also could be particularly attractive to hackers. The VA took specific steps to protect the data. “We designed this system to maximize patient confidentiality. One way we achieve that is the tube with the sample in it only has a bar code associated with it,” says Timothy O’Leary, the VA’s chief research and development officer. “We did this to reduce the chance of loss of anonymity.”

The VA genome project has already spooled across the country as patients volunteered at some 50 sites, although certain parts of the project are centralized. A massive computer-processing center supporting the data sharing, for one, is located in Pittsburgh. Then, of course, there’s the blood.

Immersed in a two-story, liquid nitrogen–cooled freezer bank in Boston are almost 400,000 tubes of veterans’ blood. The samples are kept at –80 degrees Celsius. When they are needed, a robotic arm lifts them from their icy berth. The collection does not suffer from some of the limitations of tightly focused demographic sampling that plague so many medical research projects in the U.S.: It includes samples from significant numbers of people in underserved minority populations, including African-Americans, Hispanics and Native Americans. (Approximately 8 percent of MVP samples are from females, consistent with the proportion of female veterans overall, according to Concato.) The samples also include “thousands of people we consider exceptionally aged males,” Gaziano says. Roughly 2,000 participants are 90 years or older and over 200 are 95 years or older.

This is not the first effort to gather medical and biological samples from service members, but it will provide information different from any other. For example, the Department of Defense Serum Repository in Maryland already houses more than 50 million samples of blood serum—a yellowish liquid chock-full of antibodies and proteins—from 10 million individuals. But sera are not ideal for genomic analysis because they hold little usable DNA. Instead, gene-sequencing work usually hinges on isolating DNA from white blood cells (the standard with MVP).

The repository, which originally started collecting the samples as part of an HIV/AIDS program, remains mired in controversy. Many service members (and family members receiving care through the VA system) did not realize their samples would be kept in perpetuity. Some have even asked for them back—without success. In contrast, the VA actively recruits individuals to MVP through mailed letters asking them to participate or recruits them when they are receiving care at a VA physician’s office. Consenting volunteers fill out an in-depth questionnaire, which asks medical and demographic questions that may not be included in their official health record. They also donate the equivalent of about two tablespoons of blood.

Because veterans access their health care via an integrated system that has long relied on electronic health care records, which follow patients from location to location, the MVP researchers had a built-in advantage when they tried to gather information on volunteering patients. Other research projects may not be so lucky.