For decades researchers have run longitudinal studies to gain new insights into health and illness. By regularly recording information about the same individuals' medical history and care over many years, they have, for example, shown that lead from peeling paint damages children's brains and bodies and have demonstrated that high blood pressure and cholesterol levels contribute to heart disease and stroke. To this day, some of the original (and now at least 95-year-old) participants in the famous Framingham Heart Study, which began in 1948, still provide health information to study investigators.

Health researchers are not the only ones, however, who collect and analyze medical data over long periods. A growing number of companies specialize in gathering longitudinal information from hundreds of millions of records held by hospitals and doctors, as well as from prescription and insurance claims and laboratory tests. Pooling all these data turns them into a valuable commodity. Other businesses are willing to pay for the insights that they can glean from such collections to guide their investments in the pharmaceutical industry, for example, or to tailor more precisely an advertising campaign promoting a new drug.

By law, the identities of everyone found in these commercial databases are supposed to be kept secret. Indeed, the organizations that sell medical information to data-mining companies strip their records of Social Security numbers, names and detailed addresses to protect people's privacy. But the data brokers also add unique numbers to the records they collect that allow them to match disparate pieces of information to the same individual—even if they do not know that person's name. This matching of information makes the overall collection more valuable, but as data-mining technology becomes ubiquitous, it also makes it easier to learn a previously anonymous individual's identity.

At present, the system is so opaque that many doctors, nurses and patients are unaware that the information they record or divulge in an electronic health record or the results from lab tests they request or consent to may be anonymized and sold. But they will not remain in the dark about these practices forever. In researching the medical-data-trading business for an upcoming book, I have found growing unease about the ever expanding sale of our medical information not just among privacy advocates but among health industry insiders as well.

The entire health care system depends on patients trusting that their information will be kept confidential. When they learn that others have insights into what happens between them and their medical providers, they may be less forthcoming in describing their conditions or in seeking help. More and more health care experts believe that it is time to adopt measures that give patients more control over their data.

Multibillion-Dollar Business

The dominant player in the medical-data-trading industry is IMS Health, which recorded $2.6 billion in revenue in 2014. Founded in 1954, the company was taken private in 2010 and went public again in 2014. Since then, it has proved an investor favorite, with shares rising more than 50 percent above their initial price in little more than a year. At press time, IMS was a $9-billion company. Competitors include Symphony Health Solutions and smaller rivals in various countries.

Decades ago, before computers came into widespread use, IMS field agents photographed thousands of prescription records at pharmacies for hundreds of clerks to transcribe, a slow and costly process. Nowadays IMS automatically receives petabytes (10^15 bytes or more) of data from the computerized records held by pharmacies, insurance companies and other medical organizations, including federal and many state health departments. Three quarters of all retail pharmacies in the U.S. send some portion of their electronic records to IMS. All told, the company says it has assembled half a billion dossiers on individual patients from the U.S. to Australia.

IMS and other data brokers are not restricted by medical privacy rules in the U.S., because their records are designed to be anonymous—containing only year of birth, gender, partial zip code and doctor's name. The Health Insurance Portability and Accountability Act (HIPAA) of 1996, for instance, governs only the transfer of medical information that is tied directly to an individual's identity.

Even anonymized, the data command premium prices. Every year, for example, Pfizer spends $12 million to buy health data from a variety of sources, including IMS, according to Marc Berger, who oversees the analysis of anonymized patient data at Pfizer. But companies engaged in the data trade tend to keep the practice below the general public's radar.

Case in point: In the 1990s IMS started selling data on what individual U.S. physicians prescribe to patients to help drug companies tailor sales pitches to specific care providers. (HIPAA protects the identity of patients, not health care workers.) For years doctors did not realize that outsiders had insights on their prescribing habits. “At the time, it was taboo. It was forbidden to ever mention that topic,” says Shahram Ahari, who used such data as a pharmaceutical representative visiting doctors for Eli Lilly from 1999 to 2000 and is now completing a residency at the University of Rochester. “It was the big secret.” Asked for a response, an Eli Lilly spokesperson replied in an e-mail, “We have always been up front that we receive data from IMS.”

Eventually physicians caught on and complained. Some considered such data gathering a privacy invasion; others objected to commercial firms profiting from details about their practices. A few states passed laws banning the collection of physician-prescribing habits. IMS challenged those rules all the way to the U.S. Supreme Court and—despite the arguments of 36 states, the Department of Justice, and numerous medical and consumer-advocacy groups supporting data limits—won its case in 2011 on corporate “free speech” grounds. The practice continues to this day, much of the time beyond public notice.

What Could Go Wrong?

Once upon a time, simply removing a person's name, address and Social Security number from a medical record may well have protected anonymity. Not so today. Straightforward data-mining tools can rummage through multiple databases containing anonymized and nonanonymized data to reidentify the individuals from their ostensibly private medical records.

Indeed, computer scientists have repeatedly shown how easy it can be to crack seemingly anonymous data sets. For example, Harvard University professor Latanya Sweeney used such methods when she was a graduate student at the Massachusetts Institute of Technology in 1997 to identify then Massachusetts governor William Weld in publicly available hospital records. All she had to do was compare the supposedly anonymous hospital data about state employees to voter registration rolls for the city of Cambridge, where she knew the governor lived. Soon she was able to zero in on certain records based on age and gender that could have only belonged to Weld and that detailed a recent visit he made to a hospital, including his diagnosis and the prescriptions he took home with him.
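
The kind of cross-referencing Sweeney performed can be illustrated with a short sketch. This is not her actual method or data: every record below is invented, and the choice of matching fields (birth year, sex and zip code) is an assumption made for illustration, as are the names `reidentify` and `QUASI_IDENTIFIERS`. The idea is simply that when a combination of ordinary attributes is unique in both a "nameless" data set and a public list that does carry names, joining the two attaches a name to the medical record.

```python
from collections import defaultdict

# Invented records for illustration only; no real data is used.
hospital_records = [
    {"birth_year": 1945, "sex": "M", "zip": "02138", "diagnosis": "condition A"},
    {"birth_year": 1945, "sex": "F", "zip": "02138", "diagnosis": "condition B"},
    {"birth_year": 1972, "sex": "M", "zip": "02139", "diagnosis": "condition C"},
    {"birth_year": 1972, "sex": "M", "zip": "02139", "diagnosis": "condition D"},
]
voter_roll = [
    {"name": "Voter One", "birth_year": 1945, "sex": "M", "zip": "02138"},
    {"name": "Voter Two", "birth_year": 1945, "sex": "F", "zip": "02138"},
    {"name": "Voter Three", "birth_year": 1972, "sex": "M", "zip": "02139"},
    {"name": "Voter Four", "birth_year": 1972, "sex": "M", "zip": "02139"},
]

# Assumed matching fields: attributes present in both data sets.
QUASI_IDENTIFIERS = ("birth_year", "sex", "zip")


def key(record):
    """Return the tuple of quasi-identifier values for one record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(anonymous, public):
    """Map a name to a diagnosis wherever the quasi-identifier
    combination is unique in BOTH data sets."""
    names_by_key = defaultdict(list)
    for voter in public:
        names_by_key[key(voter)].append(voter["name"])

    records_by_key = defaultdict(list)
    for record in anonymous:
        records_by_key[key(record)].append(record)

    matches = {}
    for k, records in records_by_key.items():
        names = names_by_key.get(k, [])
        # Only an unambiguous one-to-one match reveals an identity.
        if len(names) == 1 and len(records) == 1:
            matches[names[0]] = records[0]["diagnosis"]
    return matches


matches = reidentify(hospital_records, voter_roll)
for name, diagnosis in matches.items():
    print(name, "->", diagnosis)
```

In this toy example the two people who share a birth year, sex and zip code stay hidden in each other's shadow, whereas the two whose combination is unique are exposed. Adding more fields, or more data sets to link, shrinks those shadows.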

“It is getting easier and easier to identify people from anonymized data,” says Chesley Richards, director of the Office of Public Health Scientific Services at the Centers for Disease Control and Prevention. “You may not be identifiable from a particular data set that an entity has collected, but if you are a broker that is assembling a number of sets and looking for ways to link those data, that's where, potentially, the risk becomes greater for identification.”

IMS officials say they have no interest in identifying patients and take careful steps to preserve anonymity. Moreover, there are no publicly recorded instances of someone taking anonymized patient data from IMS or a rival company and reidentifying individuals. Yet IMS does not want to talk too much about the gathering and selling of longitudinal data. At IMS, the CEO, the head of its Institute for Healthcare Informatics, the vice president of industry relations and the chief privacy officer declined to be interviewed for this article, but a company spokesperson did assist with fact-checking.

Where to Draw the Line?

Apart from making money selling information to other businesses, IMS also shares some data with academic and other researchers for free or at a discount. The company has published a long list of medical articles that relied on its longitudinal data. For example, researchers learned that newer cardiovascular drugs reduce the length of hospital stays but do not prolong lives. In contrast, newer chemotherapy drugs are probably responsible for some of the recent decline in death rates from cancer in France.

Such findings demonstrate that amassing medical data from multiple sources can benefit society. There is, however, a difference, says Jerry Avorn, a professor of medicine at Harvard Medical School, between “conscious, responsible researchers who only want to learn about medications' good and bad effects in a university medical school setting versus somebody sitting in the backroom [of a superstore] trying to figure out how can they sell more of product X by invading someone's privacy.”

One small step toward reestablishing trust in the confidentiality of medical information is to give individuals the chance to forbid collection of their information for commercial use—an option the Framingham study now offers its participants, as does the state of Rhode Island in its sharing of anonymized insurance claims. “I personally believe that at the end of the day, individuals own their data,” says Pfizer's Berger. “If somebody is using [their] data, they should know.” And if the collection is “only for commercial purposes, I think patients should have the ability to opt out.”

Seeking more detailed consent cannot, by itself, stem the erosion of patient privacy, but it will raise awareness—without which no further action is possible. Trust in the medical system is too vital to be sacrificed to uncontrolled market forces.

This reporting project was funded by a Reporting Award at New York University's Arthur L. Carter Journalism Institute.