February 13, 2013

Trial sans Error: How Pharma-Funded Research Cherry-Picks Positive Results [Excerpt]

Clinical trial data on new drugs is systematically withheld from doctors and patients, bringing into question many of the premises of the pharmaceutical industry—and the medicine we use

By Ben Goldacre

Excerpt from Bad Pharma: How Drug Companies Mislead Doctors and Harm Patients, by Ben Goldacre. Published by Faber and Faber, Inc. © 2013 Ben Goldacre. Excerpted with permission from the publisher. All Rights Reserved.

Before we get going, we need to establish one thing beyond any doubt: industry-funded trials are more likely to produce a positive, flattering result than independently funded trials. This is our core premise, and you’re about to read a very short chapter, because this is one of the most well-documented phenomena in the growing field of “research about research”. It has also become much easier to study in recent years, because the rules on declaring industry funding have become a little clearer.

We can begin with some recent work: in 2010, three researchers from Harvard and Toronto found all the trials looking at five major classes of drug (antidepressants, ulcer drugs and so on), then measured two key features: were they positive, and were they funded by industry? They found over five hundred trials in total: 85 per cent of the industry-funded studies were positive, but only 50 per cent of the government-funded trials were. That’s a very significant difference.

In 2007, researchers looked at every published trial that set out to explore the benefit of a statin. These are cholesterol-lowering drugs which reduce your risk of having a heart attack; they are prescribed in very large quantities, and they will loom large in this book. This study found 192 trials in total, either comparing one statin against another, or comparing a statin against a different kind of treatment. Once the researchers controlled for other factors (we’ll delve into what this means later), they found that industry-funded trials were twenty times more likely to give results favoring the test drug. Again, that’s a very big difference.

We’ll do one more. In 2006, researchers looked into every trial of psychiatric drugs in four academic journals over a ten-year period, finding 542 trial outcomes in total. Industry sponsors got favorable outcomes for their own drug 78 per cent of the time, while independently funded trials only gave a positive result in 48 per cent of cases. If you were a competing drug put up against the sponsor’s drug in a trial, you were in for a pretty rough ride: you would only win a measly 28 per cent of the time.

These are dismal, frightening results, but they come from individual studies. When there has been lots of research in a field, it’s always possible that someone—like me, for example—could cherry-pick the results, and give a partial view. I could, in essence, be doing exactly what I accuse the pharmaceutical industry of doing, and only telling you about the studies that support my case, while hiding the reassuring ones from you.

To guard against this risk, researchers invented the systematic review. We’ll explore this in more detail soon, since it’s at the core of modern medicine, but in essence a systematic review is simple: instead of just mooching through the research literature, consciously or unconsciously picking out papers here and there that support your pre-existing beliefs, you take a scientific, systematic approach to the very process of looking for scientific evidence, ensuring that your evidence is as complete and representative as possible of all the research that has ever been done.

Systematic reviews are very, very onerous. In 2003, by coincidence, two were published, both looking specifically at the question we’re interested in. They took all the studies ever published that looked at whether industry funding is associated with pro-industry results. Each took a slightly different approach to finding research papers, and both found that industry-funded trials were, overall, about four times more likely to report positive results. A further review in 2007 looked at the new studies that had been published in the four years after these two earlier reviews: it found twenty more pieces of work, and all but two showed that industry-sponsored trials were more likely to report flattering results.

I am setting out this evidence at length because I want to be absolutely clear that there is no doubt on the issue. Industry-sponsored trials give favorable results, and that is not my opinion, or a hunch from the occasional passing study. This is a very well-documented problem, and it has been researched extensively, without anybody stepping out to take effective action, as we shall see.

There is one last study I’d like to tell you about. It turns out that this pattern of industry-funded trials being vastly more likely to give positive results persists even when you move away from published academic papers, and look instead at trial reports from academic conferences, where data often appears for the first time (in fact, as we shall see, sometimes trial results only appear at an academic conference, with very little information on how the study was conducted).

Fries and Krishnan studied all the research abstracts presented at the 2001 American College of Rheumatology meetings which reported any kind of trial, and acknowledged industry sponsorship, in order to find out what proportion had results that favored the sponsor’s drug. There is a small punch-line coming, and to understand it we need to cover a little of what an academic paper looks like. In general, the results section is extensive: the raw numbers are given for each outcome, and for each possible causal factor, but not just as raw figures. The ‘ranges’ are given, subgroups are perhaps explored, statistical tests are conducted, and each detail of the result is described in table form, and in shorter narrative form in the text, explaining the most important results. This lengthy process is usually spread over several pages.

In Fries and Krishnan [2004] this level of detail was unnecessary. The results section is a single, simple, and—I like to imagine—fairly passive-aggressive sentence:

The results from every RCT (45 out of 45) favored the drug of the sponsor.

This extreme finding has a very interesting side effect, for those interested in time-saving shortcuts. Since every industry-sponsored trial had a positive result, that’s all you’d need to know about a piece of work to predict its outcome: if it was funded by industry, you could know with absolute certainty that the trial found the drug was great.

How does this happen? How do industry-sponsored trials almost always manage to get a positive result? It is, as far as anyone can be certain, a combination of factors. Sometimes trials are flawed by design. You can compare your new drug with something you know to be rubbish—an existing drug at an inadequate dose, perhaps, or a placebo sugar pill that does almost nothing. You can choose your patients very carefully, so they are more likely to get better on your treatment. You can peek at the results halfway through, and stop your trial early if they look good (which is—for interesting reasons we shall discuss—statistical poison). And so on.

But before we get to these fascinating methodological twists and quirks, these nudges and bumps that stop a trial from being a fair test of whether a treatment works or not, there is something very much simpler at hand.

Sometimes drug companies conduct lots of trials, and when they see that the results are unflattering, they simply fail to publish them. This is not a new problem, and it’s not limited to medicine. In fact, this issue of negative results that go missing in action cuts into almost every corner of science. It distorts findings in fields as diverse as brain imaging and economics, it makes a mockery of all our efforts to exclude bias from our studies, and despite everything that regulators, drug companies and even some academics will tell you, it is a problem that has been left unfixed for decades.

In fact, it is so deep-rooted that even if we fixed it today—right now, for good, forever, without any flaws or loopholes in our legislation—that still wouldn’t help, because we would still be practicing medicine, cheerfully making decisions about which treatment is best, on the basis of decades of medical evidence which is—as you’ve now seen—fundamentally distorted.

But there is a way ahead.

Why missing data matters
Reboxetine is a drug I myself have prescribed. Other drugs had done nothing for this particular patient, so we wanted to try something new. I’d read the trial data before I wrote the prescription, and found only well-designed, fair tests, with overwhelmingly positive results. Reboxetine was better than placebo, and as good as any other antidepressant in head-to-head comparisons. It’s approved for use by the Medicines and Healthcare products Regulatory Agency (the MHRA) in the UK, but wisely, the U.S. chose not to approve it. (This is no proof of the FDA being any smarter; there are plenty of drugs available in the U.S. that the UK never approved.) Reboxetine was clearly a safe and effective treatment. The patient and I discussed the evidence briefly, and agreed it was the right treatment to try next. I signed a prescription saying I wanted my patient to have this drug.

But we had both been misled. In October 2010 a group of researchers were finally able to bring together all the trials that had ever been conducted on reboxetine. Through a long process of investigation (searching in academic journals, but also arduously requesting data from the manufacturers and gathering documents from regulators), they were able to assemble all the data, both from trials that were published, and from those that had never appeared in academic papers.

When all this trial data was put together it produced a shocking picture. Seven trials had been conducted comparing reboxetine against placebo. Only one, conducted in 254 patients, had a neat, positive result, and that one was published in an academic journal, for doctors and researchers to read. But six more trials were conducted, in almost ten times as many patients. All of them showed that reboxetine was no better than a dummy sugar pill. None of these trials was published. I had no idea they existed.

It got worse. The trials comparing reboxetine against other drugs showed exactly the same picture: three small studies, 507 patients in total, showed that reboxetine was just as good as any other drug. They were all published. But 1,657 patients’ worth of data was left unpublished, and this unpublished data showed that patients on reboxetine did worse than those on other drugs. If all this wasn’t bad enough, there was also the side effects data. The drug looked fine in the trials which appeared in the academic literature: but when we saw the unpublished studies, it turned out that patients were more likely to have side effects, more likely to drop out of taking the drug, and more likely to withdraw from the trial because of side effects, if they were taking reboxetine rather than one of its competitors.

If you’re ever in any doubt about whether the stories in this book make me angry—and I promise you, whatever happens, I will keep to the data, and strive to give a fair picture of everything we know—you need only look at this story. I did everything a doctor is supposed to do. I read all the papers, I critically appraised them, I understood them, I discussed them with the patient, and we made a decision together, based on the evidence. In the published data, reboxetine was a safe and effective drug. In reality, it was no better than a sugar pill, and worse, it does more harm than good. As a doctor I did something which, on the balance of all the evidence, harmed my patient, simply because unflattering data was left unpublished.

If you find that amazing, or outrageous, your journey is just beginning. Because nobody broke any law in that situation, reboxetine is still on the market, and the system that allowed all this to happen is still in play, for all drugs, in all countries in the world. Negative data goes missing, for all treatments, in all areas of science. The regulators and professional bodies we would reasonably expect to stamp out such practices have failed us.

In a few pages, we will walk through the literature that demonstrates all of this beyond any doubt, showing that “publication bias”—the process whereby negative results go unpublished—is endemic throughout the whole of medicine and academia; and that regulators have failed to do anything about it, despite decades of data showing the size of the problem. But before we get to that research, I need you to feel its implications, so we need to think about why missing data matters.

Evidence is the only way we can possibly know if something works—or doesn’t work—in medicine. We proceed by testing things, as cautiously as we can, in head-to-head trials, and gathering together all of the evidence. This last step is crucial: if I withhold half the data from you, it’s very easy for me to convince you of something that isn’t true. If I toss a coin a hundred times, for example, but only tell you about the results when it lands heads-up, I can convince you that this is a two-headed coin. But that doesn’t mean I really do have a two-headed coin: it means I’m misleading you, and you’re a fool for letting me get away with it. This is exactly the situation we tolerate in medicine, and always have. Researchers are free to do as many trials as they wish, and then choose which ones to publish.
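
The coin example can be made concrete in a few lines of Python. This is only a minimal sketch: the seed, the number of tosses and the helper names (`toss_coin`, `report_only_heads`, `share_of_heads`) are illustrative assumptions, chosen to show how a perfectly fair coin looks two-headed once the tails are withheld.

```python
import random


def toss_coin(n_tosses, seed=0):
    """Toss a fair coin n_tosses times and return the full record."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for repeatability
    return [rng.choice(["heads", "tails"]) for _ in range(n_tosses)]


def report_only_heads(tosses):
    """Selective reporting: keep the flattering results and drop the rest."""
    return [t for t in tosses if t == "heads"]


def share_of_heads(tosses):
    """Fraction of the tosses in this record that landed heads-up."""
    return sum(t == "heads" for t in tosses) / len(tosses)


all_tosses = toss_coin(100)
reported = report_only_heads(all_tosses)

print(f"Full record:     {share_of_heads(all_tosses):.0%} heads")
print(f"Reported record: {share_of_heads(reported):.0%} heads")
```

The coin never changes. Only the reporting rule does, and that alone is enough to turn a mixed record into a record that is all heads.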

The repercussions of this go way beyond simply misleading doctors about the benefits and harms of interventions for patients, and way beyond trials. Medical research isn’t an abstract academic pursuit: it’s about people, so every time we fail to publish a piece of research we expose real, living people to unnecessary, avoidable suffering.

TGN1412
In March 2006, six volunteers arrived at a London hospital to take part in a trial. It was the first time a new drug called TGN1412 had ever been given to humans, and they were paid £2,000 each. Within an hour these six men developed headaches, muscle aches, and a feeling of unease. Then things got worse: high temperatures, restlessness, periods of forgetting who and where they were. Soon they were shivering, flushed, their pulses racing, their blood pressure falling. Then, a cliff: one went into respiratory failure, the oxygen levels in his blood falling rapidly as his lungs filled with fluid. Nobody knew why. Another dropped his blood pressure to just 65/40, stopped breathing properly, and was rushed to an intensive care unit, knocked out, intubated, mechanically ventilated. Within a day all six were disastrously unwell: fluid on their lungs, struggling to breathe, their kidneys failing, their blood clotting uncontrollably throughout their bodies, and their white blood cells disappearing. Doctors threw everything they could at them: steroids, antihistamines, immune-system receptor blockers. All six were ventilated on intensive care. They stopped producing urine; they were all put on dialysis; their blood was replaced, first slowly, then rapidly; they needed plasma, red cells, platelets. The fevers continued. One developed pneumonia. And then the blood stopped getting to their peripheries. Their fingers and toes went flushed, then brown, then black, and then began to rot and die. With heroic effort, all escaped, at least, with their lives.

The Department of Health convened an Expert Scientific Group to try to understand what had happened, and from this two concerns were raised. First: can we stop things like this from happening again? It’s plainly foolish, for example, to give a new experimental treatment to all six participants in a ‘first-in-man’ trial at the same time, if that treatment is a completely unknown quantity. New drugs should be given to participants in a staggered process, slowly, over a day. This idea received considerable attention from regulators and the media.

Less noted was a second concern: could we have foreseen this disaster? TGN1412 is a molecule that attaches to a receptor called CD28 on the white blood cells of the immune system. It was a new and experimental treatment, and it interfered with the immune system in ways that are poorly understood, and hard to model in animals (unlike, say, blood pressure, because immune systems are very variable between different species). But as the final report found, there was experience with a similar intervention: it had simply not been published. One researcher presented the inquiry with unpublished data on a study he had conducted in a single human subject a full ten years earlier, using an antibody that attached to the CD3, CD2 and CD28 receptors. The effects of this antibody had parallels with those of TGN1412, and the subject on whom it was tested had become unwell. But nobody could possibly have known that, because these results were never shared with the scientific community. They sat unpublished, unknown, when they could have helped save six men from a terrifying, destructive, avoidable ordeal.

That original researcher could not foresee the specific harm he contributed to, and it’s hard to blame him as an individual, because he operated in an academic culture where leaving data unpublished was regarded as completely normal. The same culture exists today. The final report on TGN1412 concluded that sharing the results of all first-in-man studies was essential: they should be published, every last one, as a matter of routine. But phase 1 trial results weren’t published then, and they’re still not published now. In 2009, for the first time, a study was published looking specifically at how many of these first-in-man trials get published, and how many remain hidden. They took all such trials approved by one ethics committee over a year. After four years, nine out of ten remained unpublished; after eight years, four out of five were still unpublished.

In medicine, as we shall see time and again, research is not abstract: it relates directly to life, death, suffering and pain. With every one of these unpublished studies we are potentially exposed, quite unnecessarily, to another TGN1412. Even a huge international news story, with horrific images of young men brandishing blackened feet and hands from hospital beds, wasn’t enough to get movement, because the issue of missing data is too complicated to fit in one sentence.

When we don’t share the results of basic research, such as a small first-in-man study, we expose people to unnecessary risks in the future. Was this an extreme case? Is the problem limited to early, experimental, new drugs, in small groups of trial participants?

In the 1980s, U.S. doctors began giving anti-arrhythmic drugs to all patients who’d had a heart attack. This practice made perfect sense on paper: we knew that anti-arrhythmic drugs helped prevent abnormal heart rhythms; we also knew that people who’ve had a heart attack are quite likely to have abnormal heart rhythms; we also knew that often these went unnoticed, undiagnosed and untreated. Giving anti-arrhythmic drugs to everyone who’d had a heart attack was a simple, sensible preventive measure.

Unfortunately, it turned out that we were wrong. This prescribing practice, with the best of intentions, on the best of principles, actually killed people. And because heart attacks are very common, it killed them in very large numbers: well over 100,000 people died unnecessarily before it was realized that the fine balance between benefit and risk was completely different for patients without a proven abnormal heart rhythm.

Could anyone have predicted this? Sadly, yes, they could have. A trial in 1980 tested a new anti-arrhythmic drug, lorcainide, in a small number of men who’d had a heart attack—less than a hundred—to see if it was any use. Nine out of forty-eight men on lorcainide died, compared with one out of forty-seven on placebo. The drug was early in its development cycle, and not long after this study it was dropped for commercial reasons. Because it wasn’t on the market, nobody even thought to publish the trial. The researchers assumed it was an idiosyncrasy of their molecule, and gave it no further thought. If they had published, we would have been much more cautious about trying other anti-arrhythmic drugs on people with heart attacks, and the phenomenal death toll—over 100,000 people in their graves prematurely—might have been stopped sooner. More than a decade later, the researchers finally did publish their results, with a mea culpa, recognizing the harm they had done by not sharing them earlier:

When we carried out our study in 1980, we thought that the increased death rate that occurred in the lorcainide group was an effect of chance. The development of lorcainide was abandoned for commercial reasons, and this study was therefore never published; it is now a good example of ‘publication bias’. The results described here might have provided an early warning of trouble ahead.
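
To get a feel for how strong that early warning was, the figures quoted above (nine deaths among forty-eight men on lorcainide, one among forty-seven on placebo) can be run through some simple arithmetic. The sketch below is an illustration rather than the original researchers' analysis: the choice of a one-sided Fisher's exact test and the function name `fisher_one_sided` are assumptions made here.

```python
from math import comb

# Figures from the 1980 lorcainide trial as quoted in the text.
deaths_drug, n_drug = 9, 48
deaths_placebo, n_placebo = 1, 47

risk_drug = deaths_drug / n_drug           # death rate on lorcainide
risk_placebo = deaths_placebo / n_placebo  # death rate on placebo
risk_ratio = risk_drug / risk_placebo


def fisher_one_sided(a, n1, c, n2):
    """Chance that at least `a` of the a + c deaths fall in group 1
    if the drug made no difference (hypergeometric tail probability)."""
    total_deaths = a + c
    total = n1 + n2
    ways_to_pick_group = comb(total, n1)
    p = 0.0
    for k in range(a, min(total_deaths, n1) + 1):
        # k deaths and n1 - k survivors end up in group 1
        p += comb(total_deaths, k) * comb(total - total_deaths, n1 - k) / ways_to_pick_group
    return p


p_value = fisher_one_sided(deaths_drug, n_drug, deaths_placebo, n_placebo)

print(f"Death rate on lorcainide: {risk_drug:.1%}")
print(f"Death rate on placebo:    {risk_placebo:.1%}")
print(f"Risk ratio:               {risk_ratio:.1f}")
print(f"One-sided p-value:        {p_value:.4f}")
```

A single small trial cannot settle a question on its own, which is exactly why its results need to be available for pooling with everyone else's.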

As we shall shortly see, this problem of unpublished data is widespread throughout medicine, and indeed the whole of academia, even though the scale of the problem, and the harm it causes, have been documented beyond any doubt. We will see stories on basic cancer research, Tamiflu, cholesterol blockbusters, obesity drugs, antidepressants and more, with evidence that goes from the dawn of medicine to the present day, and data that is still being withheld, right now, as I write, on widely used drugs which many of you reading this book will have taken this morning. We will also see how regulators and academic bodies have repeatedly failed to address the problem.

Because researchers are free to bury any result they please, patients are exposed to harm on a staggering scale throughout the whole of medicine, from research to practice. Doctors can have no idea about the true effects of the treatments they give. Does this drug really work best, or have I simply been deprived of half the data? Nobody can tell. Is this expensive drug worth the money, or have the data simply been massaged? No one can tell. Will this drug kill patients? Is there any evidence that it’s dangerous? No one can tell.

This is a bizarre situation to arise in medicine, a discipline where everything is supposed to be based on evidence, and where everyday practice is bound up in medico-legal anxiety. In one of the most regulated corners of human conduct we’ve taken our eyes off the ball, and allowed the evidence driving practice to be polluted and distorted. It seems unimaginable. We will now see how deep this problem goes.

Why we summarize data
Missing data has been studied extensively in medicine. But before I lay out that evidence, we need to understand exactly why it matters, from a scientific perspective. And for that we need to understand systematic reviews and “meta-analysis.” Between them, these are two of the most powerful ideas in modern medicine. They are incredibly simple, but they were invented shockingly late.

When we want to find out if something works or not, we do a trial. This is a very simple process, and the first recorded attempt at some kind of trial was in the Bible (Daniel 1:12, if you’re interested). First, you need an unanswered question: for example, ‘Does giving steroids to a woman delivering a premature baby increase the chances of that baby surviving?’ Then you find some relevant participants, in this case, mothers about to deliver a premature baby. You’ll need a reasonable number of them, let’s say two hundred for this trial. Then you divide them into two groups at random, give the mothers in one group the current best treatment (whatever that is in your town), while the mothers in the other group get current best treatment plus some steroids. Finally, when all two hundred women have gone through your trial, you count up how many babies survived in each group.

This is a real-world question, and lots of trials were done on this topic, starting from 1972 onwards: two trials showed that steroids saved lives, but five showed no significant benefit. Now, you will often hear that doctors disagree when the evidence is mixed, and this is exactly that kind of situation. A doctor with a strong pre-existing belief that steroids work—perhaps preoccupied with some theoretical molecular mechanism, by which the drug might do something useful in the body—could come along and say: “Look at these two positive trials! Of course we must give steroids!” A doctor with a strong prior intuition that steroids were rubbish might point at the five negative trials and say: “Overall the evidence shows no benefit. Why take a risk?”

Up until very recently, this was basically how medicine progressed. People would write long, languorous review articles—essays surveying the literature—in which they would cite the trial data they’d come across in a completely unsystematic fashion, often reflecting their own prejudices and values. Then, in the 1980s, people began to do something called a “systematic review”. This is a clear, systematic survey of the literature, with the intention of getting all the trial data you can possibly find on one topic, without being biased towards any particular set of findings. In a systematic review, you describe exactly how you looked for data: which databases you searched, which search engines and indexes you used, even what words you searched for. You pre-specify the kinds of studies that can be included in your review, and then you present everything you’ve found, including the papers you rejected, with an explanation of why. By doing this, you ensure that your methods are fully transparent, replicable and open to criticism, providing the reader with a clear and complete picture of the evidence. It may sound like a simple idea, but systematic reviews are extremely rare outside clinical medicine, and are quietly one of the most important and transgressive ideas of the past forty years.

When you’ve got all the trial data in one place, you can conduct something called a meta-analysis, where you bring all the results together in one giant spreadsheet, pool all the data and get one single, summary figure, the most accurate summary of all the data on one clinical question. The output of this is called a “blobbogram,” and you can see one on the following page, in the logo of the Cochrane Collaboration, a global, non-profit academic organization that has been producing gold-standard reviews of evidence on important questions in medicine since the 1980s.

This blobbogram shows the results of all the trials done on giving steroids to help premature babies survive. Each horizontal line is a trial: if that line is further to the left, then the trial showed steroids were beneficial and saved lives. The central, vertical line is the ‘line of no effect’: and if the horizontal line of the trial touches the line of no effect, then that trial showed no statistically significant benefit. Some trials are represented by longer horizontal lines: these were smaller trials, with fewer participants, which means they are prone to more error, so the estimate of the benefit has more uncertainty, and therefore the horizontal line is longer. Finally, the diamond at the bottom shows the ‘summary effect’: this is the overall benefit of the intervention, pooling together the results of all the individual trials. The diamond is much narrower than the lines for individual trials, because the estimate is much more accurate: it is summarizing the effect of the drug in many more patients. On this blobbogram you can see, because the diamond is a long way from the line of no effect, that giving steroids is hugely beneficial. In fact, it reduces the chances of a premature baby dying by almost half.
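
The pooling behind the diamond can be sketched in code. The example below uses one common approach, a fixed-effect, inverse-variance meta-analysis of risk ratios, on four invented trials. The trial numbers are hypothetical and are not the steroid data, and the function names are assumptions made for this illustration.

```python
import math

# Hypothetical trials, invented for illustration only:
# (deaths on treatment, patients on treatment, deaths on control, patients on control)
TRIALS = [
    (10, 100, 18, 100),
    (5, 40, 7, 40),
    (30, 300, 52, 300),
    (4, 50, 9, 50),
]

Z_95 = 1.96  # normal quantile for a 95 per cent confidence interval


def log_risk_ratio(deaths_t, n_t, deaths_c, n_c):
    """Return the log risk ratio for one trial and its standard error."""
    estimate = math.log((deaths_t / n_t) / (deaths_c / n_c))
    std_err = math.sqrt(1 / deaths_t - 1 / n_t + 1 / deaths_c - 1 / n_c)
    return estimate, std_err


def pool_fixed_effect(results):
    """Inverse-variance weighted average of (estimate, std_err) pairs."""
    weights = [1 / se ** 2 for _, se in results]
    pooled = sum(w * est for w, (est, _) in zip(weights, results)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))


def interval(estimate, std_err):
    """95 per cent confidence interval, converted back to the risk-ratio scale."""
    return math.exp(estimate - Z_95 * std_err), math.exp(estimate + Z_95 * std_err)


results = [log_risk_ratio(*trial) for trial in TRIALS]
pooled_estimate, pooled_se = pool_fixed_effect(results)

for est, se in results:  # one horizontal line per trial
    low, high = interval(est, se)
    print(f"trial    RR {math.exp(est):.2f}  ({low:.2f} to {high:.2f})")

low, high = interval(pooled_estimate, pooled_se)  # the diamond
print(f"summary  RR {math.exp(pooled_estimate):.2f}  ({low:.2f} to {high:.2f})")
```

A risk ratio below 1 means fewer deaths on treatment, and an interval that includes 1 is a line touching the line of no effect. In this made-up data some individual intervals include 1, while the pooled interval is narrower than any single trial's and sits entirely below 1.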

The amazing thing about this blobbogram is that it had to be invented, and this happened very late in medicine’s history. For many years we had all the information we needed to know that steroids saved lives, but nobody knew they were effective, because nobody did a systematic review until 1989. As a result, the treatment wasn’t given widely, and huge numbers of babies died unnecessarily; not because we didn’t have the information, but simply because we didn’t synthesize it together properly.

In case you think this is an isolated case, it’s worth examining exactly how broken medicine was until frighteningly recent times. The diagram on the following page contains two blobbograms, or “forest plots,” showing all the trials ever conducted to see whether giving streptokinase, a clot-busting drug, improves survival in patients who have had a heart attack.

Look first only at the forest plot on the left. This is a conventional forest plot, from an academic journal, so it’s a little busier than the stylized one in the Cochrane logo. The principles, however, are exactly the same. Each horizontal line is a trial, and you can see that there is a hodgepodge of results, with some trials showing a benefit (they don’t touch the vertical line of no effect, headed ‘1’) and some showing no benefit (they do cross that line). At the bottom, however, you can see the summary effect—a dot on this old-fashioned blobbogram, rather than a diamond. And you can see very clearly that overall, streptokinase saves lives.

So what’s that on the right? It’s something called a cumulative meta-analysis. If you look at the list of studies on the left of the diagram, you can see that they are arranged in order of date. The cumulative meta-analysis on the right adds in each new trial’s results, as they arrived over history, to the previous trials’ results. This gives the best possible running estimate, each year, of how the evidence would have looked at that time, if anyone had bothered to do a meta-analysis on all the data available to them. From this cumulative blobbogram you can see that the horizontal lines, the “summary effects”, narrow over time as more and more data is collected, and the estimate of the overall benefit of this treatment becomes more accurate. You can also see that these horizontal lines stopped touching the vertical line of no effect a very long time ago, and crucially, they do so a long time before we started giving streptokinase to everyone with a heart attack.
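
A cumulative meta-analysis is the same pooling calculation repeated each time a new trial arrives. The sketch below shows the idea on five invented trials listed in date order. The numbers are hypothetical and are not the streptokinase data, and the fixed-effect, inverse-variance method is one reasonable choice assumed here for illustration.

```python
import math

# Invented trials in date order:
# (order of arrival, deaths on treatment, n treatment, deaths on control, n control)
TRIALS_BY_DATE = [
    (1, 4, 20, 6, 20),
    (2, 10, 60, 14, 60),
    (3, 20, 150, 29, 150),
    (4, 45, 400, 62, 400),
    (5, 90, 900, 125, 900),
]


def cumulative_meta_analysis(trials):
    """Yield (order, risk ratio, lower, upper) after adding each trial in turn."""
    sum_w = 0.0   # running total of inverse-variance weights
    sum_wy = 0.0  # running total of weight * log risk ratio
    for order, d_t, n_t, d_c, n_c in trials:
        log_rr = math.log((d_t / n_t) / (d_c / n_c))
        variance = 1 / d_t - 1 / n_t + 1 / d_c - 1 / n_c
        weight = 1 / variance
        sum_w += weight
        sum_wy += weight * log_rr
        pooled = sum_wy / sum_w
        se = math.sqrt(1 / sum_w)
        yield (order, math.exp(pooled),
               math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se))


running = list(cumulative_meta_analysis(TRIALS_BY_DATE))

for order, rr, low, high in running:
    verdict = "clear benefit" if high < 1 else "touches line of no effect"
    print(f"after trial {order}: RR {rr:.2f} ({low:.2f} to {high:.2f})  {verdict}")

# First point in history at which the running summary no longer touches 1.
first_clear = next((order for order, _, _, high in running if high < 1), None)
print("evidence first conclusive after trial", first_clear)
```

In this invented data set the running summary stops touching the line of no effect before the final trial is run, so anyone pooling the evidence as it arrived would have had the answer early.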

In case you haven’t spotted it for yourself already—to be fair, the entire medical profession was slow to catch on—this chart has devastating implications. Heart attacks are an incredibly common cause of death. We had a treatment that worked, and we had all the information we needed to know that it worked, but once again we didn’t bring it together systematically to get that correct answer. Half of the people in those trials at the bottom of the blobbogram were randomly assigned to receive no streptokinase, I think unethically, because we had all the information we needed to know that streptokinase worked: they were deprived of effective treatments. But they weren’t alone, because so were most of the rest of the people in the world at the time.

These stories illustrate, I hope, why systematic reviews and meta-analyses are so important: we need to bring together all of the evidence on a question, not just cherry-pick the bits that we stumble upon, or intuitively like the look of. Mercifully the medical profession has come to recognize this over the past couple of decades, and systematic reviews with meta-analyses are now used almost universally, to ensure that we have the most accurate possible summary of all the trials that have been done on a particular medical question.

But these stories also demonstrate why missing trial results are so dangerous. If one researcher or doctor “cherry-picks,” when summarizing the existing evidence, and looks only at the trials that support their hunch, then they can produce a misleading picture of the research. That is a problem for that one individual (and for anyone who is unwise or unlucky enough to be influenced by them). But if we are all missing the negative trials, the entire medical and academic community, around the world, then when we pool the evidence to get the best possible view of what works—as we must do—we are all completely misled. We get a misleading impression of the treatment’s effectiveness: we incorrectly exaggerate its benefits; or perhaps even find incorrectly that an intervention was beneficial, when in reality it did harm.

Now that you understand the importance of systematic reviews, you can see why missing data matters. But you can also appreciate that when I explain how much trial data is missing, I am giving you a clean overview of the literature, because I will be explaining that evidence using systematic reviews.

How much data is missing?
If you want to prove that trials have been left unpublished, you have an interesting problem: you need to prove the existence of studies you don’t have access to. To work around this, people have developed a simple approach: you identify a group of trials you know have been conducted and completed, then check to see if they have been published. Finding a list of completed trials is the tricky part of this job, and to achieve it people have used various strategies: trawling the lists of trials that have been approved by ethics committees (or “institutional review boards” in the USA), for example; or chasing up the trials discussed by researchers at conferences.

In 2008 a group of researchers decided to check for publication of every trial that had ever been reported to the U.S. Food and Drug Administration for all the antidepressants that came onto the market between 1987 and 2004. This was no small task. The FDA archives contain a reasonable amount of information on all the trials that were submitted to the regulator in order to get a license for a new drug. But that’s not all the trials, by any means, because those conducted after the drug has come onto the market will not appear there; and the information that is provided by the FDA is hard to search, and often scanty. But it is an important subset of the trials, and more than enough for us to begin exploring how often trials go missing, and why. It’s also a representative slice of trials from all the major drug companies.

The researchers found seventy-four studies in total, representing 12,500 patients’ worth of data. Thirty-eight of these trials had positive results, and found that the new drug worked; thirty-six were negative. The results were therefore an even split between success and failure for the drugs, in reality. Then the researchers set about looking for these trials in the published academic literature, the material available to doctors and patients. This provided a very different picture. Thirty-seven of the positive trials—all but one—were published in full, often with much fanfare. But the trials with negative results had a very different fate: only three were published. Twenty-two were simply lost to history, never appearing anywhere other than in those dusty, disorganized, thin FDA files. The remaining eleven which had negative results in the FDA summaries did appear in the academic literature, but were written up as if the drug was a success. If you think this sounds absurd, I agree: we will see in Chapter 4, on ‘bad trials’, how a study’s results can be reworked and polished to distort and exaggerate its findings.

This was a remarkable piece of work, spread over twelve drugs from all the major manufacturers, with no stand-out bad guy. It very clearly exposed a broken system: in reality we have thirty-eight positive trials and thirty-six negative ones; in the academic literature we have forty-eight positive trials and three negative ones. Take a moment to flip back and forth between those in your mind: “thirty-eight positive trials, thirty-six negative”; or “forty-eight positive trials and only three negative”.
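
The arithmetic behind that comparison can be checked in a few lines. This is only a sketch using the counts given above for the FDA cohort (seventy-four trials: thirty-eight positive and thirty-six negative); the variable names are mine, not the researchers’.

```python
# Counts for the FDA antidepressant cohort, as reported in the text.
fda_positive = 38
fda_negative = 36

positive_published = 37               # positive trials published in full
negative_published_as_negative = 3    # negative trials published as negative
negative_published_as_positive = 11   # negative trials written up as a success
negative_unpublished = 22             # negative trials never published

# The three fates of the negative trials should account for all of them.
assert (negative_published_as_negative
        + negative_published_as_positive
        + negative_unpublished) == fda_negative

# What a doctor reading only the journals would see.
literature_positive = positive_published + negative_published_as_positive
literature_negative = negative_published_as_negative


def share_positive(positive, negative):
    """Fraction of trials that appear positive."""
    return positive / (positive + negative)


fda_share = share_positive(fda_positive, fda_negative)
literature_share = share_positive(literature_positive, literature_negative)

print(f"FDA files:  {fda_positive} positive, {fda_negative} negative ({fda_share:.0%} positive)")
print(f"Literature: {literature_positive} positive, {literature_negative} negative ({literature_share:.0%} positive)")
```

Roughly half the trials in the regulator’s files were positive, but in the published literature the figure is well over nine in ten.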

If we were talking about one single study, from one single group of researchers, who decided to delete half their results because they didn’t give the overall picture they wanted, then we would quite correctly call that act ‘research misconduct’. Yet somehow when exactly the same phenomenon occurs, but with whole studies going missing, by the hands of hundreds and thousands of individuals, spread around the world, in both the public and private sector, we accept it as a normal part of life. It passes by, under the watchful eyes of regulators and professional bodies who do nothing, as routine, despite the undeniable impact it has on patients.

Even more strange is this: we’ve known about the problem of negative studies going missing for almost as long as people have been doing serious science.

This was first formally documented by an American psychologist called Theodore Sterling in 1959. He went through every paper published in the four big psychology journals of the time, and found that 286 out of 294 reported a statistically significant result. This, he explained, was plainly fishy: it couldn’t possibly be a fair representation of every study that had been conducted, because if we believed that, we’d have to believe that almost every theory ever tested by a psychologist in an experiment had turned out to be correct. If psychologists really were so great at predicting results, there’d hardly be any point in bothering to run experiments at all. In 1995, at the end of his career, the same researcher came back to the same question, half a lifetime later, and found that almost nothing had changed.

Sterling was the first to put these ideas into a formal academic context, but the basic truth had been recognized for many centuries. Francis Bacon explained in 1620 that we often mislead ourselves by only remembering the times something worked, and forgetting those when it didn’t. Dr. Thomas Fowler in 1786 listed the cases he’d seen treated with arsenic, and pointed out that he could have glossed over the failures, as others might be tempted to do, but had included them. To do otherwise, he explained, would have been misleading.

Yet it was only three decades ago that people started to realize that missing trials posed a serious problem for medicine. In 1980 Elina Hemminki found that almost half the trials conducted in the mid-1970s in Finland and Sweden had been left unpublished. Then, in 1986, an American researcher called Robert Simes decided to investigate the trials on a new treatment for ovarian cancer. This was an important study, because it looked at a life-or-death question. Combination chemotherapy for this kind of cancer has very tough side effects, and knowing this, many researchers had hoped it might be better to give a single “alkylating agent” drug first, before moving on to full chemotherapy. Simes looked at all the trials published on this question in the academic literature, read by doctors and academics. From this, giving a single drug first looked like a great idea: women with advanced ovarian cancer (which is not a good diagnosis to have) who were on the alkylating agent alone were significantly more likely to survive longer.

Then Simes had a smart idea. He knew that sometimes trials can go unpublished, and he had heard that papers with less ‘exciting’ results are the most likely to go missing. To prove that this has happened, though, is a tricky business: you need to find a fair, representative sample of all the trials that have been conducted, and then compare their results with the smaller pool of trials that have been published, to see if there are any embarrassing differences. There was no easy way to get this information from the medicines regulator (we will discuss this problem in some detail later), so instead he went to the International Cancer Research Data Bank. This contained a register of interesting trials that were happening in the USA, including most of the ones funded by the government, and many others from around the world. It was by no means a complete list, but it did have one crucial feature: the trials were registered before their results came in, so any list compiled from this source would be, if not complete, at least a representative sample of all the research that had ever been done, and not biased by whether their results were positive or negative.

When Simes compared the results of the published trials against the pre-registered trials, the results were disturbing. Looking at the academic literature—the studies that researchers and journal editors chose to publish—alkylating agents alone looked like a great idea, reducing the rate of death from advanced ovarian cancer significantly. But when you looked only at the pre-registered trials—the unbiased, fair sample of all the trials ever conducted—the new treatment was no better than old-fashioned chemotherapy.

Simes immediately recognized—as I hope you will too—that the question of whether one form of cancer treatment is better than another was small fry compared to the depth charge he was about to set off in the medical literature. Everything we thought we knew about whether treatments worked or not was probably distorted, to an extent that might be hard to measure, but that would certainly have a major impact on patient care. We were seeing the positive results, and missing the negative ones. There was one clear thing we should do about this: start a registry of all clinical trials, demand that people register their study before they start, and insist that they publish the results at the end.

That was 1986. Since then, a generation later, we have done very badly. In this book, I promise I won’t overwhelm you with data. But at the same time, I don’t want any drug company, or government regulator, or professional body, or anyone who doubts this whole story, to have any room to wriggle. So I’ll now go through all the evidence on missing trials, as briefly as possible, showing the main approaches that have been used. All of what you are about to read comes from the most current systematic reviews on the subject, so you can be sure that it is a fair and unbiased summary of the results.

One research approach is to get all the trials that a medicines regulator has record of, from the very early ones done for the purposes of getting a license for a new drug, and then check to see if they all appear in the academic literature. That’s the method we saw used in the paper mentioned above, where researchers sought out every paper on twelve antidepressants, and found that a 50/50 split of positive and negative results turned into forty-eight positive papers and just three negative ones. This method has been used extensively in several different areas of medicine:

Lee and colleagues, for example, looked for all of the 909 trials submitted alongside marketing applications for all ninety new drugs that came onto the market from 2001 to 2002: they found that 66 per cent of the trials with significant results were published, compared with only 36 per cent of the rest.
Melander, in 2003, looked for all forty-two trials on five antidepressants that were submitted to the Swedish drug regulator in the process of getting a marketing authorization: all twenty-one studies with significant results were published; only 81 per cent of those finding no benefit were published.
Rising et al., in 2008, found more of those distorted write-ups that we’ll be dissecting later: they looked for all trials on two years’ worth of approved drugs. In the FDA’s summary of the results, once those could be found, there were 164 trials. Those with favorable outcomes were a full four times more likely to be published in academic papers than those with negative outcomes. On top of that, four of the trials with negative outcomes changed, once they appeared in the academic literature, to favor the drug.

If you prefer, you can look at conference presentations: a huge amount of research gets presented at conferences, but our current best estimate is that only about half of it ever appears in the academic literature. Studies presented only at conferences are almost impossible to find, or cite, and are especially hard to assess, because so little information is available on the specific methods used in the research (often as little as a paragraph). And as you will see shortly, not every trial is a fair test of a treatment. Some can be biased by design, so these details matter.

The most recent systematic review of studies looking at what happens to conference papers was done in 2010, and it found thirty separate studies looking at whether negative conference presentations (in fields as diverse as anaesthetics, cystic fibrosis, oncology, and A&E) disappear before becoming fully fledged academic papers. Overwhelmingly, unflattering results are much more likely to go missing.

If you’re very lucky, you can track down a list of trials whose existence was publicly recorded before they were started, perhaps on a register that was set up to explore that very question. From the pharmaceutical industry, up until very recently, you’d be very lucky to find such a list in the public domain. For publicly funded research the story is a little different, and here we start to learn a new lesson: although the vast majority of trials are conducted by the industry, with the result that they set the tone for the community, this phenomenon is not limited to the commercial sector.

By 1997 there were already four studies in a systematic review on this approach. They found that studies with significant results were two and a half times more likely to get published than those without.
A paper from 1998 looked at all trials from two groups of trialists sponsored by the U.S. National Institutes of Health over the preceding ten years, and found, again, that studies with significant results were more likely to be published.
Another looked at drug trials notified to the Finnish National Agency, and found that 47 per cent of the positive results were published, but only 11 per cent of the negative ones.
Another looked at all the trials that had passed through the pharmacy department of an eye hospital since 1963: 93 per cent of the significant results were published, but only 70 per cent of the negative ones.

The point being made in this blizzard of data is simple: this is not an under-researched area; the evidence has been with us for a long time, and it is neither contradictory nor ambiguous.

Two French studies in 2005 and 2006 took a new approach: they went to ethics committees, and got lists of all the studies they had approved, and then found out from the investigators whether the trials had produced positive or negative results, before finally tracking down the published academic papers. The first study found that significant results were twice as likely to be published; the second that they were four times as likely. In Britain, two researchers sent a questionnaire to all the lead investigators on 101 projects paid for by NHS R&D: it’s not industry research, but it’s worth noting anyway. This produced an unusual result: there was no statistically significant difference in the publication rates of positive and negative papers.

But it’s not enough simply to list studies. Systematically taking all the evidence that we have so far, what do we see overall?

It’s not ideal to lump every study of this type together in one giant spreadsheet, to produce a summary figure on publication bias, because they are all very different, in different fields, with different methods. This is a concern in many meta-analyses (though it shouldn’t be overstated: if there are lots of trials comparing one treatment against placebo, say, and they’re all using the same outcome measurement, then you might be fine just lumping them all in together).

But you can reasonably put some of these studies together in groups. The most current systematic review on publication bias, from 2010, from which the examples above are taken, draws together the evidence from various fields. Twelve comparable studies follow up conference presentations, and taken together they find that a study with a significant finding is 1.62 times more likely to be published. For the four studies taking lists of trials from before they started, overall, significant results were 2.4 times more likely to be published. Those are our best estimates of the scale of the problem. They are current, and they are damning.

All of this missing data is not simply an abstract academic matter: in the real world of medicine, published evidence is used to make treatment decisions. This problem goes to the core of everything that doctors do, so it’s worth considering in some detail what impact it has on medical practice. First, as we saw in the case of reboxetine, doctors and patients are misled about the effects of the medicines they use, and can end up making decisions that cause avoidable suffering, or even death. We might also choose unnecessarily expensive treatments, having been misled into thinking they are more effective than cheaper older drugs. This wastes money, ultimately depriving patients of other treatments, since funding for health care is never infinite.

It’s also worth being clear that this data is withheld from everyone in medicine, from top to bottom. Most countries have organizations to create careful, unbiased summaries of all the evidence on new treatments to determine whether they are cost effective. In the UK the organization is called NICE (the National Institute for Health and Clinical Excellence); in Germany it is called IQWiG, while in the U.S. insurers may make their own assessments. But these organizations are unable either to identify or to access data that has been withheld by researchers or companies on a drug’s effectiveness; they have no more legal right to that data than you or I do. In fact, as we shall see, some regulators, despite having access to this information, have refused to share it with the public or doctors. Others have hidden the information they hold behind walls of chaos. This is an extraordinary and perverse situation.

So, while doctors are kept in the dark, patients are exposed to inferior treatments, ineffective treatments, unnecessary treatments, and unnecessarily expensive treatments that are no better than cheap ones; governments pay for unnecessarily expensive treatments, and mop up the cost of harms created by inadequate or harmful treatment; and individual participants in trials, such as those in the TGN1412 study, are exposed to terrifying, life-threatening ordeals, resulting in lifelong scars, again quite unnecessarily.

At the same time, the whole of the research project in medicine is retarded, as vital negative results are held back from those who could use them. This affects everyone, but it is especially egregious in the world of “orphan diseases,” medical problems that affect only small numbers of patients, because these corners of medicine are already short of resources, and are neglected by the research departments of most drug companies, since the opportunities for revenue are thinner. People working on orphan diseases will often research existing drugs that have been tried and failed in other conditions, but that have theoretical potential for the orphan disease. If the data from earlier work on these drugs in other diseases is missing, then the job of researching them for the orphan disease is both harder and more dangerous: perhaps they have already been shown to have benefits or effects that would help accelerate research; perhaps they have already been shown to be actively harmful when used on other diseases, and there are important safety signals that would help protect future research participants from harm. Nobody can tell you.

Finally, and perhaps most shamefully, when we allow unflattering data to go unpublished, we betray the patients who participated in these studies: the people who have given their bodies, and sometimes their lives, in the implicit belief that they are doing something to create new knowledge, that will benefit others in the same position as them in the future. In fact, their belief is not implicit: often it’s exactly what we tell them, as researchers, and it is a lie, because the data might be withheld, and we know it.
