The Best Medicine: Cutting Health Costs with Comparative Effectiveness Research

A quiet revolution in comparative effectiveness research just might save us from soaring medical costs

It was the largest and most important investigation of treatments for high blood pressure ever conducted, with a monumental price tag to match. U.S. doctors enrolled 42,418 patients from 623 offices and clinics, treated participants with one of four commonly prescribed drugs, and followed them for at least five years to see how well the medications controlled their blood pressure and reduced the risk of heart attack, stroke and other cardiovascular problems. It met the highest standards of medical research: neither physicians nor their patients knew who was placed in which treatment group, and patients had an equal chance of being assigned to any of the groups. Such randomized controlled trials have long been unmatched as a way to determine the safety and efficacy of drugs and other treatments. This one, dubbed ALLHAT (Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial), cost an estimated $120 million and took eight years to complete.

The results, announced in December 2002, were stunning: the oldest and cheapest of the drugs, known as thiazide-type diuretics, were more effective at reducing hypertension than the newer, more expensive ones. Furthermore, the diuretics, which work by ridding the body of excess fluid, were better at reducing the risk of developing heart failure, of being hospitalized and of having a stroke. ALLHAT was well worth its premium cost, argued the National Heart, Lung, and Blood Institute (NHLBI), which ran the trial. If patients were prescribed diuretics for hypertension rather than the more expensive medications, the nation would save $3.1 billion every decade in prescription drug costs alone, and hundreds of millions of dollars more by avoiding stroke treatment, coronary artery bypass surgery and other consequences of high blood pressure.

But what should patients do if their blood pressure was not controlled by a diuretic alone, as happened with 60 percent of the ALLHAT patients? Which drugs should they turn to then? That was the next logical study to do, but the NHLBI could not afford to conduct another randomized controlled trial to find out. That is when David J. Magid had his big idea. As director of research for the Colorado Permanente Medical Group, part of the giant Kaiser Permanente health care organization, Magid had as much respect for classical clinical trials as the next scientist. But he thought there was a way to obtain equally rigorous results without the length and expense of a trial. Instead, he thought, he could comb through the thousands of electronic health records in Kaiser’s database to find out which antihypertension drugs work best if diuretics do not bring about the needed reduction in blood pressure.

Magid had his answer in a year and a half, at a cost of only $200,000—a tiny fraction of the expected cost of a clinical trial. Two other heart medications, called angiotensin-converting enzyme (ACE) inhibitors and beta blockers, did an equally effective job as second-line treatments, he and his colleagues reported in 2010. Doctors could prescribe either drug to patients whose blood pressure was not controlled by a diuretic alone. “Randomized trials are so expensive and time-consuming, there’s no way we can do them for all the important questions that need answering,” Magid says. “Using health records [to compare treatments] offers a practical alternative.”

Difficult Truths
Magid is a pioneer in an increasingly influential movement to change the way clinicians and researchers determine which medications, surgeries or other treatments work best for a given illness or disorder. Formally called comparative effectiveness research (CER), it determines scientifically which therapies work and which do not. The approach is often easiest to understand in direct comparisons between different medications or between medication and surgery. But its methods are being used to evaluate a widening range of interventions, many of which have little to do with drugs—such as whether community health programs that offer transportation and housing assistance are more effective at keeping frail elderly men and women out of the hospital than programs that focus on more traditional medical services.

The need for greater scrutiny stems from pressing medical and economic challenges. The medical need for CER arises from a fact that few patients realize and fewer doctors acknowledge: the scientific basis for many medical treatments is often flimsy or even nonexistent. More than half the guidelines issued by the Infectious Diseases Society of America, for instance, are based on “expert opinion” alone and not on actual comparative data, let alone a clinical trial. “There is a chasm between what gets done in practice and what science has shown,” says Elizabeth A. McGlynn, the new director of Kaiser’s Center for Effectiveness & Safety Research. At the same time, she notes, clinicians complain that scientific studies often cannot easily be translated to a real-world environment.

The economic imperative for comparative effectiveness research is just as compelling. Individual health plans have compared costs and outcomes of various treatments for years in an effort to trim their budgets, and yet health care spending in the U.S. has been projected to reach $2.7 trillion in 2011. That amount may sound like a reasonable price to pay for something consumers value (even if it dwarfs other expenditures, such as the $671 billion that will be spent by the Pentagon next year). Unnecessary health care spending, however, means that fewer dollars are available for investment, for education, for research and for other national needs. “As much as one third of our [medical] spending is for ineffective or unnecessary care,” McGlynn says—around $900 billion a year, in other words. (By comparison, malpractice reform could save about $54 billion over 10 years, according to a 2009 analysis by the Congressional Budget Office.) “We can’t afford to spend money on things that don’t work,” she continues, especially when the nation’s soaring health care bills threaten to capsize state and local governments, businesses and Medicare, which are the “third parties” that pay for most of these medical costs. In an effort to save money by ensuring that the nation pays only for treatments that work, the economic stimulus bill of 2009 allocated $1.1 billion for comparative effectiveness research.

That is a lot of money but a pittance compared with the cost of such research—at least the traditional kind of CER, which uses clinical trials to distinguish therapies that help patients from those that do not—and how much of it is needed. A 2009 report by the Institute of Medicine, part of the National Academies, easily identified 100 questions of relative effectiveness that need answering. Multiplying 100 questions by a few hundred million dollars per question equals “unaffordable.” Hence, the need for the novel, less costly approach to comparative effectiveness research such as Magid’s, which exploits the latest information technology tools—from mining the databases of large, integrated health networks such as Kaiser’s to sophisticated mathematical modeling of disease—in an effort to discover what works at a fraction of the cost of randomized controlled trials.

The cost of clinical trials is not the only impetus for the sea change under way in CER. The new research promises to yield better information: data that are more useful in clinical practice than data from traditional trials.

The reason is that clinical trials tend to enroll people who are younger, healthier and more likely to take prescribed medications; the study subjects are also monitored more closely by a physician than the average patient is. Some physicians therefore object that trial results may not apply to the older, sicker, less compliant patients they treat. In addition, traditional randomized clinical trials assess efficacy, which is the best-case, often idealized measure of a drug’s or other therapy’s benefits. In contrast, most physicians are concerned with effectiveness, which means how well a treatment works in real patients in real-world conditions. As a result, doctors can and do dismiss results obtained in the hothouse of randomized clinical trials as inapplicable to their patients. Despite ALLHAT, for instance, only 36 percent of first prescriptions for hypertension are diuretics, a 2009 study found, reflecting, in part, the belief of some physicians that the results are not relevant to their patients. If rigorous studies evaluate the real-world effectiveness of different interventions, Magid argues, more physicians would likely incorporate the results into their clinical practice.

As with any major changes to how health care is delivered in the U.S., CER is viewed with alarm by critics who are nervous that it might restrict physician autonomy and patient choice. But as the field develops rigorous, efficient ways to answer the most important question any patient or doctor can ask—what works?—it will inevitably play a growing and crucial role in health care at both the individual and policy level.

Into the Data Mine
Fortunately, the need to find inexpensive ways to conduct comparative effectiveness research and get results relevant to real patients in the real world has coincided with another tectonic change in health care: the spread of electronic medical records. Kaiser Permanente has them on 8.6 million people. A new consortium of six medical institutions, including the Cleveland Clinic and the Mayo Clinic, has electronic records on 10 million. The Veterans Administration, a pioneer in electronic health records, as well as CER, cares for more than six million veterans annually. Crucially, in every case the medical institution’s records are more complete and therefore more useful than standard Medicare claims data, which are often missing crucial details about a patient. All three—Kaiser, the consortium and the VA—have launched programs to mine those records by, for instance, taking all patients with type 2 diabetes, determining what treatment they got and comparing outcomes. “With these large databases and detailed clinical information, we can conduct comparative effectiveness research in real-world settings, with a full range of patients, not just those selected for clinical trials,” says Joe V. Selby, director of Kaiser’s division of research.

Analyzing millions of patients rather than the hundreds or thousands in a standard clinical trial also means the results are potentially more statistically sound; that is, findings are less likely to be the result of chance. Another advantage of mining patient records: they include children and women of reproductive age, who are often barred from clinical trials because the risks are thought to outweigh the benefits.

At first glance, mining databases for information may seem a lot like conducting an old-style observational study, in which researchers find one group of patients who just happen to be receiving a particular therapy and another group who are receiving either no therapy or a different one. In contrast, a randomized controlled trial assigns patients to receive one or another treatment. Observational studies have yielded huge public health benefits (showing that cigarettes can cause lung cancer, for instance), but they can also mislead. It was observational studies that concluded, for instance, that long-term hormone therapy in older women whose estrogen levels begin to decline around menopause reduced the risk of heart disease, as well as bringing other benefits. In fact, as the 2002 Women’s Health Initiative—a prospective, randomized controlled trial—showed, hormone replacement does not protect against heart disease and raises the risk of stroke and breast cancer. The problem was that women using hormone replacement therapy in the observational studies were different in important ways from those who were not (if nothing else, they were being treated by a physician). Those differences, not hormone therapy, accounted for the women’s apparently lower risk of cardiovascular disease.

Today’s pioneers in the use of health records for CER are well aware that they are conducting observational studies. But they have developed statistical and other methodologies to safeguard against the errors that can bedevil such investigations. The key step is to make sure that it was not something about the patient rather than the treatment that accounted for a given outcome, as was the case in the observational studies of hormone replacement. “There is always the real possibility that people who get one treatment may be different in some ways from people who get another treatment,” Selby says. “To adjust for that, you need very detailed data, and Kaiser Permanente has it. It can tell you that patients [in the comparison groups] were identical for all practical purposes or allow you to adjust statistically for any remaining differences.”

And the Blind Shall See
Ophthalmologist Donald Fong of the Southern California Permanente Medical Group tapped those data to compare two treatments for age-related macular degeneration, the leading cause of severe vision loss in people older than 60 years. Since 2004 physicians had been using Avastin, a cancer drug manufactured by Genentech, against this disease. But that was an off-label use: one for which the company did not have U.S. Food and Drug Administration approval but which physicians are allowed to prescribe anyway. In 2006 the FDA approved Lucentis, also from Genentech, for macular degeneration. Avastin and Lucentis are very similar, but Avastin costs $50 per dose compared with $2,200 for Lucentis. That put physicians in a quandary: Should they continue to use Avastin off-label or switch patients to Lucentis?

Fong knew the question cried out for a scientific comparison. He decided not to conduct a long, expensive randomized controlled clinical trial, however. Instead, from 2005 to 2008, he and his colleagues entered 452 Kaiser patients into a separate registry—all patients who had not been treated before and who received only one drug for macular degeneration. The records showed that 324 people happened to be treated with Avastin and 128 with Lucentis, reflecting individual physician and patient preference rather than the random assignment a clinical trial would use. Although the Avastin patients happened to have worse visual acuity when they began treatment and had an average of two fewer injections over the 12 months they were followed, the improvement in visual acuity was equal with the two drugs, the scientists reported in 2009.

Such an observational study falls short of the statistical purity of a randomized controlled trial. But like other researchers mining health records to do CER, Fong and his colleagues used standard statistical techniques to control for hidden biases in the selection of their study population. They also made sure the Avastin and Lucentis patients were matched in terms of age, severity of vision loss and other key factors. The results, Fong argues, are both scientifically rigorous and more relevant to clinicians than a standard clinical trial. “This study had a much more realistic population,” he says. The patients’ average age was about 80, and they were not receiving the intense scrutiny and care of those in a clinical trial. “We didn’t exclude anyone. That makes it harder for physicians to say, ‘This doesn’t apply to my patients.’ ” As it happens, the results from the first year of a randomized controlled trial of Avastin and Lucentis, published online in the New England Journal of Medicine in April, support Fong’s findings as well.

Getting the Statistics Right
Scientists conducting CER by means of electronic medical records are developing a number of techniques to ensure that their results are statistically sound. Most crucial is to make sure that patients in two or more comparison groups—those receiving Lucentis and Avastin, say, or beta blockers and ACE inhibitors—are equivalent. To do this, researchers analyze scores of variables (100 is not unusual), ranging from socioeconomic data to lab results, to see whether any are more commonly found among those patients receiving one treatment and not another. By taking such variables into account, Selby explains, “you wind up comparing people with the same propensity to get a treatment but who actually got either.” That eliminates the risk of a hormone replacement therapy–type mistake, where receiving the treatment was actually a marker of better access to care.

In his antihypertension study, for instance, Magid analyzed medical records to identify any patients who were not equally likely to receive both drugs, the ACE inhibitor or beta blocker, such as patients with a preexisting condition that served as a contraindication for one of the two drugs. “We eliminated those cases and were left with only those patients who had an equal probability of being prescribed either an ACE inhibitor or a beta blocker,” Magid says. Then he identified patients who had similar health characteristics to reduce the chance that the comparison of the drugs would be biased by, say, one drug having been given to sicker patients.

“We made the populations as equal as possible,” he says, based on age, sex, concurrent conditions, vital signs, lab results (for kidney function, for instance), and socioeconomic factors such as education and income. For every 54-year-old white, female high school dropout with a baseline blood pressure of 150 over 80 in the beta blocker group who had these two concurrent conditions and took these three medications, Magid matched her to another 54-year-old white, female high school dropout with a baseline blood pressure of 150 over 80 in the ACE inhibitor group, who had the same medical conditions and was taking the same drugs. By the time he had finished, Magid had meticulously matched each patient receiving ACE inhibitors to one receiving beta blockers. Patients who could not be matched in this way were dropped from the study.
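
The one-to-one matching described above can be illustrated with a minimal sketch. It is not Magid's actual procedure: the field names, the short list of matching variables and the sample patients are all hypothetical, and a real analysis would draw on many more variables and on statistical adjustment as well as exact matching.

```python
from collections import defaultdict

# Hypothetical matching variables; the study described in the text used many more.
MATCH_KEYS = ("age", "sex", "education", "systolic", "diastolic")

def exact_match(group_a, group_b, keys=MATCH_KEYS):
    """Pair each patient in group_a with an unused patient in group_b who has
    identical values for every key. Patients with no partner are dropped."""
    pool = defaultdict(list)
    for patient in group_b:
        pool[tuple(patient[k] for k in keys)].append(patient)
    pairs = []
    for patient in group_a:
        candidates = pool[tuple(patient[k] for k in keys)]
        if candidates:
            # Remove the partner from the pool so nobody is matched twice.
            pairs.append((patient, candidates.pop()))
    return pairs

# Invented sample records, for illustration only.
beta_blocker = [
    {"id": "B1", "age": 54, "sex": "F", "education": "no diploma", "systolic": 150, "diastolic": 80},
    {"id": "B2", "age": 61, "sex": "M", "education": "college", "systolic": 160, "diastolic": 95},
]
ace_inhibitor = [
    {"id": "A1", "age": 54, "sex": "F", "education": "no diploma", "systolic": 150, "diastolic": 80},
    {"id": "A2", "age": 70, "sex": "M", "education": "college", "systolic": 145, "diastolic": 85},
]

pairs = exact_match(beta_blocker, ace_inhibitor)
matched_ids = [(a["id"], b["id"]) for a, b in pairs]
print(matched_ids)
```

In this toy example only B1 and A1 share every characteristic, so they form the single matched pair; B2 and A2 have no counterpart and are left out of the comparison, just as unmatched patients were dropped from the study.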

Because analyzing detailed health records yields results much faster than a prospective, randomized controlled trial, it has saved lives. Kaiser rheumatologist David H. Campen used this methodology when a colleague in academia mentioned that there were hints in lab animal studies that Vioxx, used for pain, might increase the risk of heart attacks and stroke. Analyzing Kaiser’s patient records, Campen and his colleagues found exactly that several months before Merck voluntarily withdrew Vioxx from the market in 2004. As it happened, fewer Kaiser patients were taking Vioxx and related drugs, called COX-2 inhibitors, than the national average. COX-2 inhibitors do not pose the same risk of gastrointestinal bleeding as other nonsteroidal anti-inflammatory drugs (NSAIDs) such as aspirin, but not all patients are at risk for such bleeding and so do not need the newer, pricier COX-2 inhibitors. At one point, Campen recalls, COX-2 use was approaching 50 percent of NSAID prescriptions in the U.S., but at Kaiser it stayed below 10 percent.

Bang for the Health Care Buck
Beyond evaluating how well different therapies treat a given disease, the new breed of comparative effectiveness researchers aims to compare the costs of those treatments—and to ask whether additional cost buys additional effectiveness. Until now, that question had been off-limits: a core tenet of American medicine has long been that cost considerations have no place in clinical decision making. As a result, CER has, traditionally, not considered cost. Two or more treatments are evaluated and ranked by clinical effectiveness, and that is that. But the soaring costs of health care have increased pressure to choose treatments that deliver the most bang for the health care buck.

Over the past few years, however, cost-effectiveness has been a focus of more and more analysis. In 2006 VA researchers studied patients with a difficult-to-treat form of heart disease that is characterized by diminished blood flow. Some received angioplasty, in which a surgeon widens an obstructed blood vessel (usually with a balloonlike device), and some underwent coronary artery bypass, in which blood flow is rerouted around the blockage with implanted grafts. Each procedure had an impressive three-year survival rate (82 percent with angioplasty and 79 percent with bypass). But total costs for angioplasty were $63,900 compared with $84,400 for bypass. In other words, angioplasty was slightly more effective, as well as less costly. After five years, 75 percent of angioplasty patients were alive, compared with 70 percent of bypass patients, with respective costs of $82,000 and $101,000—again, better survival, lower cost.
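
The arithmetic behind "better survival, lower cost" can be laid out in a few lines. The sketch below uses only the survival and cost figures quoted above; the function name and the record layout are assumptions made for illustration, not part of the VA analysis.

```python
def compare(a, b):
    """Compare treatment a with treatment b on cost and survival.
    a 'dominates' b when it is no worse on both measures and better on at least one."""
    cost_saving = b["cost"] - a["cost"]
    survival_gain = a["survival_pct"] - b["survival_pct"]
    dominates = (cost_saving >= 0 and survival_gain >= 0
                 and (cost_saving > 0 or survival_gain > 0))
    return {"cost_saving": cost_saving,
            "survival_gain": survival_gain,
            "dominates": dominates}

# Figures as reported in the text (survival in percent, cost in dollars).
three_year = compare({"survival_pct": 82, "cost": 63_900},   # angioplasty
                     {"survival_pct": 79, "cost": 84_400})   # bypass
five_year = compare({"survival_pct": 75, "cost": 82_000},    # angioplasty
                    {"survival_pct": 70, "cost": 101_000})   # bypass

print(three_year)
print(five_year)
```

On these numbers angioplasty saved $20,500 per patient at three years with survival 3 percentage points higher, and $19,000 at five years with survival 5 points higher. When one option wins on both counts, no trade-off between cost and benefit has to be weighed.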

The path to using such results to actually control costs is not necessarily straightforward. The 2010 health care reform law bars Medicare from using comparative effectiveness research to decide what to pay for (Avastin but not Lucentis for macular degeneration, say), a concession to legislators who wanted assurance that patients and doctors would remain free to choose any treatment they like and who threatened to vote against the bill without that provision. But Medicare can use the research to set payment rates in a way that would encourage providers to deliver the best care for a given price, a system called “equal payments for equal results.” Using the example of macular degeneration, Medicare might pay $50 per injection—which would mean patients who insist on Lucentis, or whose doctor does, would be left with a $2,150 co-payment.
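
The co-payment in that example follows from simple subtraction, sketched here. The function and variable names are hypothetical, and the rule shown (the payer covers a fixed reference amount and the patient pays any excess) is a simplified reading of the scenario above rather than an actual Medicare formula.

```python
def patient_copay(price, reference_payment):
    """Amount left for the patient when the payer covers only the reference payment."""
    return max(0, price - reference_payment)

REFERENCE_PAYMENT = 50   # dollars per injection, the Avastin price cited in the text
copay_avastin = patient_copay(50, REFERENCE_PAYMENT)
copay_lucentis = patient_copay(2_200, REFERENCE_PAYMENT)

print(copay_avastin, copay_lucentis)
```

With the per-dose prices quoted earlier, the patient who chooses Avastin owes nothing extra, while the patient who chooses Lucentis covers the $2,150 difference.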

Making people pay more out of pocket is not the goal. It is only the means to the goal, which is to bring patients the most effective treatments—and not to raise the nation’s health care bill by subsidizing treatments that cost more for zero additional benefit. “As we move into the era of health care reform, we need to address the issue of how to pay for it,” Fong says. “One obvious answer is, you want to pay only for things that work.” When two medications work equally well, as he found Avastin and Lucentis did for age-related macular degeneration, the calculus should be easy. But how about when drug A costs 20 times more than drug B but yields only a 5 percent greater benefit, as measured by, for instance, survival, visual acuity, insulin control or number of hospitalizations? “We have to start asking, as a society, whether that marginal improvement is worth the price,” he points out. That will surely be a painful conversation, forcing society to grapple with how much we are willing to spend on marginal improvements in health.

The Obstacles to Come
Although rooting out ineffective treatments may sound like something patients, physicians and payers would all welcome, in fact, CER has gotten caught in the cross fire of the debates over health care reform. Chief among the charges: that the research will be used to “deny or ration care,” as Representative Mike Rogers of Michigan warned in 2009. In fact, the research does not compare whether different kinds of patients benefit from a given treatment as a way to keep one group from receiving the treatment, as “deny or ration” might imply. The goal of comparative effectiveness research is to weed out treatments that are less effective in everyone and substitute a more effective alternative. “There is an assault on CER going on now, saying it’s all about health care rationing,” says cardiologist Steven Nissen of the Cleveland Clinic. “They’re making headway even though that’s not what we’re talking about. CER is about delivering the best care, not rationing care.”

Such qualms are unique to the U.S., health experts say. In no other country do people “view evidence as suspiciously as U.S. stakeholders, including a large proportion of policy makers,” argued British researchers in an essay in the journal PharmacoEconomics last year. The U.K. embraced comparative effectiveness research long ago, incorporating its findings into decisions about what its National Health Service will cover. The evidence shows that CER is not a panacea; health care costs are still rising in the U.K.—though not as steeply as in the U.S. But basing health care decisions on CER clearly has not hurt British people, who actually enjoy higher life expectancy than do Americans.

“If there is any country in the world that needs comparative effectiveness research, it’s the U.S.,” Nissen says. “It’s safe to say the U.S. has the least cost-effective medicine in the world. There is so much money wasted that if we eliminated that waste, we could provide health care for everyone.” It will be an uphill battle in a country that reveres an individual’s right to choose much more than it does science. But comparative effectiveness research is our best hope for improving medical care equitably without breaking the bank.
