Every year in the U.S., doctors' offices and hospitals order billions of laboratory tests to measure everything from cholesterol levels in the blood to the presence of a gene thought to increase the risk of developing Alzheimer's disease. Physicians and patients typically assume that they can trust the results of these tests. And most of the time they can. But not all lab tests are equally reliable, and faulty ones can have serious consequences. Sometimes they fail to detect life-threatening conditions. Other times they indicate a problem that does not exist, which can lead to unneeded, perhaps even dangerous treatments.
Through a quirk of regulatory history, many such tests are not subject to the same medical standards as other tools used to identify risk for disease or to definitively diagnose a condition. These are called lab-developed tests, or LDTs, defined as tests that are manufactured and interpreted by the same individual lab that designed them. A quick strep test, by contrast, is meant to be used and understood by a wide variety of personnel in doctors' offices everywhere. Most people first encounter an LDT during a checkup when the physician is faced with a diagnostic dilemma that cannot be resolved by widely available blood tests.
The trouble is, experts believe many of these tests are not useful, and some may even cause harm in one of three ways: by convincing too many people that they have a rare illness when they do not, by diagnosing them with a condition that has so far not been shown to be harmful, or by reassuring them that they are healthy when there is no scientifically credible way to know whether that is the case. “We tend to think of lab tests as being the ultimate truth,” says Ramy Arnaout, an assistant professor of pathology at Harvard Medical School. “But no test is 100 percent accurate, and some of these LDTs aren't medically useful at all.”
The U.S. Food and Drug Administration is now taking steps to restore confidence in the reliability of lab-developed tests. In 2014 the agency released proposed guidelines that will, for the first time, subject these tests to federal oversight, including a requirement that labs submit evidence of efficacy to the agency before the tests may be marketed. Although the FDA would not comment for this story, several industry sources believe the final rulings may begin taking effect soon, much to the chagrin of some lab directors who say that the requirements could boost costs and hinder medical practice.
Twenty-five years ago LDTs played too small a role in medical practice for the FDA to pay them much attention. Only a few—most notably Pap smears for the detection of cervical cancer—were widely used. FDA officials adopted a policy of “enforcement discretion,” which meant they pretty much left LDTs alone while they focused on tools with an apparently greater potential for harm, such as malfunctioning pacemakers.
After researchers developed new genetic engineering techniques in the 1990s, however, the possibilities for LDTs expanded dramatically. Whereas previous generations of LDTs looked for a handful of unusual proteins, for example, some of the newly emerging genetic tests could sort through any number of the three billion base pairs, or letters, of the DNA alphabet found in the human genome, looking for abnormalities related to disease. In addition, testing became automated, making LDTs increasingly easy to design and use.
The improved technology led to an enormous rise in the number and variety of LDTs that came to market. By some estimates, about 11,000 labs now offer between 60,000 and 100,000 of them; no one knows precisely how many because, of course, these tests do not have to be registered anywhere.
Under current federal regulations, LDTs enjoy a big loophole, which means they do not have to be evaluated for their medical usefulness. Nor are they required to have research about them made public. The lab that created them does need to meet certain fundamental standards of scientific practice. But the FDA does not vet the tests either before or after doctors can start ordering them for patients, as it does for most prescription drugs or medical devices.
This loophole means that companies ranging from small start-ups offering just one or two tests to much larger diagnostic labs that offer thousands of tests can develop and charge for new LDTs much more easily than they can for most other categories of medical products. With the rise in the number of tests has come a series of reports showing that certain ones have already hurt people by delivering misleading results.
The FDA has cited 20 different types of LDTs as especially troubling, including Lyme disease and whooping cough tests that regularly give wrong answers and LDTs that purport to determine a woman's risk for ovarian cancer, for example by measuring the presence of the protein CA 125 in the blood. In September the agency concluded that screening measures for this protein offered “no proven benefit” and warned physicians against recommending or using them.
Many of the tests that have raised the FDA's ire may indeed measure what they claim to measure. The problem is that the measured substance may not be a good indicator of a specific medical problem. In the case of the ovarian cancer tests, for instance, high levels of CA 125, which is made in the ovaries, should in theory signify the presence of extra ovarian cells—in other words, the presence of a tumor. In reality, it turns out that many women with high levels of CA 125 do not have ovarian cancer, and, conversely, many women with cancer do not have high levels of CA 125. Thus, measures of CA 125 cannot be trusted to give an accurate diagnosis of cancer—and yet a number of women who tested positive apparently feared the possibility of cancer so much that they decided to have their healthy ovaries removed anyway.
One way that investigators determine whether a medical test should be used as a guide to a patient's condition is by calculating a somewhat obscure statistical ratio called the positive predictive value, or PPV: the share of all positive results that turn out to be correct. This measure takes into account just how common a condition might be in a given group of people.
Why such a consideration would be important in determining a test's usefulness may be best understood by analogy. If you drop a baited hook into a barrel full of fish, the chances that a tug on the line means that you have caught a fish are pretty high. On the other hand, dropping the same baited hook into a freshwater lake that has not been stocked with fish makes it much less likely that any given tug on the line represents a fish, as opposed to, say, a tree snag. Because the barrel contains many more fish for a given volume of water than the lake does, a tug in the container has a PPV close to 100 percent, whereas that of a tug in an unstocked lake is much less than 100 percent.
This crucial statistical distinction explains the problem the FDA has with one current ovarian screening test, which its developer claimed had a PPV of 99.3 percent. Closer analysis by independent biostatisticians revealed, however, that the value was calculated on the basis of a single experiment in which half the patients were already known to have ovarian cancer—a highly selected group that is the medical equivalent of a stocked pond.
When the researchers recalculated the PPV using ovarian cancer's true frequency in the general U.S. population of one case for every 2,500 postmenopausal women, the PPV plummeted to just 6.5 percent. In other words, only one in every 15 patients who received a positive result from this malignancy test would have actually had ovarian cancer. The other 14 would, if they had relied on this test alone, very likely have undergone unnecessary operations to remove their otherwise healthy ovaries because they would have mistakenly believed they had a 99.3 percent chance of having cancer.
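The arithmetic behind that collapse can be sketched in a few lines of Python. This is only an illustration, not the biostatisticians' actual calculation, and it rests on two assumptions that are not in the reporting above: that exactly 50 percent of the study patients had cancer, and that the test behaves the same way in the general population as it did in the study. The function names and the use of a positive likelihood ratio (how much more often the test flags sick patients than healthy ones) are choices made for this sketch.

```python
def likelihood_ratio_from_ppv(ppv, prevalence):
    """Back out the positive likelihood ratio implied by a PPV
    observed in a group with the given disease prevalence."""
    posterior_odds = ppv / (1 - ppv)
    prior_odds = prevalence / (1 - prevalence)
    return posterior_odds / prior_odds


def ppv_from_likelihood_ratio(prevalence, likelihood_ratio):
    """Apply the same likelihood ratio to a group with a
    different prevalence and convert the odds back to a PPV."""
    prior_odds = prevalence / (1 - prevalence)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# Figures quoted in the article.
reported_ppv = 0.993                 # developer's claimed PPV
population_prevalence = 1 / 2500     # postmenopausal women in the U.S.

# Assumption for this sketch: exactly half the study patients had cancer.
study_prevalence = 0.5

lr = likelihood_ratio_from_ppv(reported_ppv, study_prevalence)
population_ppv = ppv_from_likelihood_ratio(population_prevalence, lr)

print(f"Implied likelihood ratio: {lr:.1f}")
print(f"PPV in the study group: {reported_ppv:.1%}")
print(f"PPV in the general population: {population_ppv:.1%}")
```

Under these assumptions the PPV falls from 99.3 percent to a little over 5 percent, the same order of magnitude as the 6.5 percent the independent analysis reported. The gap presumably reflects details of the original study, such as its exact mix of patients, that this sketch does not have.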
Because the FDA does not have the resources to oversee all the LDTs that have come to market in recent years, the agency plans to divide them into three categories, based on the likelihood that a misleading or incorrect result from a particular test could cause substantial harm. Under the new guidelines, LDTs would be considered high risk if inaccurate results could lead to death or prolonged disabilities. Such tests would receive the greatest scrutiny: information about them would need to be entered in a national database, and manufacturers would have to prove their safety and efficacy to the FDA before they could be sold. “Basically, the FDA wants to see the supporting evidence before it allows a high-risk LDT to go out on the market,” says Joshua Sharfstein, a physician and professor at the Johns Hopkins Bloomberg School of Public Health.
Even this targeted approach worries many industry leaders and some professional medical societies, including the American Medical Association. “It really depends on how the FDA chooses to define high risk, and that currently isn't clear,” says Curtis Hanson, chief medical officer at Mayo Medical Laboratories in Rochester, Minn., which conducts 25 million lab tests a year. “High-risk tests could amount to between 1 and 10 percent of LDTs on the market today. How is the FDA going to review and find the rare cases where you have problems and do that in an efficient way that doesn't slow progress?”
For patients and their physicians, the question is much more basic. Why should they ever have to wonder whether a commercially available medical test does more harm than good?