A small corner of the neuroscience world was in a frenzy. It was mid-June, and a scientific paper had just been published claiming that years’ worth of results were riddled with errors.

The study had dug into the software used to analyze one kind of brain scan, called functional MRI. The software’s approach was wrong, the researchers wrote, calling into doubt “the validity of some 40,000 fMRI studies”—in other words, all of them.

The reaction was swift. Twitter lit up with panicked neuroscientists. Bloggers and reporters rained down headlines citing “seriously flawed,” “glitches,” and “bugs.” Other scientists thundered out essays defending their studies.

Finally, one of the authors of the paper, published in Proceedings of the National Academy of Sciences, stepped into the fray. In a blog post, Thomas Nichols wrote, “There is one number I regret: 40,000.” Their finding, Nichols went on to write, affects only a portion of all fMRI papers—or, some scientists think, possibly none at all. It wasn’t nearly as bad as the hype suggested.

The brief kerfuffle could be dismissed as a tempest in a teapot, science’s self-correcting mechanisms in action. But the study and the response to it herald a new level of self-scrutiny for fMRI studies, which have been plagued for decades by accusations of shoddy science and pop-culture pandering.

fMRI, in other words, is growing up, but not without some pains along the way.

A bumpy start for brain scanning

When it was first used in humans in 1992, fMRI gave scientists their first dynamic window into the working human brain, thanks to a powerful magnet that could track the oxygen in the blood flowing to different parts of the brain. Inside the machine, researchers could see different areas “firing up” in real time when people looked at animals, made decisions, or recalled memories. It was a breakthrough.

Soon studies were pouring out purporting to use fMRI to prove that “men use only half of brain to listen,” to find the location of the “oops center,” or to interpret the minds of swing voters in the 2008 election.

fMRI has since penetrated the popular imagination and many areas of society. fMRI-based lie detectors have been hailed as a high-tech improvement over the polygraph test (though using this type of evidence is still a subject of legal debate). “Neuromarketing” uses fMRI to strategize how to sell things to us. And pop science authors sprinkle fMRI studies liberally through their books to prove, for instance, how we make decisions.

But over the years, the biggest problem the field has had to contend with is statistics. Faulty statistical methods can lead to false-positive results—and researchers in the field tend to agree that there are probably false positives out there.

That was famously demonstrated in 2009, when scientists put a dead salmon inside an fMRI scanner, showed the salmon photographs, ran a typical analysis, and found “brain activity” related to what the salmon was “seeing.” In this case, the poor use of statistics (the analysis did not correct for the many thousands of separate tests run across the brain) led to false positives.
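To see why that matters, here is a minimal Python sketch, using made-up noise data rather than any real scan, of what mass-univariate testing does: thousands of voxels that contain nothing but noise are each tested at p < 0.05, hundreds of them come out “significant” by chance, and a standard correction (Bonferroni, in this toy example) wipes nearly all of them out.

```python
# Toy illustration of the multiple-comparisons problem: test thousands of
# noise-only "voxels" and count how many look active with and without
# correction. All data here are simulated; no real fMRI is involved.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_voxels = 10_000   # number of separate statistical tests
n_scans = 40        # measurements per condition at each voxel
alpha = 0.05

# Pure noise for two conditions -- there is no real signal anywhere.
condition_a = rng.normal(size=(n_voxels, n_scans))
condition_b = rng.normal(size=(n_voxels, n_scans))

# One t-test per voxel, the mass-univariate testing a simple analysis runs.
t_vals, p_vals = stats.ttest_ind(condition_a, condition_b, axis=1)

uncorrected = int(np.sum(p_vals < alpha))
bonferroni = int(np.sum(p_vals < alpha / n_voxels))

print(f"'Active' voxels, uncorrected: {uncorrected}")         # roughly 500
print(f"'Active' voxels, Bonferroni-corrected: {bonferroni}")  # roughly 0
```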

That same year, other researchers pointed out “puzzlingly high correlations” between personality traits and brain activity in fMRI studies—puzzling because measurements of personality traits and brain activity are both so noisy that, in theory, they cannot be so tightly correlated. In that case, the poor use of statistics (selecting brain regions because they correlated strongly with a trait and then reporting the correlation in those same regions) exaggerated the effect sizes, leading to misleading conclusions.
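The same kind of toy simulation illustrates the circular-analysis trap behind those inflated correlations. All of the numbers below are invented; the point is only that picking voxels because they correlate with a trait and then reporting the correlation in those same voxels overstates a modest true effect, while measuring the selected voxels in independent data does not.

```python
# Toy illustration of circular analysis: select voxels for their correlation
# with a trait, then report the correlation in those same voxels. Everything
# here is simulated with a known, modest true effect of r = 0.2.
import numpy as np

rng = np.random.default_rng(1)
n_subjects, n_voxels, true_r = 20, 5_000, 0.2

trait = rng.normal(size=n_subjects)

def simulate_activity() -> np.ndarray:
    """Each voxel carries the same weak trait signal plus independent noise."""
    noise = rng.normal(size=(n_voxels, n_subjects))
    return true_r * trait + np.sqrt(1 - true_r**2) * noise

def corr_with_trait(data: np.ndarray) -> np.ndarray:
    """Pearson correlation between each voxel's activity and the trait."""
    z_trait = (trait - trait.mean()) / trait.std()
    z_data = (data - data.mean(axis=1, keepdims=True)) / data.std(axis=1, keepdims=True)
    return (z_data * z_trait).mean(axis=1)

activity = simulate_activity()
r_all = corr_with_trait(activity)
top = np.argsort(r_all)[-10:]   # the 10 "best" voxels in this dataset

# Circular: report the correlation in the very voxels chosen for being high.
print(f"Circular estimate:    {r_all[top].mean():.2f}")    # well above 0.2

# Independent: measure the same 10 voxels in a fresh, independent dataset.
r_fresh = corr_with_trait(simulate_activity())
print(f"Independent estimate: {r_fresh[top].mean():.2f}")  # close to 0.2
```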

Scientists correct these methodological issues as they find them, though it takes some time to turn the tide. But in general, scientists seem to trend conservative, sticking with methods they’ve used before even if the old methods have been shown repeatedly to have flaws.

Signs of progress

Now, as it nears a quarter-century of use, fMRI has matured into a major neuroscience research technique. The number of papers in PubMed citing fMRI rose from under 200 in 1995 to over 6,700 in 2015. There are hundreds of labs using the technique worldwide.

At the same time, there’s been an upsurge in studies questioning fMRI methodology. It’s a sign of willingness on the part of scientists and funding agencies to confront some of the field’s past sins.

“I think if I’d done this five years ago, someone would be like, ‘Oh, well, you’re just pissing on things,’” Nichols said. “Whereas now I think it’s like, ‘Right, yeah, we do need to pay attention to these things, don’t we?’”

Nichols, a professor of statistics at the University of Warwick in the UK, now thinks that the deceptive statistics are found in roughly 3,500 studies, not the 40,000 he initially cited. He’s submitted a correction to the journal.

Among those 3,500, only some will have come to a wrong conclusion because of the error. Studies where the amount of brain activity is really dramatic would still have positive results if they were to be reanalyzed; those on the cusp would be more in danger. “You would only know if you go in and examine these [studies] one by one,” Nichols said.

That’s unlikely to happen. But the developers who make fMRI analysis packages are busily updating their software even as we speak.

There are other signs of new maturity as well. Prominent fMRI scientists, including Russell Poldrack, director of the Stanford Center for Reproducible Neuroscience, along with Nichols and others, recently published a lengthy white paper on the “Best Practices in Data Analysis and Sharing in Neuroimaging using MRI.” The paper proposes standardizing the reporting of results, data, and analysis, as well as making these publicly available.

This could resolve one big difficulty in the wake of Nichols’s paper: Researchers can’t tell for sure which past studies used faulty statistical methods. If new studies use the best practices, the methods will be much more transparent.

It would also mean that others could go back and reanalyze study results—or better yet, pool the results from different studies to see whether the results hold up in larger samples.

Pooling data seems obvious, but it wasn’t always so. When fMRI was around 10 years old, researchers started performing “the traveling graduate student experiment”: sending graduate students to three or four different labs to do the same task in different fMRI scanners. They discovered that images of the same person sometimes looked different on different machines. One reason is that the scanners differed in the strength of their magnets. The technology wasn’t standardized, something that continues to be an issue today.

Then, in 2009, Dr. Michael Milham from the Child Mind Institute pooled together data from many labs for a dataset of over 1,000 fMRI scans and demonstrated that they shared common features. “It really was shocking, I think, to everyone,” said Greg Farber, director of the Office of Technology Development and Coordination at the National Institute of Mental Health (NIMH). He credits the Milham paper as a turning point for pooling fMRI data.

Since then data sharing has become even easier, and more talked about. “The internet’s faster, data repositories are bigger, the field, at least for fMRI, is becoming more standardized, so you could actually share data and actually have people look at it,” said Peter Bandettini, chief of functional imaging methods at NIMH.

For the last year at least, NIMH has been collecting raw fMRI data from the labs it funds and putting it into a data archive it hosts. Most researchers getting funds from NIMH to do fMRI research are now required, or at least strongly encouraged, to submit their data to one of a handful of online repositories.

Another repository, Neurosynth, created by Nichols, Poldrack, and others, has collected information from more than 10,000 fMRI articles—including studies with faulty methods—and uses computer algorithms to synthesize the data, performing meta-analyses of many studies on the same topic. Based on that analysis, “it’s clear you can get really reliable findings,” Poldrack said, adding, “At the very biggest picture, I don’t think we have to worry that everything is lost.”

The risks of ‘blobology’

But most research won’t be reanalyzed. The lack of replication and persistence of flawed methods are discouraging to some researchers.

Take those 3,500 studies potentially called into doubt by the PNAS paper. Nichols said they weren’t even making the worst mistakes. In his blog post, he estimated that 13,000 papers were making an even more basic error, the one originally identified in the dead salmon study seven years ago.

“In that way, we haven’t really matured, because people are still using approaches that have been repeatedly shown to be problematic,” said an fMRI researcher who blogs anonymously as Neuroskeptic for Discover Magazine.

Dr. Matthew Brett, an fMRI research scientist at the University of California, Berkeley, said in an email that he thinks there are too few consequences when scientists get their analysis wrong. “This is what I have come to think of as ‘throw it over the wall’ research, where it is always someone else’s problem whether a finding is correct.”

Others go further. This subset of neuroscience research is “completely broken,” said Jack Gallant, a psychology professor at UC Berkeley. “This particular problem”—highlighted in the PNAS study—“is just one aspect of the breakage, and it isn’t even the worst one,” he said in an email.

Gallant avoids the statistical methods called out by the PNAS paper, which are used for what some scientists refer to as “blobology,” or finding blobs in the brain that are active during specific tasks.

Instead, his team develops other types of data analysis that make fewer assumptions. Gallant’s team’s latest work is a whole-brain map of responses to different types of language. For that project, the team built models of how the brain responds while hearing words and used them to predict how the brain would respond to other words. Instead of revealing the blob of your brain for words about moths, their research attempts to show how your whole brain responds when it hears about any insect.
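As a rough, hypothetical sketch of the encoding-model idea, not Gallant’s actual pipeline or data, one can fit a regularized linear map from word features to each simulated voxel’s response and then score the model by how well it predicts responses to held-out words:

```python
# A rough sketch of an encoding model (hypothetical data, not Gallant's
# pipeline): fit a regularized linear map from word features to voxel
# responses, then test it on words the model never saw.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n_words, n_features, n_voxels = 500, 50, 200

# Stand-ins for real data: a semantic feature vector for each word heard,
# and each voxel's measured response to each word.
word_features = rng.normal(size=(n_words, n_features))
true_weights = rng.normal(size=(n_features, n_voxels))
voxel_responses = word_features @ true_weights + 5.0 * rng.normal(size=(n_words, n_voxels))

X_train, X_test, y_train, y_test = train_test_split(
    word_features, voxel_responses, test_size=0.2, random_state=0
)

# One linear model per voxel, fit jointly with ridge regression.
model = Ridge(alpha=10.0).fit(X_train, y_train)
predicted = model.predict(X_test)

# Score each voxel by how well predicted responses track measured responses
# for the held-out words.
pred_z = (predicted - predicted.mean(axis=0)) / predicted.std(axis=0)
meas_z = (y_test - y_test.mean(axis=0)) / y_test.std(axis=0)
voxel_scores = (pred_z * meas_z).mean(axis=0)
print(f"Median held-out prediction correlation: {np.median(voxel_scores):.2f}")
```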

The root of the problem with most fMRI studies, Gallant said, has to do with how people interpret their statistics. “People treat statistical significance as if it was importance, when in fact the two concepts have nothing to do with one another,” he said.
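Gallant’s distinction is easy to show numerically: with a large enough sample, even a trivially small difference between two groups becomes statistically significant, so a small p-value by itself says nothing about whether an effect matters. The sketch below uses simulated data.

```python
# Statistical significance is not importance: with a million samples per
# group, a tiny simulated difference in means is "significant" anyway.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 1_000_000

group_a = rng.normal(loc=0.00, scale=1.0, size=n)
group_b = rng.normal(loc=0.01, scale=1.0, size=n)  # true effect size ~0.01 SD

t_val, p_val = stats.ttest_ind(group_a, group_b)
cohens_d = (group_b.mean() - group_a.mean()) / np.sqrt(
    (group_a.var(ddof=1) + group_b.var(ddof=1)) / 2
)

print(f"p-value:   {p_val:.2e}")    # far below 0.05 -- "significant"
print(f"Cohen's d: {cohens_d:.3f}") # about 0.01 -- practically negligible
```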

Is the glass half-empty or half-full?

Within the fMRI community, these ideas are in the air. Recently, Poldrack, the Stanford scientist, addressed questions about the past flaws of fMRI research on his blog.

“I have been struggling with exactly these same issues myself, and my realizations about the shortcomings of our past approaches to fMRI analysis have shaken me deeply,” he wrote.

Poldrack told STAT he believes finding out that studies are crappy is a sign of scientific progress. And he pointed out fMRI’s clinical promise, citing the fact that, starting in 2007, surgeons could use fMRI to plan their surgeries, helping them avoid cutting out critical brain tissue.

“There’s a lot of people trying to do the right thing,” Poldrack said. “When we find out that what we thought was the right thing isn’t actually the right thing, they will hopefully change their practices to try to do things better.”

His Neurosynth database—the one with 10,000 papers—is another sign of the current achievements of fMRI. If you wanted to know if someone was experiencing pain, happiness, or fond memories, Poldrack could use their brain scan “and predict with pretty high accuracy which of those three things that person’s doing. Clearly you couldn’t do that if those 10,000 papers were all just crap,” he said.
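The decoding Poldrack describes can be sketched, very loosely, as a classification problem. The toy example below uses random stand-in “scans” and an off-the-shelf classifier rather than Neurosynth’s actual machinery, just to show the shape of the exercise.

```python
# Toy decoding sketch: train a classifier on labeled, simulated "scans" and
# predict which of three mental states a new scan reflects. This is a
# stand-in for illustration, not Neurosynth's actual approach.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n_per_state, n_voxels = 100, 300
states = ["pain", "happiness", "fond memories"]

# Give each state its own faint spatial pattern buried in noise.
patterns = rng.normal(size=(len(states), n_voxels))
X = np.vstack([
    0.3 * patterns[i] + rng.normal(size=(n_per_state, n_voxels))
    for i in range(len(states))
])
y = np.repeat(states, n_per_state)

clf = LogisticRegression(max_iter=1000)
accuracy = cross_val_score(clf, X, y, cv=5).mean()
print(f"Cross-validated decoding accuracy: {accuracy:.2f} (chance is about 0.33)")
```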

In general, Poldrack said, there is reason both for optimism and for pessimism. Rather than thinking that the “glass is mostly empty, I try to be a little more balanced in thinking there’s air and water in the glass, and we have to figure out how to interpret the mixture thereof.”

Republished with permission from STAT. This article originally appeared on August 3, 2016.