Ed Vul is a graduate student in the Department of Brain and Cognitive Sciences at the Massachusetts Institute of Technology. He’s also the lead author of a recent paper, “Voodoo Correlations in Social Neuroscience,” which explored the high correlations between measures of personality or emotionality in the individual—such as the experience of fear, or the willingness to trust another person—and the activity of certain brain areas as observed in an fMRI machine. The paper has provoked a flurry of commentary. Mind Matters editor Jonah Lehrer chats with Vul about what this study means for the future of social neuroscience, whether the press is to blame and why we should always make multiple guesses.

LEHRER: What first got you interested in taking a critical look at fMRI papers in social neuroscience?

VUL: Some four years ago [University of California at San Diego neuroscientist] Hal Pashler and I saw a talk in which a very high correlation was reported between brain activity and the speed with which someone walked out of the room after the study.

Given what we knew about fMRI and the factors that determine how quickly we tend to walk in general, it seemed unbelievable to us that activity in this specific brain area could account for so much of the variance in walking speed. Especially so, because the fMRI activity was measured some two hours before the walking happened. So either activity in this area directly controlled motor action with a delay of two hours—something we found hard to believe—or there was something fishy going on. At that point, despite our suspicions, we didn't know exactly what that fishy thing was, so we put the topic aside.

A couple of years ago, I joined [M.I.T. neuroscientist] Nancy Kanwisher's lab and started working directly with fMRI data, and I learned the relevant jargon and statistics. At this point, [M.I.T. post-doc] Chris Baker and Nancy Kanwisher wrote a critique of a paper in Nature Neuroscience, which suffered from a non-independent analysis. After working through the general case myself (and writing a chapter on the topic), I realized how the correlations that seemed fishy to us so long ago were probably being produced, so we set out to investigate—ultimately leading to this paper.

LEHRER: What is a "voodoo correlation"?

VUL: We use that term as a humorous way to describe mysteriously high correlations produced by complicated statistical methods (which usually were never clearly described in the scientific papers we examined)—and which turn out unfortunately to yield some very misleading results. The specific issue we focus on, which is responsible for a great many mysterious correlations, is something we call “non-independent” testing and measurement of correlations. Basically, this involves inadvertently cherry-picking data and it results in inflated estimates of correlations.

To go into a bit more detail:

An fMRI scan produces lots of data: a 3-D picture of the head, which is divided into many little regions, called voxels. In a high-resolution fMRI scan, there will be hundreds of thousands of these voxels in the 3-D picture.

When researchers want to determine which parts of the brain are correlated with a certain aspect of behavior, they must somehow choose a subset of these thousands of voxels. One tempting strategy is to choose voxels that show a high correlation with this behavior. So far this strategy is fine.

The problem arises when researchers then go on to provide their readers with a quantitative measure of the correlation magnitude measured just within the voxels they have pre-selected for having a high correlation. This two-step procedure is circular: it chooses voxels that have a high correlation, and then estimates a high average correlation. This practice inflates the correlation measurement because it selects those voxels that have benefited from chance, as well as any real underlying correlation, pushing up the numbers.

One can see closely analogous phenomena in many areas of life. Suppose we pick out the investment analysts whose stock picks for April 2005 did best for that month. These people will probably tend to have talent going for them, but they will also have had unusual luck (and some finance experts, such as Nassim Taleb, actually say the luck will probably be the bigger element). But even assuming they are more talented than average—as we suspect they would be—if we ask them to predict again, for some later month, we will invariably find that as a group, they cannot duplicate the performance they showed in April. The reason is that next time, luck will help some of them and hurt some of them—whereas in April, they all had luck on their side or they wouldn’t have gotten into the top group. So their average performance in April is an overestimate of their true ability—the performance they can be expected to duplicate on the average month.

It is exactly the same with fMRI data and voxels. If researchers select only highly correlated voxels, they select voxels that "got lucky," as well as having some underlying correlation. So if you take the correlations you used to pick out the voxels as a measure of the true correlation for these voxels, you will get a very misleading overestimate.

This, then, is what we think is at the root of the voodoo correlations: the analysis inadvertently capitalized on chance, resulting in inflated measurements of correlation. The tricky part, which I can’t go into here, was that investigators were actually trying to take account of the fact they were checking so many different brain areas—but their precautions made the problem that I am describing worse, not better!
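
To make the circularity concrete, here is a minimal simulation sketch (purely illustrative, not drawn from any of the studies discussed here): the simulated "brain" data are pure noise, so the true brain-behavior correlation is zero for every voxel, yet selecting voxels for a high correlation and then averaging the correlation within that selection yields an impressively large number. All names and thresholds below are assumptions chosen for the example.

```python
# Toy demonstration of non-independent ("circular") voxel selection.
# All data are pure noise, so the true brain-behavior correlation is zero;
# the inflated number comes entirely from selecting on chance.
# n_subjects, n_voxels and the 0.6 threshold are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_voxels = 16, 50_000

behavior = rng.standard_normal(n_subjects)               # e.g., an anxiety score
activity = rng.standard_normal((n_subjects, n_voxels))   # noise "fMRI" signal per voxel

# Pearson correlation of each voxel's activity with behavior across subjects.
b = (behavior - behavior.mean()) / behavior.std()
a = (activity - activity.mean(axis=0)) / activity.std(axis=0)
r = (a * b[:, None]).mean(axis=0)

# Step 1: select voxels that pass a high-correlation threshold.
selected = r[np.abs(r) > 0.6]

# Step 2 (the circular part): report the average correlation within
# the very voxels that were selected for being highly correlated.
print(f"{selected.size} voxels survive the threshold")
print(f"mean |r| in selected voxels: {np.abs(selected).mean():.2f}")  # well above zero
```

An independent analysis, for example one that selects the voxels on half of the data and measures the correlation on the other half, would report a value near zero here, which is the honest answer for pure noise.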

LEHRER: Your paper has prompted a great deal of debate among social neuroscientists, and some of the scientists have issued a rebuttal of your paper. (You have since rebutted this rebuttal.) What do you hope this debate leads to? What methodological changes would you like to see adopted by social neuroscientists using fMRI?

VUL: The debate we have spurred is quite interesting to watch. At first some of the authors whose papers we criticized challenged our statistical point, but—for good reason—that line of argument doesn’t seem to have caught on. Right now, so far as I know, everyone seems to concede that the analysis used in these studies was not kosher, in the sense of providing correlation numbers that can be taken seriously. Instead, we are mostly hearing a couple of other arguments at this point.

One is that the correlation values themselves don’t really matter—it’s just the fact there is a correlation in a certain spot in the head that matters. I don’t agree with this at all; we think many of these papers appeared in such high-profile places precisely because editors were (justifiably) impressed with big effects. If one can account for, say, three quarters of individual differences in something important such as anxiety or empathy—obviously, that’s a real breakthrough, and it tells you not only where future research ought to look, but also where it shouldn’t. On the other hand, if it’s just 3 percent of the variance, that’s a whole lot less impressive, and may reflect much more indirect kinds of associations.

I have also heard some people complain that even if we are right on the mathematical point, we presented our argument in a somewhat rough-mannered way—criticizing particular articles, drawing unfavorable outside attention to the field, and using the humorous term “voodoo.”

We were as surprised as anyone by how much interest our paper sparked. Evidently it spread sort of “virally”—one neuroscientist we know said he got seven copies sent to him (none of them by us). The good side is that people are thinking harder now about how they do their analyses. The bad side is that all this publicity has left some authors feeling embarrassed and picked on. In our view, the statistical issues of independence and multiple comparisons are full of tricky pitfalls—we do not suggest that these were stupid mistakes people were making, and we regret hurting anyone’s feelings. I don’t think, however, that it would have made sense to write an article that did not “name names,” because if the scientific literature is to guide future research decisions, people have to know which results can be relied upon, and which cannot. (In fact, we suspect we only flagged a small fraction of the papers that have these problems, and some are in other fields, such as neurogenetics, cognitive neuroscience more broadly, and others.)

LEHRER: Do you think the media are partly responsible for sensationalizing the findings of social neuroscience? And how can the media do a better job of reporting on brain scanning data?

VUL: Social neuroscience is exciting! Who doesn't want to know why we feel love, jealousy or schadenfreude; how we decide to punish others; and why rejection hurts? So it doesn't take much to sensationalize findings in this field— most findings will already capture the imagination of the public, and they need just a slight push from the media.

In general, I would advocate a bit more skepticism on the part of reporters, with respect to all scientific findings. I think reporters generally try to write up conclusions in slightly grander terms than the scientists used originally. What they may not realize is that scientists themselves have often oversold the implications of their findings a bit. You put these things together and you can end up with really overblown coverage. (On the other hand, perhaps if this advice were followed, science columns would end up dull and unread, so perhaps I should withdraw the suggestion.)

When it comes to reporting on brain-scanning data in particular, I have noticed that the findings that seem to excite the public, and reporters, are mostly of the variety of "the brain does X," where X is some deeply human trait that we cherish (such as love, language, and so on). Perhaps this is still exciting to the layperson who is trying to hold on to some notion that the mind and the brain are different entities. I don't think there are many people studying neuroscience who find this particularly interesting, however. Most of us have been thoroughly convinced that the mind and brain are one thing. Perhaps if reporters focused on questions of how something works in the brain, instead of the mere fact that it works in the brain, they might pick up on a somewhat more (scientifically) exciting subset of the field.

LEHRER: What do you research when you're not thinking about voodoo correlations?

VUL: Lately I’ve been doing work at the interface of cognitive psychology and machine learning, asking how people might be carrying out rather difficult statistical computations (which we all seem to do unconsciously and automatically all the time).

I’ve been exploring the idea that the human mind is a “sampling engine,” basically, that it embodies complex statistical models, but can only make judgments about them by drawing samples. This would be equivalent to giving someone a bent coin: he doesn’t know the probability that it will land heads or tails, but he can flip it as many times as he likes.
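
As a rough illustration of the sampling idea, here is a toy sketch (an assumption-laden example, not code from Vul's research): an internal model "knows" the coin's true bias, but each judgment is based on only a handful of mental samples, so repeated judgments from the same model scatter around the truth and sharpen as more samples are drawn.

```python
# Toy "sampling engine": judgments are formed from a limited number of
# internal samples (flips), so they vary from one judgment to the next.
# The bias of 0.7 and the sample counts are arbitrary illustrative choices.
import random

random.seed(1)
true_bias = 0.7  # probability that the bent coin lands heads

def judge(n_samples):
    """One judgment: estimate P(heads) from n_samples internal 'flips'."""
    flips = [random.random() < true_bias for _ in range(n_samples)]
    return sum(flips) / n_samples

for k in (3, 10, 100):
    judgments = [judge(k) for _ in range(5)]
    print(k, [round(j, 2) for j in judgments])
# With few samples the judgments are all over the place; with many
# they converge on the true bias of 0.7.
```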

One experiment on this topic generated a bit of media excitement recently (I admit I am quite pleased when the conclusions of my own papers are overstated), which we called the “Wisdom of the Crowd Within.”

If you ask two people to guess how many people live in New York, the average of their two guesses will, on average, be better than either guess alone. This wisdom-of-crowds effect is a consequence of guesses from different people having independent errors.

We tested the hypothesis that even the average of two guesses from one individual would be more accurate than either guess alone. This would be the case if multiple responses from one individual are somewhat independent samples—like coin flips—from an internal probabilistic model. And indeed, that is exactly what we found. The average of two guesses from one person was (on average) better than either guess alone—and the improvement was even greater if the two guesses were separated by two weeks. Thus, we effectively have an evolving crowd within our own mind—and in some cases we can gain by consulting that crowd, rather than just making one instantaneous judgment.
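
The effect is easy to reproduce in a toy simulation. The sketch below assumes a simple error model in which each guess is the truth plus a bias shared by both of a person's guesses plus guess-specific noise; every number in it is an illustrative assumption, not data from the study.

```python
# Toy "crowd within" simulation: averaging two of your own guesses cancels
# some guess-specific noise, and helps more when the second guess is less
# correlated with the first (as when the guesses are weeks apart).
# The truth value, error sizes and the 0.7 correlation are assumptions.
import numpy as np

rng = np.random.default_rng(0)
truth = 8_000_000          # stand-in for the population of New York
n_people = 100_000

bias = rng.normal(0, 2e6, n_people)     # error shared by both of a person's guesses
noise1 = rng.normal(0, 2e6, n_people)   # guess-specific error, first guess
noise2 = rng.normal(0, 2e6, n_people)   # guess-specific error, second guess

guess1 = truth + bias + noise1
corr = 0.7                              # an immediate second guess reuses much of the first error
guess2_now = truth + bias + corr * noise1 + np.sqrt(1 - corr**2) * noise2
guess2_later = truth + bias + noise2    # two weeks later: a nearly independent sample

def rmse(guess):
    return np.sqrt(np.mean((guess - truth) ** 2))

print(f"one guess:                {rmse(guess1):,.0f}")
print(f"avg of two (same day):    {rmse((guess1 + guess2_now) / 2):,.0f}")
print(f"avg of two (weeks apart): {rmse((guess1 + guess2_later) / 2):,.0f}")
# Both averages beat the single guess, and the two-week average beats the
# same-day average, mirroring the pattern described above.
```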

Are you a scientist? Have you recently read a peer-reviewed paper that you want to write about? Then contact Mind Matters editor Jonah Lehrer, the science writer behind the blog The Frontal Cortex and the book Proust Was a Neuroscientist. His next book, How We Decide, will be available in February 2009.