Although Vul is absolutely right that this would be a major error, he’s not describing what we actually do. Vul’s example assumes that the question we are interested in is how the entire brain correlates with a personality measure or responds differently to two tasks. Staying with the grades example, what social neuroscientists are really doing is something closer to asking, “Across all colleges in the country, are there colleges where psychology grades are higher than sociology grades?” In other words, the question is not what the average difference is across all schools, but rather which schools show a difference. There is nothing inappropriate about asking this question or about describing the results found in those schools where a significant effect emerges.
With whole-brain analyses in fMRI, we’re doing the same thing. We are interested in where significant effects are occurring in the brain and when we find them we describe the results in terms of means, correlations, and so on. We are not cherry-picking regions and then claiming these represent the effects for the whole brain.
Vul et al. sent a survey to the authors of about 50 papers to find out if the authors were making the non-independence error, but they never told the authors what they were really interested in, and the questions they sent did not actually assess the right information about the methods used in these studies. Based on this wrong information about the studies, they characterized half of them as making the non-independence error. I’ve been in touch with the authors of almost all of the criticized studies, and almost all of them have said something along the lines of: “Of course I didn’t use the method Vul et al. describe. Who would ever do that?”
And that’s the problem. Nobody does the analyses that Vul et al. are accusing us of. What we do is test thousands of spots in the brain (called “voxels”) to see if the differences in activity from one subject to the next reliably relate to differences on, say, a personality measure like neuroticism. This procedure is entirely valid. Then, a subset of the tests—those considered reliable enough that they would replicate—are reported in a table in the article (or in a figure or in the text). I suppose we could include a 200-page table and report the significance of every voxel in the brain, but everyone understands that we report the most significant activations and that the remaining regions have less significant results (this is the standard reporting procedure for scientific research). You have to remember that our goal is not to find the average effect in the brain, but to find where in the brain significant effects occur. The procedure we use is exactly the right one to answer that question.
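The whole-brain procedure described above can be sketched in a few lines of Python. This is only an illustrative toy, not anyone’s actual pipeline: the data here are pure noise, the variable names (`neuroticism`, `brain`) and the sizes and threshold are invented for the example, and real analyses additionally correct for the thousands of comparisons being made. The point is just the shape of the analysis: test every voxel, then report the subset that passes a significance threshold rather than averaging over the whole brain.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_subjects, n_voxels = 18, 5000            # toy sizes, not from any real study
neuroticism = rng.standard_normal(n_subjects)        # hypothetical behavioral measure
brain = rng.standard_normal((n_voxels, n_subjects))  # hypothetical per-voxel activity

# Test every voxel: does its activity correlate with the measure across subjects?
results = []
for v in range(n_voxels):
    r, p = stats.pearsonr(brain[v], neuroticism)
    results.append((v, r, p))

# Report only voxels passing a significance threshold, most significant first,
# instead of reporting an average effect across all voxels.
alpha = 0.001
table = sorted((row for row in results if row[2] < alpha), key=lambda row: row[2])
for voxel, r, p in table:
    print(f"voxel {voxel:5d}  r = {r:+.2f}  p = {p:.4f}")
```

Since the input here is noise, any voxels that appear in the table are false positives, which is exactly why the multiple-comparisons correction omitted from this sketch matters in practice.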
LEHRER: They provide different sources of evidence (for example, simulations and analyses of published studies) to make their point. How compelling, in your opinion, is the evidence?
LIEBERMAN: Vul et al. provide some pieces of evidence for their argument that seem quite compelling at first blush, but that do not hold up under careful inspection. First, they include a simulation to show that correlations as high as 0.80 in fMRI data can be observed even when the true correlation in the population is zero. This [fact] is true of every study ever run by a behavioral scientist. There is always some probability, no matter how small, that the observed results could be due to chance alone. That’s what a p-value assesses.
The real question is how often such large observed effects will occur, when the true correlation is zero, under realistic conditions in fMRI studies. Vul et al. conducted their simulation assuming an fMRI sample size of 10 subjects, but fMRI studies rarely have sample sizes this small. Indeed, in their “meta-analysis” of social neuroscience fMRI studies, the average sample size was more than 18. In our reply, we simulated samples of 10, 15, 18, and 20 subjects, and examined how often correlations of 0.80 will be observed when there is no real effect. When the sample size is 10, at least one large spurious correlation is likely to appear in a large percentage of simulated studies. When the sample size is increased to 18 subjects, however, there are spurious correlations in only a small percentage of simulated studies. So spurious correlations can occur, but they will be rare in typical fMRI studies.
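A simulation along these lines is easy to reproduce. The sketch below is my own simplified version, not the one from the published reply: it uses independent noise voxels (real voxels are spatially correlated), and the voxel and study counts are arbitrary. It generates many null “studies” and asks, for each sample size, in what fraction of studies the largest voxel-behavior correlation reaches 0.80 even though the true correlation is zero.

```python
import numpy as np

def max_spurious_r(n_subjects, n_voxels, n_studies, rng):
    """For each simulated null study, generate pure-noise voxel data and a
    noise behavioral score, and record the largest |correlation| across voxels."""
    max_rs = []
    for _ in range(n_studies):
        behavior = rng.standard_normal(n_subjects)
        voxels = rng.standard_normal((n_voxels, n_subjects))
        # z-score both, then correlate each voxel with the behavioral score
        b = (behavior - behavior.mean()) / behavior.std()
        v = (voxels - voxels.mean(axis=1, keepdims=True)) / voxels.std(axis=1, keepdims=True)
        rs = v @ b / n_subjects
        max_rs.append(np.abs(rs).max())
    return np.array(max_rs)

rng = np.random.default_rng(0)
rates = {}
for n in (10, 18):
    rates[n] = (max_spurious_r(n, 1000, 200, rng) >= 0.8).mean()
    print(f"n={n}: fraction of null studies with some |r| >= 0.80: {rates[n]:.2f}")
```

With 10 subjects, nearly every null study contains at least one spurious correlation of 0.80 or higher somewhere among the voxels; at 18 subjects the rate drops sharply, mirroring the pattern described above.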