Matthew Lieberman is associate professor of social neuroscience at the University of California, Los Angeles. In recent weeks, he’s also rebutted the claims of a recent paper, “Voodoo Correlations in Social Neuroscience,” which explored the high correlations between measures of personality or emotionality in the individual (such as the experience of fear, or the willingness to trust another person) and the activity of certain brain areas as observed in an fMRI machine. Mind Matters editor Jonah Lehrer chats with Lieberman about why most fMRI correlations aren’t false, the “reward” of intense grief and why accepting unfair offers seems to activate brain areas involved with self-control.

LEHRER: Your field of research has come under fire in a recent paper titled "Voodoo Correlations in Social Neuroscience."  What's the authors' argument and have they identified a significant problem in this field?

LIEBERMAN: In their paper, Vul and colleagues suggest that brain-personality correlations in many social neuroscience studies depend on invalid methods and thus are “implausibly high,” “likely…spurious” and “should not be believed.” These claims are incorrect. These analyses use standard procedures for drawing inferences and protecting against false positives. The correlation estimates will tend to be somewhat higher than the true value, but there is no evidence to suggest that these correlations are meaningless or “voodoo” science.

The argument that Vul and colleagues put forward in their paper is that correlations observed in social neuroscience papers are impossibly high. There’s a metric (the product of the reliabilities of the two variables) that determines just how high a correlation can be observed between two variables. They suggest that because, on average, this metric allows correlations as high as 0.74, social neuroscientists should never see correlations higher than that.

Given the gravity of the claim, it’s important to get this [figure] right, but they do not. Here’s their mistake: it’s not the average of this metric that determines what can be observed in a study, but rather the metric for that particular study or, at the very least, the metric estimated from prior use of the actual measures in that study. Just because the average price of groceries in a supermarket is $3 does not mean you cannot find a $12 item. In fact, a study that I’m an author on (and that is a major target in the Vul et al. paper) is a perfect example. The reliability of the self-report measure in our study is far higher than the average they report, allowing for higher observed correlations. They knew this [fact], but presented our study as violating the “theoretical upper bound” anyway.
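The ceiling argument can be made concrete with the classical attenuation formula, which is usually stated as the square root of the product of the two reliabilities. The sketch below is illustrative only: the function name is hypothetical, the 0.8 and 0.7 reliabilities are assumed values that happen to give a ceiling near the 0.74 figure mentioned above, and the second pair describes an invented study rather than any published one.

```python
import math


def max_observable_correlation(reliability_x: float, reliability_y: float) -> float:
    """Attenuation ceiling: square root of the product of the two reliabilities."""
    for value in (reliability_x, reliability_y):
        if not 0.0 <= value <= 1.0:
            raise ValueError("reliabilities must lie between 0 and 1")
    return math.sqrt(reliability_x * reliability_y)


# Assumed "average" reliabilities that yield a ceiling near 0.74.
average_ceiling = max_observable_correlation(0.8, 0.7)

# Hypothetical study whose self-report measure is more reliable than average.
study_ceiling = max_observable_correlation(0.95, 0.7)

print(f"ceiling from average reliabilities: {average_ceiling:.2f}")
print(f"ceiling for the hypothetical study: {study_ceiling:.2f}")
```

The point of the comparison is the one made in the grocery analogy: a ceiling computed from average reliabilities says nothing about the ceiling for a particular study that uses a more reliable measure.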

Their second major conceptual point is that numerous social neuroscience authors were making a non-independence error. Ed Vul gives a nice example of what he means by the non-independence error in a chapter with [Massachusetts Institute of Technology neuroscientist] Nancy Kanwisher. They suggest that we might be interested in whether a psychology or a sociology course is harder and assess this [question] by comparing the grades of students who took both courses. In a comparison of all students, we find no difference in scores. But what if we began by selecting only students who scored higher in psychology than sociology and then statistically compared those? If we used the results of that analysis to draw a general inference about the two courses, this [strategy] would be a non-independence error, because the selection of the sample to test is not independent of the criterion being tested. This [practice] would massively bias the results. 
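The grades example can be simulated to show how strong the selection bias is. This is a minimal sketch under stated assumptions: the grade distribution (mean 75, standard deviation 10), the number of students and all names are invented for illustration, and both courses are constructed to be equally hard.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical setup: both courses are equally hard, so each student's two
# grades are drawn independently from the same distribution.
students = [(random.gauss(75, 10), random.gauss(75, 10)) for _ in range(5000)]


def mean_difference(pairs):
    """Average of (psychology grade - sociology grade) over the given students."""
    return statistics.mean(psych - soc for psych, soc in pairs)


# Independent analysis: use every student.
all_diff = mean_difference(students)

# Non-independent analysis: keep only students who already scored higher in
# psychology, then ask whether psychology grades are higher in that subset.
selected = [(psych, soc) for psych, soc in students if psych > soc]
selected_diff = mean_difference(selected)

print(f"all students:      {all_diff:+.2f}")
print(f"selected students: {selected_diff:+.2f}")
```

Across all students the difference hovers near zero, while the pre-selected subset shows a large positive difference even though none exists by construction.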

Although Vul is absolutely right that this would be a major error, he’s not describing what we actually do. Vul’s example assumes that the question that we are interested in is how the entire brain correlates with a personality measure or responds differently to two tasks. Staying with the grades example, what social neuroscientists are really doing, however, is something closer to asking, “Across all colleges in the country, are there colleges where psychology grades are higher than sociology grades?” In other words, the question is not what the average difference is across all schools, but rather which schools show a difference. There is nothing inappropriate about asking this question or about describing the results found in those schools where a significant effect emerges. 

With whole-brain analyses in fMRI, we’re doing the same thing. We are interested in where significant effects are occurring in the brain and when we find them we describe the results in terms of means, correlations, and so on. We are not cherry-picking regions and then claiming these represent the effects for the whole brain.

Vul et al. sent a survey around to the authors of about 50 papers to find out if authors were making the non-independence error, but they never told the authors what they were really interested in and the questions they sent did not actually assess the right information about the methods used in these studies. Based on the wrong information about the studies, they characterized half of the studies as making the non-independence error. I’ve been in touch with authors of almost all of the criticized studies and almost all of them have said something along the lines of, “Of course I didn’t use the method Vul et al. describe. Who would ever do that?”

And that’s the problem. Nobody does the analyses that Vul et al. are accusing us of. What we do is test thousands of spots in the brain (called “voxels”) to see if the differences in activity from one subject to the next reliably relate to differences on, say, a personality measure like neuroticism. This procedure is entirely valid. Then a subset of the tests, those considered reliable enough that they would replicate, are reported in a table in the article (or in a figure or in the text). I suppose we could include a 200-page table and report the significance of every voxel in the brain, but everyone understands that we report the most significant activations and that the remaining regions have less significant results (this is the standard reporting procedure for scientific research). You have to remember that our goal is not to find the average effect in the brain, but to find where in the brain significant effects occur. The procedure we use is exactly the right one to use in order to answer that question.

LEHRER: They provide different sources of evidence (for example, simulations and analyses of published studies) to make their point. How compelling, in your opinion, is the evidence?

LIEBERMAN: Vul et al. provide some pieces of evidence for their argument that seem quite compelling at first blush, but that do not hold up under careful inspection. First, they include a simulation to show that correlations as high as 0.80 in fMRI data can be observed even when the true correlation in the population is zero. This [fact] is true of every study ever run by a behavioral scientist. There is always some probability, no matter how small, that the observed results could be due to chance alone. That’s what a p-value assesses.

The real question is how often such large observed effects will occur, when the true correlation is zero, under realistic conditions in fMRI studies. Vul et al. conducted their simulation assuming an fMRI sample size of 10 subjects, but fMRI studies rarely have sample sizes this small. Indeed, in their “meta-analysis” of social neuroscience fMRI studies, the average sample size was more than 18. In our reply, we simulated samples of 10, 15, 18 and 20 subjects, and examined how often correlations of 0.80 will be observed when there is no real effect. When the sample is 10, at least one large spurious correlation is likely to happen in a large percentage of simulated studies. When the sample size is increased to 18 subjects, however, there are spurious correlations in only a small percentage of simulated studies. So spurious correlations can occur, but they will be rare in typical fMRI studies.
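The dependence on sample size can be sketched without running a full simulation. The code below is not the simulation from either paper. It uses the exact null distribution of Pearson's r for bivariate-normal data, integrated numerically, and then assumes a hypothetical 1,000 fully independent tests per study (real voxels are spatially correlated, so this number is only a stand-in). All function names are invented for the example.

```python
import math


def null_density(r: float, n: int) -> float:
    """Density of Pearson's r for n bivariate-normal pairs when the true correlation is 0."""
    if n < 4:
        raise ValueError("this sketch assumes at least 4 subjects")
    a, b = 0.5, (n - 2) / 2
    beta = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return (1.0 - r * r) ** ((n - 4) / 2) / beta


def prob_abs_r_at_least(threshold: float, n: int, steps: int = 2000) -> float:
    """Two-sided P(|r| >= threshold) under the null, via Simpson's rule (steps must be even)."""
    h = (1.0 - threshold) / steps
    total = null_density(threshold, n) + null_density(1.0, n)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * null_density(threshold + i * h, n)
    return 2.0 * total * h / 3.0


def prob_any_spurious(threshold: float, n: int, independent_tests: int) -> float:
    """Chance that at least one of several independent null tests reaches the threshold."""
    p = prob_abs_r_at_least(threshold, n)
    return 1.0 - (1.0 - p) ** independent_tests


for n in (10, 15, 18, 20):
    per_test = prob_abs_r_at_least(0.80, n)
    per_study = prob_any_spurious(0.80, n, independent_tests=1000)
    print(f"n={n:2d}  per-test p={per_test:.6f}  P(at least one)={per_study:.3f}")
```

Under these assumptions the chance of at least one spurious correlation of 0.80 falls steeply as the sample grows from 10 to 18, which is the qualitative pattern described above. The exact percentages depend on the assumed number of independent tests.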

It is also important to remember that if a correlation is spurious, its spatial location in the brain should be random, but our correlational effects are not random. Studies of empathy for pain, fear of pain and the social pain of being rejected all show correlations between self-report measures and activity in the dorsal anterior cingulate cortex. This region is the same one that is surgically lesioned to treat intractable chronic pain—hardly random.

In your interview with Ed Vul, I see that he suggests that even if these effects aren’t entirely spurious, they may only account for a relatively small percentage of the variance and thus aren’t that scientifically interesting. First of all, that’s a major admission right there to go from arguing these are “spurious and invalid” to admitting they are “probably valid, but modest.”

Second of all, there are people who would otherwise be dead if we adopted Vul’s opinions regarding the importance of small effects. The biggest study to examine the effects of aspirin on heart attacks was stopped midway through the study because the experimenters looked at the data and realized it was unethical to prevent subjects in the placebo control from taking aspirin. Significantly more people had died from heart attacks in the placebo condition than the aspirin condition, and yet the experimental manipulation (aspirin versus placebo) in this study accounted for less than 1 percent of the variance in the outcomes.
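To see how a life-saving treatment can account for under 1 percent of the variance, one can compute the correlation (the phi coefficient) between treatment group and a yes/no outcome and then square it. The counts below are made up for illustration and are not the trial's data. The function name is also hypothetical.

```python
import math


def phi_coefficient(events_a: int, total_a: int, events_b: int, total_b: int) -> float:
    """Correlation (phi) between group membership and a yes/no outcome."""
    a, b = events_a, total_a - events_a  # group A: events, non-events
    c, d = events_b, total_b - events_b  # group B: events, non-events
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom


# Made-up counts for illustration only; these are NOT the trial's data.
# Group A (treatment): 100 events in 10,000 people.
# Group B (placebo):   200 events in 10,000 people.
phi = phi_coefficient(events_a=100, total_a=10_000, events_b=200, total_b=10_000)
variance_explained = phi ** 2

print(f"phi = {phi:.3f}, variance explained = {variance_explained:.2%}")
```

In this invented example the treatment halves the event rate, yet the squared correlation is a fraction of 1 percent, because the outcome is rare in both groups.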

So is there likely to be some inflation in the r-values obtained in whole-brain correlational analyses? Sure, but we’ve known this for a long time and most studies are interested in identifying where in the brain meaningful relations are occurring rather than estimating their exact magnitude. Are the reported correlations egregiously inflated? Based on the sample of studies that Vul et al. survey, probably not. Is an invalid method being used to test whether meaningful correlations are present and therefore worthy of the label “voodoo”? No way.

LEHRER: Do you think this controversy has had any positive benefits for the field, even though you strongly disagree with its findings?

LIEBERMAN: The answer is yes, but it’s worth taking a moment to discuss the potential harm as well. Despite the fact that Vul et al.’s novel claims (impossibly high correlations, invalid methods) are demonstrably false, these claims have the potential to bring great harm to the field. There are people at funding agencies and top journals wondering whether they should continue to support this kind of work. And this [effect] doesn’t just involve social neuroscience either, because anyone reading their paper can recognize that the issues Vul et al. raise, albeit incorrectly, apply equally to all areas of cognitive neuroscience. So even if people in the field recognize the limitations of the Vul et al. argument, it may be a challenge to regain the trust of those we count on to support our work. It’s a well-known social psychological fact that when someone is cleared of a crime, the lasting association is between the person and the crime, rather than the fact that they were cleared.

On to the good news. I think this is getting lots of people to think more carefully about many different kinds of analyses. For instance, many of the “independent” correlations that Vul et al. approve of have a source of bias (restriction of range) that causes them to under-estimate the true correlation value. There’s a statistical correction for this and we’ve included it in our reply. Additionally, the results of the simulation we ran in our reply were illuminating for us. Based on how we and other social neuroscientists typically analyze data, this simulation suggests that we should really be aiming for samples of at least 18 subjects because at this size, there was a dramatic drop in the number of false positives (for example, finding a correlation of r=0.80 when no true correlation existed). Of course, we would always like to run larger samples but the expense of imaging is tremendous.
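The interview does not say which correction for restriction of range the reply uses. As an assumption, the sketch below swaps in a common textbook one, Thorndike's Case 2 formula, which rescales an observed correlation by the ratio of the unrestricted to the restricted standard deviation. The function name and the numbers are hypothetical.

```python
import math


def correct_for_range_restriction(
    r_restricted: float, sd_unrestricted: float, sd_restricted: float
) -> float:
    """Thorndike Case 2 estimate of the correlation in the unrestricted population."""
    u = sd_unrestricted / sd_restricted
    return (r_restricted * u) / math.sqrt(1.0 + r_restricted ** 2 * (u ** 2 - 1.0))


# Hypothetical numbers: a correlation of 0.40 observed in a sample whose
# spread (SD 6) is narrower than the spread in the full population (SD 10).
corrected = correct_for_range_restriction(0.40, sd_unrestricted=10.0, sd_restricted=6.0)
print(f"observed 0.40 -> corrected {corrected:.2f}")
```

When the two standard deviations are equal the formula returns the observed correlation unchanged. The narrower the sampled range, the larger the upward correction.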

LEHRER: You've studied some of the brain differences underlying different types of grief, and found that, paradoxically, extreme grief is actually characterized by the activation of "reward centers" in the brain. Could you explain the data? And what can this teach us about the nature of grief?

LIEBERMAN: In this study, headed by Mary-Frances O’Connor [of the University of California at Los Angeles], we looked at two groups of women whose mothers or sisters had died from breast cancer in the past few years. One group had gone through the grief process and recovered relatively normally (normal grief) and the other group was diagnosed with “complicated grief,” meaning that they were not recovering with the passage of time.  Both groups were shown images and words meant to remind them of the deceased (along with control images and words that did not). Both groups showed activity in the pain network that we previously observed during social exclusion. 

When we compared the two groups with one another, however, there was no difference in the activity in the pain network. Instead, we observed greater activity in a reward region (ventral striatum) in the complicated grief subjects relative to the normal grief subjects. This is the first study to find this effect so any interpretation is preliminary. Nevertheless, this activation may reflect something like a craving for connection with the deceased much like the craving for a drug in an addicted individual. To this end, we found that the activity in this same “reward” region was significantly associated with the extent to which subjects told us they were yearning for the deceased. It also reminds us that activity in the reward regions of the brain may not always signal greater well-being for an individual. Indeed, from a Buddhist perspective, these attachments and cravings get us into just as much trouble as the more obviously negative events we all try to avoid.

LEHRER: You've found that the ability to accept unfair offers requires the activation of cortical areas typically associated with self-control. What can this teach us about our propensity for fairness?

LIEBERMAN: We don’t know if accepting unfair offers “requires” lateral prefrontal activations, but we did see these activations when people did accept a certain kind of unfair offer. [Psychologist] Alan Sanfey [of Princeton University] and colleagues published the first fMRI study of the ultimatum game in 2003. In this game, the proposer decides how to split $10 between himself and the responder, and the responder decides whether to accept or not. The interesting part of the game is that if the responder says no, both players get nothing. When the proposer offers $5, an even split, responders nearly always accept, but when the proposer offers $1 or $2, the responder will often decline, even when the proposer and responder will never play the game again, making reputational concerns irrelevant. Sanfey found that responders receiving unfair offers of $1 and $2 out of $10 had greater activity in the insula, a limbic region associated with pain and visceral distress. Moreover, greater activity in this region was associated with a greater tendency to reject the “unfair” offer.

We conducted a similar study in my lab, headed by Golnaz Tabibnia, in which we looked at two things. First, is there an observable effect of being treated fairly, above and beyond the higher monetary payouts associated with fairness? We looked at this by comparing offers such as $5 out of $10 to offers such as $5 out of $23. In both cases, you can earn $5, but the first offer is much fairer than the second. We found that when we took the monetary aspect out of the picture in this way, we still saw activity throughout the brain’s reward network, associated with being treated fairly. This [finding] is consistent with a number of recent studies showing that positive social treatment from others activates reward regions.
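The rules of the game, and the way these offers hold the payout constant while varying fairness, can be sketched in a few lines. The function names are hypothetical, and treating the responder's share of the pot as a fairness measure is an assumption made for illustration, not a measure taken from the study.

```python
from typing import Tuple


def ultimatum_payoffs(pot: float, offer: float, accepted: bool) -> Tuple[float, float]:
    """Return (proposer, responder) payoffs for one round of the ultimatum game."""
    if not 0 <= offer <= pot:
        raise ValueError("offer must be between 0 and the pot")
    if not accepted:
        return (0.0, 0.0)  # a rejection leaves both players with nothing
    return (float(pot - offer), float(offer))


def responder_share(pot: float, offer: float) -> float:
    """Fraction of the pot offered to the responder (0.5 is an even split)."""
    return offer / pot


# Same $5 payout to the responder, very different shares of the pot.
for pot in (10, 23):
    proposer, responder = ultimatum_payoffs(pot, 5, accepted=True)
    print(f"${responder:.0f} out of ${pot}: share = {responder_share(pot, 5):.2f}")
```

Comparing the two offers separates what the responder earns from how fairly the pot was divided, which is the contrast described above.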

Because we included offers such as $5 out of $23, we could also look at something new that Sanfey could not. This offer is unfair, but to the typical college undergraduate it is a financially desirable proposition ($5 is the maximum reward in most neuroscience ultimatum game studies). Rejecting $1 out of $10 is easy, but $5 out of $23 is a different story. There are at least two things that might be going on when people accept these unfair but desirable offers. On the one hand, the pure reward potential might be highly motivating. In this case, one would expect to see more activity in reward regions like the ventral striatum when people accept $5 out of $23 compared to when they reject these offers. We did not see any evidence for that.

Alternatively, this could be a case of self-control, in which people may say to themselves “that may be an insulting offer, but if I resist the temptation to get revenge, I can leave with more money.” Although our data can’t establish whether subjects actually thought something like that, the data were certainly consistent with that kind of process. When people accepted these offers, we saw increased activity in right inferior prefrontal cortex (a region involved in various forms of self-control), decreased activity in the insula, and an inverse relationship between the regions such that greater prefrontal activity was associated with diminished insula activity. This [finding] is consistent with the idea that people are regulating their distress in order to achieve the long-term financial benefit. One interesting implication is that our impulse is to reject unfair treatment rather than to get the money, and that it is our cognitive abilities that lead us to accept, rather than to reject, the unfair treatment and take the money.

Are you a scientist? Have you recently read a peer-reviewed paper that you want to write about? Then contact Mind Matters editor Jonah Lehrer, the science writer behind the blog The Frontal Cortex and the book Proust Was a Neuroscientist. His latest book is How We Decide.