Scientific American Magazine, August 2011

The Mind-Reading Salmon: The True Meaning of Statistical Significance















If you want to convince the world that a fish can sense your emotions, only one statistical measure will suffice: the p-value.

The p-value is an all-purpose measure that scientists often use to determine whether or not an experimental result is “statistically significant.” Unfortunately, sometimes the test does not work as advertised, and researchers imbue an observation with great significance when in fact it might be a worthless fluke.

Say you’ve performed a scientific experiment testing a new heart attack drug against a placebo. At the end of the trial, you compare the two groups. Lo and behold, the patients who took the drug had fewer heart attacks than those who took the placebo. Success! The drug works!

Well, maybe not. There is a 50 percent chance that even if the drug is completely ineffective, patients taking it will do better than those taking the placebo. (After all, one group has to do better than the other; it’s a toss-up whether the drug group or placebo group will come up on top.)

The p-value puts a number on the effects of randomness. It is the probability of seeing a result at least as striking as the one you observed even if your hypothesis is wrong, that is, even if the drug does nothing. A long-standing convention in many scientific fields is that any result with a p-value below 0.05 is deemed statistically significant. An arbitrary convention, it is often the wrong one. When you make a comparison of an ineffective drug to a placebo, you will typically get a statistically significant result one time out of 20. And if you make 20 such comparisons in a scientific paper, on average, you will get one significant result with a p-value less than 0.05, even when the drug does not work.
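
The one-in-20 arithmetic can be checked with a short simulation. The sketch below is illustrative only: it assumes that, for an ineffective drug, each comparison's p-value is an independent draw spread evenly between 0 and 1, and the function name and the number of simulated papers are hypothetical choices rather than anything taken from a real study.

```python
import random


def simulate_null_papers(n_papers=20000, comparisons=20, alpha=0.05, seed=0):
    """Simulate papers that each compare an ineffective drug to a placebo.

    Assumption: with no real effect, every p-value is an independent draw
    from a uniform distribution on [0, 1).
    Returns (mean false positives per paper,
             fraction of papers with at least one false positive).
    """
    rng = random.Random(seed)
    total_hits = 0
    papers_with_hit = 0
    for _ in range(n_papers):
        # Count the comparisons that look "significant" purely by chance.
        hits = sum(1 for _ in range(comparisons) if rng.random() < alpha)
        total_hits += hits
        if hits > 0:
            papers_with_hit += 1
    return total_hits / n_papers, papers_with_hit / n_papers


if __name__ == "__main__":
    mean_hits, share = simulate_null_papers()
    print(f"average 'significant' results per paper: {mean_hits:.2f}")
    print(f"papers with at least one: {share:.0%}")
```

Under these assumptions, the average comes out close to one spurious result per paper, and well over half of the simulated papers contain at least one.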

Many scientific papers make 20 or 40 or even hundreds of comparisons. In such cases, researchers who do not adjust the standard p-value threshold of 0.05 are virtually guaranteed to find statistical significance in results that are meaningless statistical flukes. A study that ran in the February issue of the American Journal of Clinical Nutrition tested dozens of compounds and concluded that those found in blueberries lower the risk of high blood pressure, with a p-value of 0.03. But the researchers looked at so many compounds and made so many comparisons (more than 50) that it was almost a sure thing that some of the p-values in the paper would be less than 0.05 just by chance.
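
To put a rough number on "almost a sure thing": if the comparisons were independent (an assumption; comparisons within a single study are often correlated), the chance of at least one p-value falling below the threshold follows from simple arithmetic. The sketch below also shows one common way to adjust the threshold, the Bonferroni correction, which divides 0.05 by the number of comparisons. The function names are illustrative, and 50 is used as a round stand-in for the study's "more than 50" comparisons.

```python
def chance_of_fluke(comparisons, alpha=0.05):
    """Probability that at least one of `comparisons` independent tests
    of a nonexistent effect falls below `alpha`."""
    return 1 - (1 - alpha) ** comparisons


def bonferroni_threshold(comparisons, alpha=0.05):
    """Per-comparison threshold under the Bonferroni correction."""
    return alpha / comparisons


if __name__ == "__main__":
    n = 50  # assumed round number; the article says "more than 50"
    print(f"chance of at least one fluke: {chance_of_fluke(n):.1%}")
    print(f"adjusted threshold: {bonferroni_threshold(n):.4f}")
```

With 50 independent comparisons, the chance of at least one fluke works out to about 92 percent, and the adjusted threshold is 0.001, a bar that a p-value of 0.03 would not clear.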

The same applies to a well-publicized study that a team of neuroscientists once conducted on a salmon. When they presented the fish with pictures of people expressing emotions, regions of the salmon’s brain lit up. The result was statistically significant with a p-value of less than 0.001; however, as the researchers argued, there are so many possible patterns that a statistically significant result was virtually guaranteed, so the result was totally worthless. The p-value notwithstanding, there was no way that the fish could have reacted to human emotions. The salmon in the fMRI happened to be dead.



This article was originally published with the title The Mind-Reading Salmon.




ABOUT THE AUTHOR(S)

Seife is a professor of journalism at New York University.



16 Comments

  1. racer79 10:45 AM 8/12/11

    "When they presented the fish with pictures of people expressing emotions, regions of the salmon’s brain lit up..." "...p-value notwithstanding, there was no way that the fish could have reacted to human emotions. The salmon in the fMRI happened to be dead."

    A dead salmon's brain lit up when shown pictures of people expressing emotion? but at the same time it couldn't have reacted to human emotion because it is dead? Someone please explain to me how this makes any sense. I mean, was it just a malfunction on the fMRI machine?

  2. billsmith 12:26 PM 8/12/11

    This isn't a malfunction of the fMRI machine, it's simply the way the machine (or any measuring tool) works. Measurements include noise. Nor is it a critique of the p-value for tests of statistical significance. It *is* a critique aimed at those who confuse statistical significance with practical significance.

    '"The goal of the salmon poster was to encourage the minority of researchers who report uncorrected statistics to move forward and begin using basic multiple comparisons correction in their research," says study leader Craig Bennett, a postdoctoral researcher in the Department of Psychology at the University of California, Santa Barbara.'

    Notice that his critique is not a broad generalized condemnation, but aimed at politely (by means of dry humor) reminding a small number of people exactly what they need to be doing.

    Look up "multiple comparisons" on Wikipedia, then read the poster in question, "Neural correlates of interspecies perspective-taking in the post-mortem Atlantic Salmon: An argument for multiple comparisons correction"

  3. Yosarian 12:30 PM 8/12/11

    If I remember correctly, the study involving the salmon was actually designed to test the algorithms that fMRI machines use to distinguish between significant and insignificant data. The end result of the research was to show that when data sets are particularly large, the likelihood of false positives increases until it is certain, hence the dead salmon with brain activity. There is a limit to the sensitivity of machines that use statistical logic to make calculations.

  4. bucketofsquid 05:25 PM 8/12/11

    But the dead salmon likes me! I shall name him Joe. :) He is my special (and stinky) friend.

  5. pdawg20 06:48 PM 8/12/11

    Understanding these concepts about hypothesis testing and p-values is something any elementary statistics course should teach and anyone doing science would be on top of. A further question might be "if the probability of a success is 0.05 and you run say 10 trials, what is the probability of getting at least one success?"
    Hopefully Scientific American does not think this is rocket science.

  8. RSW 07:17 PM 8/12/11

    As I understand it statistical inference and p-values were intended to reduce as much "noise" as possible from the data in order to determine what might possibly be the controlling variable or variables of some dependent variable. The results were not to be taken as "god's-truth", but only a way of narrowing down the possible independent variables. The way statistical inference seems to be used today suggests that researchers are taking their results as "god's truth" and going on to make other hypotheses based on them. This is most unfortunate.

  9. rwstutler in reply to RSW 07:30 PM 8/12/11

    The p-value was introduced as a useful rule of thumb, but it has taken on a life of its own in the hands of technicians acting as if they were scientists (and in the hands of scientists who are too lazy to do the math). Like the famous razor, a rule of thumb may be useful some of the time, but in the wrong hands it becomes a tool for rationalization.

  10. nhvhr 04:09 AM 8/13/11

    Unfortunately, the definition of statistical significance in the article is incorrect. In the classical frequentist approach the p value is the probability of the observation given that the hypothesis is CORRECT. The idea is that if the probability of an observation is sufficiently low, then we can reject the hypothesis. A level of 0.05 (5%) is arbitrary and is the result of tradition. The concept of statistical significance is indeed complex.

  11. zstansfi in reply to nhvhr 03:41 PM 8/13/11

    The p-value estimates the probability of making a type I error, i.e. it is the probability of wrongly rejecting the null hypothesis. The definition made in this article mirrors this concept perfectly. Your definition on the other hand is clearly wrong.

    If we were to follow your logic, then a p value of .02 would imply that a given result has a 2% chance of being detected given that the hypothesis is correct. It seems to me that this would make hypothesis testing very difficult.

  12. nhvhr 07:01 PM 8/13/11

    No, this is exactly the idea of significance testing. If a result has the probability of 2% of being detected given that the hypothesis is correct, then we will reject the hypothesis if we are using a 5% significance level. On the other hand, if we are using a 1% significance level we cannot reject our hypothesis. Clearly, we have to select our significance level before doing the experiment. This is elementary statistics and can be found in any textbook of statistics.

  13. duncan7 10:19 AM 8/14/11

    This is why every intro stats class covers the Bonferroni correction, which corrects the desired p-value (0.05, by convention) against the creeping significance of multiple comparisons.

  14. blindboy 10:29 PM 8/14/11

    This places a significant responsibility on the editors of popular science magazines and web sites (such as this one). Most people do not read original papers; they rely on scientifically literate journalists to state the significant results.
    We expect nonsense, propaganda and ignorance in the mass media (thank you Rupert!) but we should be able to rely on sophisticated analysis prior to publication in science media. I would be interested to hear if people think we get it.

  15. drjimnfl 10:53 PM 8/15/11

    Regarding the use of procedures such as the Bonferroni correction, it's as important to have a rationale for applying the correction as it is to have a rationale for the selection of the Type I error criterion (alpha) for individual tests. For any given sample size, there is always a trade-off between the occurrence of Type I and Type II errors -- between one's ability to avoid false rejection of the null hypothesis and to avoid failure to report the existence of (probably) real effects. Consider the following quotation from the excellent book, Statistics as Principled Argument: "When there are multiple tests within the same study or series of studies, a stylistic issue is unavoidable. As Diaconis (1985) put it, 'Multiplicity is one of the most prominent difficulties with data-analytic procedures. Roughly speaking, if enough different statistics are computed, some of them will be sure to show structure' (p. 9). In other words, random patterns will seem to contain something systematic when scrutinized in many particular ways. If you look at enough boulders, there is bound to be one that looks like a sculpted human face. Knowing this, if you apply extremely strict criteria for what is to be recognized as an intentionally carved face, you might miss the whole show on Easter Island" (Abelson, 1995, p. 70).

  16. iWind in reply to nhvhr 11:30 AM 8/19/11

    "If a result has the probability of 2% of being detected given that the hypothesis is correct, then we will reject the hypothesis if we are using a 5% significance level. On the other hand, if we are using a 1% significance level we cannot reject our hypothesis."

    Apparently you have completely misunderstood the basic concept of "probability!"
