January 30, 2008

Scan Uncovers Thousands of Copycat Scientific Articles

Database search turns up research papers suspiciously similar to prior publications, prompting investigations

By JR Minkel   

 
A new computerized scan of the biomedical research literature has turned up tens of thousands of articles in which entire passages appear to have been lifted from other papers. Based on the study, researchers estimate that there may be as many as 200,000 duplicates among some 17 million papers in leading research database Medline.

The finding has already led one publication to retract a paper for being too similar to a prior article by another author.

Researchers Mounir Errami and Harold "Skip" Garner of the University of Texas Southwestern Medical Center at Dallas used a text-matching algorithm to compare seven million Medline abstracts against matching entries flagged by the database's software as being closely related.

The researchers set their own software tool, called eTBLAST, to identify pairs that were more than 45 percent identical, Errami says. The search turned up more than 70,000 hits, which the researchers and a team of three assistants have been manually checking. So far, Errami says, they have gone through close to 3,000 pairs, comparing the abstracts or, when the duplicates have different authors, the full articles. He notes that some matches were found to be innocent duplications, such as reprints or translations.
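To make the idea of a similarity cutoff concrete, here is a minimal sketch in Python. It is not eTBLAST's method, which the article does not describe; it swaps in a simple word-level comparison from Python's standard `difflib` module and borrows only the 45 percent cutoff. The function names and the three toy abstracts are invented for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

# The article's 45 percent cutoff, applied here to a stand-in similarity
# measure (an assumption; eTBLAST's actual scoring is not described).
THRESHOLD = 0.45


def similarity(a, b):
    """Word-level similarity ratio between two texts, from 0.0 to 1.0."""
    words_a = a.lower().split()
    words_b = b.lower().split()
    return SequenceMatcher(None, words_a, words_b, autojunk=False).ratio()


def flag_pairs(abstracts, threshold=THRESHOLD):
    """Return (id_a, id_b, score) for every pair scoring above the threshold."""
    hits = []
    for (id_a, text_a), (id_b, text_b) in combinations(abstracts.items(), 2):
        score = similarity(text_a, text_b)
        if score > threshold:
            hits.append((id_a, id_b, score))
    return hits


# Toy abstracts, made up for this example only.
abstracts = {
    "A": "Selective COX-2 inhibitors reduce gastrointestinal risk in patients with arthritis",
    "B": "Selective COX-2 inhibitors reduce gastrointestinal risk in older patients with arthritis",
    "C": "Jellyfish blooms are increasing in coastal waters worldwide",
}

for id_a, id_b, score in flag_pairs(abstracts):
    print(id_a, id_b, round(score, 2))
```

A flagged pair is only a candidate. As the researchers stress, each hit still needs a manual check, because reprints and translations score high too.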

But in 79 cases (and counting), duplicates with different authors had no obviously legitimate explanation. The group has set up a public Web site, Déjà vu, to document the findings.

The next step in these cases of potential plagiarism, the researchers say, is for journals to investigate. In a Nature report, they advise other scientists "to withhold judgment of any candidate duplicates until evaluated by a suitable body such as an editorial board or a university ethics committee."

They note that most of the questionable duplicates inspected thus far appear to be papers submitted by the same authors to multiple journals, a less serious ethical lapse that allows researchers to artificially inflate their publication credits and give added weight to their work.

Errami and Garner estimate that perhaps 50,000 of the eTBLAST hits and 200,000 (about 1.2 percent) of the 17 million–plus Medline entries will turn out to be either plagiarized or multiple listings.
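As a quick arithmetic check on these estimates, the shares can be computed directly. The sketch below treats "17 million–plus" as exactly 17 million and "more than 70,000" as exactly 70,000, so both results are approximations; the helper name `percent` is invented for illustration.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Assumed round figures taken from the article's estimates.
MEDLINE_ENTRIES = 17_000_000     # "17 million-plus", treated as exact
ESTIMATED_DUPLICATES = 200_000
ETBLAST_HITS = 70_000            # "more than 70,000", treated as exact
ESTIMATED_TRUE_HITS = 50_000

share_of_medline = percent(ESTIMATED_DUPLICATES, MEDLINE_ENTRIES)
share_of_hits = percent(ESTIMATED_TRUE_HITS, ETBLAST_HITS)

print(f"Share of Medline entries: {share_of_medline:.1f} percent")
print(f"Share of eTBLAST hits: {share_of_hits:.1f} percent")
```

Because both totals are lower bounds in the article, the true shares would be slightly smaller than the printed values.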

Prior studies have come up with different duplication rates. In a 2002 blind survey of 3,247 biomedical researchers by the University of Minnesota, 4.7 percent admitted that they had republished papers and 1.4 percent confessed to borrowing from others' work. A 2006 analysis of more than 280,000 papers in the physics preprint database arXiv, led by a U.S. computer scientist, found that 30,316 (10.5 percent) were suspected duplicates, and 677 (0.2 percent) were potentially plagiarized.

Action and Retraction

The U.T. Southwestern authors uncovered three cases in which their own colleagues may have been ripped off. Errami and Garner alerted the authors and journals involved, which they say has led to probes by the implicated publications.

One investigation has already led to a retraction: Journal publisher Elsevier is retracting a 2004 review paper (summarizing existing research) by rheumatologist Lee Simon of Harvard Medical School, says Shira Tabachnikoff, director of corporate relations at Elsevier. According to the Déjà vu entry, 55 percent of Simon's text, published in Best Practice & Research Clinical Rheumatology, closely matches that of a paper published a year earlier by U.T. Southwestern rheumatologist Roy Fleischmann in Expert Opinion on Drug Safety.

A review by SciAm.com of both articles confirmed that multiple consecutive pages of text in Simon's 32-page article were nearly identical to passages in Fleischmann's 19-page paper; of the 161 references listed in the later paper, nearly all were listed in the 2003 publication in the same nonalphabetical, nonchronological order.



© 1996-2009 Scientific American Inc. All Rights Reserved. Reproduction in whole or in part without permission is prohibited.