A new computerized scan of the biomedical research literature has turned up tens of thousands of articles in which entire passages appear to have been lifted from other papers. Based on the study, researchers estimate that there may be as many as 200,000 duplicates among some 17 million papers in Medline, a leading research database.
The finding has already led one publication to retract a paper for being too similar to a prior article by another author.
Researchers Mounir Errami and Harold "Skip" Garner of the University of Texas Southwestern Medical Center at Dallas used a text-matching algorithm to compare seven million Medline abstracts against matching entries flagged by the database's software as being closely related.
The researchers set their own software tool, called eTBLAST, to identify pairs that were more than 45 percent identical, Errami says. The search turned up more than 70,000 hits, which the researchers and a team of three assistants have been manually checking. So far, Errami says, they have gone through close to 3,000 pairs, comparing the abstracts or, when the duplicates have different authors, the full articles. He notes that some matches were found to be innocent duplications, such as reprints or translations.
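The screening step described above, scoring each pair of texts and keeping those above a cutoff for human review, can be sketched in a few lines of Python. This is only an illustration: the article does not describe how eTBLAST computes similarity, so the sketch substitutes the word-level ratio from Python's standard difflib module, and the function names and the way the 45 percent cutoff is applied are assumptions made for the example.

```python
from difflib import SequenceMatcher

# Cutoff taken from the article ("more than 45 percent identical").
# How eTBLAST actually applies it is not described; this is an assumption.
THRESHOLD = 0.45


def similarity(a: str, b: str) -> float:
    """Return a 0-to-1 word-level similarity score.

    Uses difflib's ratio as a stand-in measure; it is not eTBLAST's method.
    """
    words_a = a.lower().split()
    words_b = b.lower().split()
    return SequenceMatcher(None, words_a, words_b).ratio()


def flag_candidates(pairs, threshold=THRESHOLD):
    """Keep only the pairs scoring above the threshold, for manual review."""
    return [(a, b) for a, b in pairs if similarity(a, b) > threshold]
```

A flagged pair is only a candidate. As the researchers' manual checks show, a high score can just as easily indicate a reprint or a translation as a case of copying.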
But in 79 cases (and counting), duplicates with different authors had no obviously legitimate explanation. The group has set up a public Web site, Déjà vu, to document the findings.
The next step in these cases of potential plagiarism, the researchers say, is for journals to investigate. In a Nature report, they advise other scientists "to withhold judgment of any candidate duplicates until evaluated by a suitable body such as an editorial board or a university ethics committee."
They note that most of the questionable duplicates inspected thus far appear to be papers submitted by the same authors to multiple journals, a less serious ethical lapse that allows researchers to artificially inflate their publication credits and give added weight to their work.
Errami and Garner estimate that perhaps 50,000 of the eTBLAST hits and 200,000 (roughly 1.2 percent) of the 17 million–plus Medline entries will turn out to be either plagiarized or multiple listings.
Prior studies have come up with different duplication rates. In a 2002 blind survey of 3,247 biomedical researchers by the University of Minnesota, 4.7 percent admitted that they had republished papers and 1.4 percent confessed to borrowing from others' work. A 2006 analysis of more than 280,000 papers in the physics preprint database arXiv, led by a U.S. computer scientist, found that 30,316 (10.5 percent) were suspected duplicates, and 677 (0.2 percent) were potentially plagiarized.
Action and Retraction
The U.T. Southwestern authors uncovered three cases in which their own colleagues may have been ripped off. Errami and Garner alerted the authors and journals involved, which they say has led to probes by the implicated publications.
One investigation has already led to a retraction: Journal publisher Elsevier is retracting a 2004 review paper (summarizing existing research) by rheumatologist Lee Simon of Harvard Medical School, says Shira Tabachnikoff, director of corporate relations at Elsevier. According to the Déjà vu entry, 55 percent of Simon's text, published in Best Practice & Research Clinical Rheumatology, closely matches that of a paper published a year earlier by U.T. Southwestern rheumatologist Roy Fleischmann in Expert Opinion on Drug Safety.
A review by SciAm.com of both articles confirmed that multiple consecutive pages of text in Simon's 32-page article were nearly identical to passages in Fleischmann's 19-page paper; of the 161 references listed in the later paper, nearly all were listed in the 2003 publication in the same nonalphabetical, nonchronological order.