A new computerized scan of the biomedical research literature has turned up tens of thousands of articles in which entire passages appear to have been lifted from other papers. Based on the study, researchers estimate that there may be as many as 200,000 duplicates among some 17 million papers in leading research database Medline.

The finding has already led one publication to retract a paper for being too similar to a prior article by another author.

Researchers Mounir Errami and Harold "Skip" Garner of the University of Texas Southwestern Medical Center at Dallas used a text-matching algorithm to compare each of seven million Medline abstracts against the entries that the database's own software had flagged as closely related to it.

The researchers set their own software tool, called eTBLAST, to identify pairs that were more than 45 percent identical, Errami says. The search turned up more than 70,000 hits, which the researchers and a team of three assistants have been checking manually. So far, Errami says, they have gone through close to 3,000 pairs, comparing the abstracts or, if the duplicates have different authors, the full articles. He notes that some matches turned out to be innocent duplications, such as reprints or translations.
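To make the thresholding idea concrete, here is a minimal sketch of flagging pairs of abstracts that score above a 45 percent cutoff. It is not eTBLAST's actual algorithm, which the article does not describe; the similarity measure (Python's standard-library `difflib.SequenceMatcher` applied to word tokens), the function names and the sample abstracts are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# The "more than 45 percent identical" cutoff quoted in the article.
THRESHOLD = 0.45


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def similarity(a, b):
    """Score two texts from 0 to 1 by comparing their word sequences.

    This is a stand-in metric (an assumption), not eTBLAST's scoring.
    """
    matcher = SequenceMatcher(None, tokenize(a), tokenize(b), autojunk=False)
    return matcher.ratio()


def flag_pairs(abstracts, threshold=THRESHOLD):
    """Return (id_a, id_b, score) for every pair scoring above the threshold."""
    hits = []
    for (id_a, text_a), (id_b, text_b) in combinations(abstracts.items(), 2):
        score = similarity(text_a, text_b)
        if score > threshold:
            hits.append((id_a, id_b, score))
    return hits


if __name__ == "__main__":
    # Hypothetical abstracts, invented for this example.
    sample = {
        "A": "Drug X reduced joint pain in patients with rheumatoid arthritis over twelve weeks",
        "B": "Drug X reduced joint pain in patients with rheumatoid arthritis over six months",
        "C": "We describe a new telescope mirror coating process",
    }
    for id_a, id_b, score in flag_pairs(sample):
        print(f"{id_a} and {id_b} look similar (score {score:.2f})")
```

A flagged pair under any such scheme is only a candidate. As the researchers' manual review shows, a high score can just as easily mark a reprint or a translation as a copied paper.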

But in 79 cases (and counting), duplicates with different authors had no obviously legitimate explanation. The group has set up a public Web site, Déjà vu, to document the findings.

The next step in these cases of potential plagiarism, the researchers say, is for journals to investigate. In a Nature report, they advise other scientists "to withhold judgment of any candidate duplicates until evaluated by a suitable body such as an editorial board or a university ethics committee."

They note that most of the questionable duplicates inspected thus far appear to be papers submitted by the same authors to multiple journals, a less serious ethical lapse that allows researchers to artificially inflate their publication credits and give added weight to their work.

Errami and Garner estimate that perhaps 50,000 of the eTBLAST hits and 200,000 (about 1.2 percent) of the 17 million–plus Medline entries will turn out to be either plagiarized or multiple listings.

Prior studies have come up with different duplication rates. In a 2002 blind survey of 3,247 biomedical researchers by the University of Minnesota, 4.7 percent admitted that they had republished papers and 1.4 percent confessed to borrowing from others' work. A 2006 analysis of more than 280,000 papers in the physics preprint database arXiv, led by a U.S. computer scientist, found that 30,316 (10.5 percent) were suspected duplicates, and 677 (0.2 percent) were potentially plagiarized.

Action and Retraction

The U.T. Southwestern authors uncovered three cases in which their own colleagues may have been ripped off. Errami and Garner alerted the authors and journals involved, which they say has led to probes by the implicated publications.

One investigation has already led to a retraction: Journal publisher Elsevier is retracting a 2004 review paper (summarizing existing research) by rheumatologist Lee Simon of Harvard Medical School, says Shira Tabachnikoff, director of corporate relations at Elsevier. According to the Déjà vu entry, 55 percent of Simon's text, published in Best Practice & Research Clinical Rheumatology, closely matches that of a paper published a year earlier by U.T. Southwestern rheumatologist Roy Fleischmann in Expert Opinion on Drug Safety.

A review by SciAm.com of both articles confirmed that multiple consecutive pages of text in Simon's 32-page article were nearly identical to passages in Fleischmann's 19-page paper; of the 161 references listed in the later paper, nearly all were listed in the 2003 publication in the same nonalphabetical, nonchronological order.

In a telephone interview before the retraction, Fleischmann stopped short of accusing Simon of plagiarism, pending Elsevier's decision, but acknowledged that the similarities were suspicious, to say the least. "It's word for word, comma for comma, period for period, sentence for sentence, paragraph for paragraph, for the bulk of the article," he says.

Simon, who admits that he reviewed Fleischmann's paper prior to its publication, defends his own article by noting that there are only so many ways for two authors to summarize the same body of research. "This wasn't intentional duplication," he told SciAm.com in a telephone interview. "This is what happens when you do review articles."

He added that he was being singled out for a paper that was a chore to write and brought him no added prestige. "Who cares? This is a review article," he says. "I'm never going to write another one, because of this bullshit."

Will Duplicates Keep Multiplying?

Errami and Garner say they hope that the prospect of being found out will discourage would-be copycats.

But Mike Rossner, executive director of journal publisher The Rockefeller University Press, notes that eTBLAST or similar search schemes may not be successful barriers against republication, because manuscripts submitted simultaneously to two journals would not turn up in databases until after they had been published.

Maxine Clarke, publishing executive editor of the journal Nature, says her publication uses text-matching software to compare a submission with papers in the publishing group's many specialty journals. She notes that they also ask prospective authors to submit copies of preprints and related manuscripts submitted to other journals to help editors and reviewers assess their novelty. Bronwen Dekker, an assistant editor at Nature Protocols, says her journal uses eTBLAST to scan submissions for evidence of self-plagiarism (copying one's past work) in the abstract or introduction.

Some evidence suggests that the possibility of detection may not deter the unscrupulous. Rossner says that five years ago, The Rockefeller University Press began checking papers for manipulation of photos that depicted experimental data, but he says he has seen no decline in the number of doctored images.

Although the long-term effect of the finding remains to be seen, there has already been some fallout. To wit: Fleischmann says that he has known Simon for 25 years and considered him a friend, but adds, "I don't know if we still are."