Study finds plenty of apparent plagiarism
Data mining reveals too many similarities between papers
Web edition : Thursday, March 5th, 2009
IS THIS PLAGIARISM? Yellow highlights aspects of this paper that copy material published in a previous paper, by other authors. (UT Southwestern Medical Center)

If copying is the sincerest form of flattery, then journals are publishing a lot of amazingly flattering science. Of course to most of us, the authors of such reports would best be labeled plagiarists — and warrant censure, not praise.

But Harold R. Garner and his colleagues at the University of Texas Southwestern Medical Center at Dallas aren’t calling anybody names. They’re just posting a large and growing bunch of research papers — pairs of them — onto the Internet and highlighting patches in each that are identical.

Says Garner: “We’re pointing out possible plagiarism. You be the judge.” But this physicist notes that in terms of wrongdoing, authors of the newest paper in most pairs certainly appear to have been “caught with their hands in the cookie jar.”

Garner's team developed data-mining software about eight years ago that allows a researcher to input lots of text (the entire abstract of a paper, for instance) and ask the program to compare it to everything posted on a database, such as the National Library of Medicine's MEDLINE, which abstracts all major biomedical journal articles. The software then looks for matches to words, phrases, numbers, anything, and pulls up entries that are similar. The idea: to help scientists find papers that offer similar findings, contradictions, even speculations that might suggest promising new directions in a given research field.
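For readers curious how this kind of text matching can work, here is a minimal Python sketch. It is an illustration only, not Garner's software: the article does not describe the team's algorithm, so the cosine-similarity scoring, the function names, the 0.5 cutoff and the three toy abstracts are all assumptions made for the example.

```python
import re
from collections import Counter
from math import sqrt

def tokenize(text):
    # Lowercase and keep runs of letters or digits, so words and numbers both count.
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine_similarity(a, b):
    # Cosine similarity between the token counts of two texts.
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def rank_matches(query, database, threshold=0.5):
    # Score every abstract against the query; keep those at or above the
    # (assumed) cutoff, most similar first.
    scored = [(cosine_similarity(query, text), doc_id) for doc_id, text in database.items()]
    return sorted([(s, d) for s, d in scored if s >= threshold], reverse=True)

# Toy stand-in for a database of abstracts (hypothetical entries, not MEDLINE records).
database = {
    "paper-A": "Aspirin reduced fever in 120 patients over 14 days.",
    "paper-B": "Aspirin reduced fever in 120 patients over 14 days of treatment.",
    "paper-C": "Soil bacteria fix nitrogen in legume root nodules.",
}
query = "Aspirin reduced fever in 120 patients over 14 days."
matches = rank_matches(query, database)
for score, doc_id in matches:
    print(f"{doc_id}: {score:.2f}")
```

In this toy run the two aspirin abstracts come back as matches and the unrelated soil abstract falls below the cutoff. A high score only flags a pair for a human to read; it says nothing about who copied whom, or whether the overlap is innocent.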

Early on, Garner says, his team realized this software could also flag potential plagiarism. But that was not their first priority. In fact, his group didn't really begin looking in earnest for signs of copycatting until about two years ago.

Today, Garner’s group has published a short paper in Science on results of a survey it conducted among authors of pairs of remarkably similar papers (identified from MEDLINE), and the editors who published those papers. The Texas team wanted to find out whether the apparent copycats — not only the authors but also the editors who published their work — would own up to plagiarism. And once confronted with this public finger pointing, what would they do about it?

The real surprise, says Garner (indeed, “the shock”), was that so few authors of the initial papers were aware of the copycat’s antics. Until Garner’s team emailed them PDFs that highlighted identical passages in each set of paired papers, 93 percent said, they had been unaware of the newer paper.

Since those newer papers were all available via MEDLINE searches, they should have come up every time authors of the first paper searched for work on topics related to their own. In fact, Garner points out, because MEDLINE posts search results in reverse chronological order, copycatted papers should turn up before the papers on which they had been based.

To date, 83 of the 212 pairs of largely identical papers that the software from Garner’s team has identified so far have triggered formal investigations by the journals involved. In 46 instances, editors of the second papers have issued retractions. However, what constituted a retraction varied considerably. It might have been broad publication of problems with the offending second paper, both in the journal and in a notice sent to MEDLINE.

Other times, some website might have acknowledged the retraction of some or all of a paper, with no notification of the problem forwarded to MEDLINE. In such cases, Garner notes, anyone using MEDLINE's search function would get no warning that the abstract it pulled up relates to findings that have been discredited.

Have you ever shared this material on apparent plagiarism with the administrators of the second paper's authors, I asked Garner. "No, that would have put us into this situation where we would be acting more as police or an investigatory body," he said. And they're not anxious to serve as honesty cops.

Too bad.

So far, his team's software has turned up more than 9,000 'highly similar' papers in biomedical journals indexed by MEDLINE. And only 212 are copycats? Actually, Garner says, that estimate is probably way low. Of that big number, "We have only gotten through looking at 212 so far." Their investigations continue.

For more on the implications of such copycatting, check out my next post.



Comments 16


  • I teach physics in high school and we attempt to teach some ethics in writing here, and use SafeAssign to check for plagiarism. Perhaps all scientific journals should do the same.

    Not only is this plagiarism unethical, it biases science to certain positions; the plagiarized information will become the de facto standard due to repetition, and new ideas may find it more and more difficult to make headway as a result.

    This is not a trivial problem for science, medical and otherwise. If we cannot rely on the basic honesty and ethics of researchers, eventually science will crumble, and the public will disdain the field.

    It is essential that steps be taken to ensure ethical behaviour in publishing and presenting scientific work, and the program that Garner et al. have developed would be the place to start. All the journals should subscribe as a means of guaranteeing to their subscribers that they have high standards of originality and honesty.
    Gary Allan
    Mar. 6, 2009 at 5:13am
  • I do not know what constitutes plagiarism: "The software then looks for matches to words, phrases, numbers — anything, and pulls up matches that are similar". While I do not doubt that plagiarism occurs, I wonder to what extent this data mining takes into account redundant descriptions of techniques/methods? Similarly, if a given researcher copies his own Methods section from a previous paper in a subsequent paper does that constitute plagiarism?
    Jeff Yau
    Mar. 6, 2009 at 9:35am
  • The Editor-in-Chief always, according to the statements made by the publishers, must retract plagiarised papers. The Editor must maintain the integrity of scientific record. The publisher and the journal have no right to present fraud as science. Once a paper shows fraud and falsification of the fact of authorship - it must be retracted, period. But, there is still a lot of dishonesty on this level. I now have to fight hard for retraction of an obviously plagiarised paper in which my PhD research has been stolen, see [Link was removed]
    Michael Pyshnov
    Michael Pyshnov
    Mar. 6, 2009 at 11:15am
  • A worse problem in science is fictional papers that are not plagiarized. There are far more of them than most believe, and some of the labs they come from are among the most respected institutions in the world. These papers are published, with falsified data supporting questionable conclusions to keep the huge R01 grants rolling in. Since the heads of those labs (some of them highly influential chairs) then become the evaluators of grants in a mutual back-scratching society, this kind of fake "science" obstructs real progress. Sometimes, it is desperate grad students and post-docs who initiate and carry out such fraud. But more often, it is a climate of "wink-wink-nudge-nudge" set by the head of the lab that makes such things acceptable. My anecdotal experience is that it is worse the closer you get to medical research, and worst in animal model research.
    John Toradze
    Mar. 6, 2009 at 11:46am
  • Well, the image attached to the post is entirely unreadable, but from what I read, they're just using the abstracts. Anyone who's ever actually written a scientific abstract knows abstracts aren't really supposed to be unique pieces of prose. The narrative in a paper serves only to make the relevance of the data more clear. Grants aren't awarded based on the originality or literary quality of the narrative describing the data, so copying of good writing style should be encouraged. Considering some of the tortured constructions I've seen, I wish more people did this. If verbatim copying of a sentence from a better writer than you makes the relevance of your data to the field more clear, you should do it. Copying of data, not narrative, would be the real crime, but in order for you to do that, it would have to be already published data, and no one makes a career out of publishing things second.
    William Gunn
    Mar. 7, 2009 at 1:50pm
  • I, too, am puzzled by what counts as plagiarism. Are six identically presented words in a row enough to call plagiarism? Are seven words too many? What if a series of 50 words has 48 in common, but only 10 words are in the identical original order? Can you have plagiarism when there is no exact replication? Does one identical paragraph count; do ten identical paragraphs cement the claim? What exactly is a paragraph?

    When someone finds an interesting fact but has a more interesting interpretation than the one the original author provided, and reiterates that person's data without giving credit before offering the new interpretation, is that by definition plagiarism?

    If a person took parts of six different researchers' papers, and without giving credit to each, used parts of each to create a paper in support of a new theory in a completely unrelated field of knowledge, would that be plagiarism x6, or would it be plagiarism at all?

    This is a lot more complicated than it first appears.
    Phil Grimm
    Mar. 8, 2009 at 10:31am
  • Plagiarism does not have to be just stealing someone's words, but also their ideas. As a college student I have had that constantly drilled into my head. All papers need citations of where your ideas came from. As for previous work by authors, I've noticed in many of my textbooks, if an author has done previous research on a topic which he is using to back up his argument in the book, he does indeed cite his previous work. I'm not sure if it is to avoid plagiarism, but I do know it helps those interested in the topic find more information on that topic.
    Carol Richards
    Mar. 8, 2009 at 12:44pm
  • 1) Gunn asks whether the matching reported in the new Science paper is based only on the abstracts. No. That's merely the point of departure. Only when abstracts show substantial similarity does Garner's team go on to probe the rest of the papers, scouting for additional similarities. Bottom line: Make sure your abstract is very different from the original paper and the system may not catch virtually verbatim text throughout the rest of the paper.
    2) As Grimm notes, identifying plagiarism can be tricky. That's one reason the Texas group eliminates papers with common author(s). So it's not like a research team is updating an earlier analysis using the same text on methods and some tables or citations. Such similarities between papers from the same study (and authors) would be considered lazy writing, but not plagiarism.
    3) Most damning: Garner asked researchers about the similarities in their paper and an earlier one. And many queried authors flat-out acknowledged that yes, they borrowed the earlier work by others. Some had the audacity to ask: Is that a bad thing? Yikes. One author actually argued that his copying was an "unconscious" joke. I have to ask: Isn't that an oxymoron? Moreover, this "prankster" was found to have played this same type of joke in seven other instances. I'm sorry, but that's NOT funny...
    jar
    Mar. 8, 2009 at 2:44pm
  • PS: Gunn notes that the image in this blog is unreadable. And of course it is, owing to its size. But the paper is searchable on the Texas team's Deja Vu website, for which there is a link at right in the margin, above. And there you can read this paper and hundreds more. One section of the website even shows the first and follow-up papers in each pair displayed side by side, with matching segments highlighted in blue. It's an eye opener.
    jar
    Mar. 8, 2009 at 2:56pm
  • Plagiarism is the result of willfully copying another writer's work without citation. Everyone will, eventually, use the individual words of other writers (except, perhaps, for Umberto Eco, who uses words that no one else knows of). So, to rephrase, the use of the almost verbatim written concepts of an author without citation is plagiarism. It is the cowardice of editors and publishers that allows this practice to continue. The money that is now involved with the high-profile sciences makes the rewards for plagiarism very high. There should be an active 'plagiarism police force' at work in every science journal office, especially in the medical and pharma arenas, because these cheats will injure people in their efforts to get ahead by not being able to excel on their own merits.
    Leonardo
    Leonard Ochs
    Mar. 8, 2009 at 3:01pm
  • The main point is that the value of a scientific paper lies in the data, not the prose.

    I looked at the database, and they categorized things as follows:

    same author, same journal
    same author, diff journal
    diff author, same journal
    diff author, diff journal

    The determining factors to me are whether the paper was a review or original research report, and whether or not actual data or images were repeated.

    Reviews sent by the same author to a different journal aren't plagiarism. I don't know why the second journal would want to publish it, but that's their call. I think most have policies against doing that.
    Incorporating large chunks of your research report in a review you're writing, or even someone else incorporating it in a review they're writing is probably OK. It's lazy, because it could probably be improved by incorporation of developments since the original was written, but it's not dishonest or bad practice.
    Likewise, an introduction section from a closely related paper is OK to use, because the introductory material actually is the same, though again recent developments need to be incorporated.
    Copying of methods sections is another situation where copying should almost be encouraged. We had a case where someone accidentally wrote "1500 rpm" when it should have been "1500 x g". The extent to which that skewed things never was sorted out, but if they had just copied the methods section from the paper they got it from, that wouldn't have happened. Of course, no one ever includes enough detail to reproduce something exactly, so you should add material where you filled in the blanks or made modifications.
    Copying of published images or data doesn't exactly fit the definition of plagiarism, but it's still a career-ending move, nonetheless.
    The only case where it would actually fit my understanding of the term is if a different author took a review you wrote (with no original data) and sent it to a different journal.

    I think efforts are probably better spent detecting duplicated/fabricated data, not text, since that's where the value lies in scientific publications.
    William Gunn
    Mar. 8, 2009 at 11:18pm

Citations & References :
  • Long, T.C., . . . and H.R. Garner. 2009. Responding to Possible Plagiarism. Science 323(March 6):1293.
  • Deja Vu: A Database of Highly Similar and Duplicate Citations. This is being compiled using data mining software that has been developed by the Harold Garner lab at the University of Texas Southwestern Medical Center in Dallas.