Journal bias: Novelty preferred (which can be bad)

September 11, 2009 at 11:56 pm

VANCOUVER, B.C. We’re born to love novelty. It grabs our attention. Sometimes it captivates. But new ideas, devices and therapies are not supposed to have some intrinsic right to center stage when it comes to biomedical journals. Here, quality rather than novelty is supposed to win the day. Doctors and patients depend on journals to present strong data, top-notch analyses and reproducible findings.

Yet results of a study presented this morning at the peer-review congress I’m attending showed rather unambiguously that individuals who perform peer review for journals exhibit a strong preference for studies with positive findings.

These results caused an unmistakable buzz throughout the meeting hall. And Seth Leopold, who presented the new data, understands why. He notes that the type of underlying bias that he turned up threatens to undermine the reliability of “evidence-based medicine.”

Strong words. But not necessarily an exaggeration.

The meeting I’m attending is doing its best to demystify what goes on in peer review – and the implications of any bias that might creep in.

It was against this backdrop that Leopold, who heads orthopedic surgery at the University of Washington Medical Center in Seattle, dropped his bias bombshell. His team showed that when more than 200 trained peer reviewers were asked to rate a manuscript as worth publishing or not, they strongly preferred a paper that reported positive findings.

They also rated its methods section as better, even though it was the same – verbatim – as the methods section in a paper reporting equivocal findings. Moreover, each paper had been deliberately seeded with five errors, subtle mistakes that it would take careful reading and a trained reviewer to find. The same mistakes appeared in both papers, in the same places. But they were more often overlooked in the positive paper than in the equivocal one, the new study showed.

The findings are troubling, Leopold says, because if positive trials are preferentially chosen for publication, doctors will get a very skewed impression of the value of a treatment: “Novel treatments will appear more effective than they actually are.”

Journals and their reviewers know this. And that’s why quality journals have a commitment to publishing negative and/or equivocal studies.

That said, many doctors and biomedical-research analysts have long suspected that journals are preferentially publishing positive trials. The question was how to test that.

Even if tallies of published trials show that a preponderance are positive – reporting encouraging or beneficial outcomes – this might be a true reflection of the research, with investigators having cleverly chosen to pursue the most promising leads. Or it might be that other factors had been torpedoing negative and equivocal trials, such as small sample sizes, weak execution of a study’s design or a focus on issues that were no longer of clinical interest.

So to probe whether non-positive trials were actually getting short shrift, Leopold’s team decided to compare reviewers’ evaluations of manuscripts that were identical in virtually every detail. To further level the playing field, he chose a test where either outcome – a positive or equivocal result – would be equally likely to affect patient care.

The fictitious trial investigated the role of different antibiotic regimens for preventing infections after elective joint replacements or spine surgery. A pre-surgical dose of antibiotics was supposedly administered to all patients. Half were also to have been randomly assigned post-surgical treatment with antibiotics for another 24 hours. The outcomes of this would-be trial should have been of interest, Leopold notes, because “right now there’s an ongoing debate about which [antibiotic] approach to use.”

Two versions of a manuscript were written to describe the trial. In one, the manufactured data showed an advantage for one of the treatments. In the second paper, neither treatment clearly outperformed the other. Except for these results, the papers were identical.

Leopold’s team then sent these manuscripts to physicians on a roster of volunteer reviewers at one or the other of two major specialty publications: The Journal of Bone and Joint Surgery (U.S.) and Clinical Orthopaedics and Related Research. (Among the many hats that Leopold wears, he’s associate editor of JBJS.)

The reviewers had all been forewarned that a test manuscript might come their way at some time over the course of a year. And they were offered the right to opt out of participating (because a typical review can steal many hours from a clinician’s busy schedule). When the test manuscripts did go out, the reviewers got no notice that they were bogus.

Among the 55 reviewers for JBJS who read the positive-outcome manuscript, 98 percent recommended the journal publish it. Only 71 percent of reviewers getting the no-difference paper rated it ready for prime time. A similar, though not statistically significant, trend emerged among reviewers at the other journal: 97 percent recommended the positive version be published, whereas only 90 percent recommended that after reading the no-difference paper.
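For readers who want to see how a gap like the one at JBJS might be checked, here is a minimal sketch in Python. It turns the reported percentages into counts and runs a two-sided Fisher exact test. Only the 55 reviewers and the 98 and 71 percent figures come from the talk as reported here. The size of the no-difference group was not given, so the 55 used for it below is a hypothetical placeholder, and the choice of Fisher's test is an assumption for illustration, not necessarily the analysis Leopold's team ran.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x "publish" votes landing in row 1
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Add up every possible table that is no more likely than the observed one
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= observed * (1 + 1e-9)
    )


# Reported: 55 JBJS reviewers read the positive version, 98 percent said publish.
n_positive = 55
accept_positive = round(0.98 * n_positive)

# ASSUMPTION: the no-difference group size was not reported; 55 is hypothetical.
n_no_difference = 55
accept_no_difference = round(0.71 * n_no_difference)

p_value = fisher_exact_two_sided(
    accept_positive,
    n_positive - accept_positive,
    accept_no_difference,
    n_no_difference - accept_no_difference,
)
print(f"publish votes: {accept_positive}/{n_positive} vs "
      f"{accept_no_difference}/{n_no_difference}, p = {p_value:.2g}")
```

Changing `n_no_difference` shows how much the conclusion depends on the unreported group size, which is the main reason to treat the printed p-value as illustrative only.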

What’s particularly troubling, Leopold maintains, is that even if reviewers didn’t think the no-difference paper was interesting enough to publish, “This still shouldn’t change how they graded the methods portion of the paper.” Nor should it explain why reviewers of this version of the manuscript were at least twice as likely to catch at least one of its embedded errors.

At the meeting, Povl Riis, the former editor in chief of the Danish Medical Journal, reported finding an innate bias of another sort in a study he helped conduct nearly 20 years ago. In that instance, his team prepared two versions of a paper — one in English, another in the native language of the reviewer (Danish, Swedish or Norwegian).

Reviewers got one or the other, which were identical in all respects other than language. And overwhelmingly, Riis told me, reviewers asked to evaluate the English paper rated it as stronger than did reviewers sent the manuscript written in their native language. Statistical and other minor errors that Riis had seeded in his papers were also more often missed by reviewers reading the English version.