Peer review: No improvement with practice

September 18, 2009 at 6:10 pm

VANCOUVER, B.C. Research journals depend on the unpaid labor of experts in their fields to evaluate which of their peers’ manuscripts are good enough to publish. It’s a semi-thankless task and each paper that’s reviewed may require five or more hours of a scientist’s or physician’s time. Considering the pivotal role that these reviewers play, it’s important that they’re up to snuff.

So it’s rather disturbing to learn that a major new study concludes reviewers don’t improve with experience. Actually, they get demonstrably worse. What best distinguishes reviewers is merely how quickly their performance falls, according to Michael Callaham.

He’s an emergency medicine physician at the University of California, San Francisco — and editor-in-chief of Annals of Emergency Medicine, a peer-reviewed journal. The data he reported on Sept. 10, here, at the International Congress on Peer Review and Biomedical Publication, came from an analysis of reviews by anyone and everyone who had ever reviewed more than one paper for the journal between March 1994 and November 2008.

Although earlier studies had reported evidence that the quality of an individual’s manuscript reviews can fall over time, the new study differs by quantifying just how badly that performance deteriorated among roughly 1,000 experts.

Callaham noted that the editors at his journal have long rated every review that comes in on a scale of 1 to 5 — with 5 indicating the evaluation “is impossible to improve.” The average rating: 3.8.

With all of these assessments in hand — more than 14,000 in all — Callaham was able to sift through the stack handled by reviewers who had tackled at least two manuscripts. This group reviewed, on average, more than 13 manuscripts each over the 14-year span. And in general, manuscript analyses by these experts deteriorated about 0.04 point per year. The bottom three percent “got worse — deteriorated — 0.1 pt/yr,” Callaham reported.

Only about one percent of peer reviewers improved notably over time — by about 0.05 pt/yr. Another two percent improved, but less than that.

In looking for a positive way to “spin” his findings, Callaham calculated how long it would take the performance of reviewers to exhibit an “amount of deterioration or change that’s significant to an editor” — roughly a half-point drop on that 5-point scale. For those in the worst-performing group, it would take five years, he said. And the single worst reviewer: This expert’s decline, he said, should catch an editor’s attention within just three years.

What about those who improved over time? Taking them as a group, Callaham said, “It would take 25 years before you would notice their improvement.”
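The arithmetic behind these time spans can be sketched in a few lines. This is only an illustration, not Callaham's analysis: it assumes each reviewer's rating changes at a steady rate and treats the half-point change as a hard threshold. The function names are hypothetical. The rates and year counts are the ones quoted in this article; anything computed from them is an inference.

```python
# Back-of-the-envelope check of the figures quoted in the article.
# Assumption (not from the study): ratings drift linearly, so the time
# to a noticeable change is simply threshold / yearly rate.

NOTICEABLE_CHANGE = 0.5  # half a point on the 5-point scale


def years_until_noticeable(rate_per_year: float) -> float:
    """Years for a steady yearly change to add up to NOTICEABLE_CHANGE."""
    if rate_per_year == 0:
        raise ValueError("a reviewer who never changes is never noticed")
    return NOTICEABLE_CHANGE / abs(rate_per_year)


def implied_rate(years: float) -> float:
    """Yearly change implied by a quoted time-to-notice (always positive)."""
    if years <= 0:
        raise ValueError("years must be positive")
    return NOTICEABLE_CHANGE / years


if __name__ == "__main__":
    # Rates quoted in the article, in points per year.
    quoted_rates = {
        "typical reviewer (decline)": 0.04,
        "bottom three percent (decline)": 0.1,
    }
    for label, rate in quoted_rates.items():
        print(f"{label}: about {years_until_noticeable(rate):.1f} years")

    # Time spans quoted in the article, converted back to implied rates.
    quoted_years = {
        "single worst reviewer": 3,
        "all improvers combined": 25,
    }
    for label, years in quoted_years.items():
        print(f"{label}: about {implied_rate(years):.3f} pt/yr")
```

Under those assumptions, the worst-performing group's 0.1 pt/yr works out to the five years Callaham cited, and the typical decline of 0.04 pt/yr would take about 12.5 years to register. Read in reverse, the 25-year figure implies the improvers as a group gained only about 0.02 pt/yr, well below the 0.05 pt/yr of the top one percent.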

By the way, Annals editors experimented with providing written feedback to the journal’s reviewers. In a paper seven years ago, they noted that for most reviewers, this tactic “produced no improvement in reviewer performance.” At the Congress, here, Callaham reported that giving this feedback to poor reviewers “actually made their performance worse.”

Although his data were derived from experiences at a single, specialty journal, Callaham says “we believe the generalizability is pretty good” to other journals. To good journals, anyway. Experiences at his publication likely represent “a best-case scenario,” he maintains, “because we monitor our reviewer pool very closely, grading every single review.” Those who don’t perform well are gradually asked to do fewer reviews — and eventually are culled from the candidate pool.

Debra Houry of Emory University in Atlanta, another editor at Annals of Emergency Medicine, reported at the meeting on the journal’s experience with mentoring new manuscript evaluators. Reviewers rated as being in the annual top 50 at least twice in the past four years were asked to help out a newer candidate. Seventeen individuals were mentored over a two-year period that involved evaluating at least three papers. Their scores were compared to those from another 15 individuals having similar experience who had not been mentored.

Although Annals editors scored the mentored reviewers about 1 point higher than the control group during the first year, by year two the two groups’ scores “converged to be similar,” Houry says.