P value ban: small step for a journal, giant leap for science

Editors reject flawed system of null hypothesis testing

March 17, 2015 at 3:18 pm

Imagine, if you dare, a world without P values.

Perhaps you’re already among the lucky participants in the human race who don’t know what a P value is. Trust me, you don’t want to. P stands for pernicious, and P values are at the root of all (well, most) scientific evil.

Of course, I don’t mean evil in the sense of James Bond’s villains. It’s an unintentional evil, but nevertheless a diabolical conspiracy of ignorance that litters the scientific literature with erroneous results. P values are supposed to help scientists decide whether an apparently meaningful experimental result is really just a fluke. But in fact, P values confuse more than they clarify. They are misused, misunderstood and misrepresented.

But now somebody is finally trying to do something about it.

Last month a scientific journal, Basic and Applied Social Psychology, announced that it won’t publish papers that mention the unmentionable P value. No longer will the journal permit published papers to report the P value’s use in the process of “null hypothesis testing,” which psychologists and scientists in many other fields routinely rely on.

Anyone embarking on a research career soon gets infected with this method. When you want to test to see whether a food additive causes cancer, or a medicine cures a disease, you assume that it doesn’t (that assumption is the null hypothesis) and then do an experiment comparing the additive or medicine with a placebo, or another drug, or whatever. If more people survive with the medicine than with the placebo, maybe the medicine works. Or maybe that result was a fluke, the luck of the draw. P values supposedly tell you whether the difference you saw was luck or reality.

Except that they don’t. P value calculations tell you only the probability of seeing a result at least as big as what you saw if there is no real effect. (In other words, the P value calculation assumes the null hypothesis is true.) A small P value — low probability of the data you measured — might mean the null hypothesis is wrong, or it might mean that you just saw some unusual data. You don’t know which. And if there is a real effect, your calculation of a P value is rendered meaningless, because that calculation assumed that there wasn’t a real effect.
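To make that definition concrete, here is a minimal sketch in Python. The trial is made up for illustration: 100 patients, 60 of whom improve, with a null hypothesis that says half would improve anyway. The function name and the numbers are assumptions, not anything taken from the journal or its editors. The calculation simply adds up the probability of every outcome at least as extreme as the one observed, assuming the null hypothesis is true.

```python
from math import comb


def one_sided_p_value(successes: int, trials: int, null_rate: float = 0.5) -> float:
    """Probability of at least `successes` out of `trials`, assuming the null rate is true.

    This is an exact binomial tail sum. It says nothing about how likely
    the null hypothesis itself is.
    """
    return sum(
        comb(trials, k) * null_rate**k * (1 - null_rate) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical trial: 60 of 100 patients improve; the null says 50 percent would anyway.
p = one_sided_p_value(60, 100)
print(f"P value: {p:.4f}")
```

For these invented numbers the P value comes out a little under 0.03. Notice what went into it: the data and the null hypothesis, nothing else. No information about how plausible a real effect was to begin with ever enters the calculation.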

Nevertheless, the scientific establishment — the peer-reviewed journals that supposedly police scientific standards and decide what research gets published — has largely insisted on P values as a measure of publication worthiness. But now the editors of Basic and Applied Social Psychology have gone rogue.

“The [P value] fails to provide the probability of the null hypothesis, which is needed to provide a strong case for rejecting it,” David Trafimow and Michael Marks of New Mexico State University write in the journal’s editorial announcing the P value ban.

It’s no great shock that some of the world’s statistical organizations have reacted a bit negatively. In a statement, the American Statistical Association expressed concern that the P value ban “may have its own negative consequences.” More than two dozen “distinguished statistical professionals” are developing a statement for the association “to appear later this year” that will “highlight the issues and competing viewpoints.” Composing such a statement was a very good idea. Fifty years ago.

And in fact, for decades, many distinguished statistical professionals and others have been harping on the intellectual bankruptcy of P values and null hypothesis testing. “Despite the awesome pre-eminence this method has attained … it is based upon a fundamental misunderstanding of the nature of rational inference, and is seldom if ever appropriate to the aims of scientific research,” the philosopher of science William Rozeboom wrote — in 1960. Later he called it “surely the most bone-headedly misguided procedure ever institutionalized in the rote training of science students.”

Many others since Rozeboom have argued just as forcefully that P values are pathological. Their widespread use in scientific research renders many if not most scientific papers guilty of reporting a finding that will later turn out to be wrong. P values pose a serious problem that has plagued the scientific process for nearly a century.

Yet they remain persistently misunderstood. In an account of the Basic and Applied Social Psychology ban, a prestigious international scientific journal stated that “the closer to zero the P value gets, the greater the chance that the null hypothesis is false.” That’s utterly wrong, but it is often how P values get explained and understood. And perhaps that’s the best reason to get rid of them.
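A rough sketch shows why the quoted claim fails. Getting from a P value to the chance that the null hypothesis is false requires extra ingredients that the P value does not contain. The version below uses Bayes' rule with three assumed inputs, all hypothetical: the share of tested hypotheses for which the null is actually true (`prior_null`), the significance cutoff (`alpha`) and the chance of detecting a real effect when there is one (`power`). It treats "significant" as the event that the P value falls below the cutoff, which is a simplification.

```python
def prob_null_given_significant(
    prior_null: float, alpha: float = 0.05, power: float = 0.8
) -> float:
    """Chance the null is true given a 'significant' result, via Bayes' rule.

    prior_null: assumed fraction of tested hypotheses where the null is true
    alpha:      probability of a significant result when the null is true
    power:      probability of a significant result when the effect is real
    """
    false_positives = alpha * prior_null
    true_positives = power * (1 - prior_null)
    return false_positives / (false_positives + true_positives)


# Same cutoff, same power, different starting odds (all values hypothetical).
for prior_null in (0.5, 0.9, 0.99):
    chance = prob_null_given_significant(prior_null)
    print(f"null true in {prior_null:.0%} of tests -> "
          f"chance null is true after a significant result: {chance:.0%}")
```

With these assumed inputs, if the null is true in 90 percent of the hypotheses being tested, a significant result still leaves a 36 percent chance that the null is true. The same cutoff can mean very different things depending on the starting odds, and those odds are exactly what a P value leaves out.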

Follow me on Twitter: @tom_siegfried