Context | Science News





Science past and present

Tom Siegfried



Debates on whether science is broken don’t fit in tweets

Multiple proposals about P values suggest science needs repairs


Scientists have some new ideas for reducing irreproducibility and the problem of P values.


In the Twitterverse, science can stir up some vigorous debates. And they’re not all about the standard issues of climate change, vaccines and evolution. Some dueling tweets involve the scientific enterprise itself.

For instance, one recent tweet proclaimed “Science isn’t ‘self-correcting.’ Science is broken,” linking to a commentary about the well-documented problem that many scientific study results cannot be reproduced by follow-up experiments. To which an angry biologist/blogger replied: “No it’s not. Journalism is broken, as your clickbait-y title shows.”

Without taking sides (yet), it’s safe to say that part of the problem is that tweets don’t allow room for nuance. Whether science is broken or not depends on what you mean by “broken.” Maybe saying science is broken is not a fair assessment out of context. But nobody who has been paying attention could intelligently disagree that some aspects of scientific procedure are in need of repair. Otherwise it’s hard to explain why so many scientists are proposing so many major fixes.

Most such proposals have to do with one of the most notorious among science’s maladies: the improper use of statistical inference. One new paper, for instance, examines the use of statistics in medical clinical trials. Because of the flaws in standard statistical methods, even a properly conducted trial may reach erroneous conclusions, writes mathematician-biostatistician Leonid Hanin of Idaho State University. “Our main conclusion is that even a totally unbiased, perfectly randomized, reliably blinded, and faithfully executed clinical trial may still generate false and irreproducible results,” he writes in a recent issue of BMC Medical Research Methodology.

Clinical trials are not a special case. Many other realms of science, from psychology to ecology, are as messy as medicine. From the ill effects of pollutants to the curative power of medical drugs, deciding what causes harm or what cures ills requires data. Analyzing such data typically involves formulating a hypothesis, collecting data and using statistical methods to calculate whether the data support the hypothesis.

Such calculations generally produce a P value — the probability of obtaining the observed data (or results even more extreme) if there is no real effect (the null hypothesis). If that probability is low (by the usual convention, less than 5 percent, or P less than .05), most scientists conclude that they have found evidence for a real effect and send a paper off for publication in a journal. Astute critics of this method have long observed, though, that a low P value is not really evidence of an effect — it just tells you that you should be surprised to see such data if there is no effect. In other words, the P value is a statement about the data, not the hypothesis.
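As a rough illustration of that definition, the sketch below computes an exact one-sided P value for a made-up coin-flip experiment, using only Python's standard library. The experiment (60 heads in 100 flips), the fair-coin null hypothesis and the function name are assumptions chosen for illustration; none of them comes from the studies discussed here.

```python
from math import comb


def one_sided_p_value(heads: int, flips: int) -> float:
    """P(at least `heads` heads in `flips` tosses) if the coin is fair (the null)."""
    # Count every outcome as extreme as, or more extreme than, the one observed.
    extreme_outcomes = sum(comb(flips, k) for k in range(heads, flips + 1))
    # Under a fair coin, each of the 2**flips sequences is equally likely.
    return extreme_outcomes / 2 ** flips


# Hypothetical experiment: 60 heads in 100 flips.
p = one_sided_p_value(60, 100)
print(f"P = {p:.4f}")
```

A value below .05 here says only that such a run of heads would be surprising for a fair coin. It does not give the probability that the coin is biased.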

Scientists therefore often conclude they have found an effect when none actually exists. Such “false positive” results plague many fields, particularly psychology. Studies have shown that many if not most reported psychology findings are not reproduced when the experiment is repeated. But no scientific discipline is immune from this “irreproducibility” problem. Many scientists think it’s time to do something about it.

One recent paper, with 72 authors, proposes attacking the problem by changing the convention for a “statistically significant” P value. Instead of .05, the current convention, these authors suggest .005, so you could claim statistically significant evidence for an effect only if the chance of getting your result (with no true effect) was less than half a percent. “This simple step would immediately improve the reproducibility of scientific research in many fields,” the authors write. A P value of less than .05 should be labeled as merely “suggestive,” they say, not significant.

Such a tougher threshold no doubt would reduce the number of false positives. But this approach does not address the underlying problems that P values pose to begin with. They are still evidence about the data, not the hypothesis. And while a tougher standard would reduce false positives, it would surely also increase the number of false negatives — that is, finding no effect when there really was one. In any case, changing one arbitrary standard to another would do nothing about the widespread misinterpretation and misuse of P values, or change the fact that a statistically significant P value can be calculated for an effect that is insignificant in practical terms. 
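The trade-off between false positives and false negatives can be sketched numerically. The example below is a simplified illustration, not a calculation from the 72-author paper: it assumes a one-sided z-test and a real effect whose size is 2.5 standard errors, both of which are hypothetical choices made here.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def false_negative_rate(alpha: float, effect_in_standard_errors: float) -> float:
    """Chance that a one-sided z-test misses a real effect of the given size."""
    # The test statistic must exceed this critical value to count as significant.
    critical = STANDARD_NORMAL.inv_cdf(1 - alpha)
    # With a real effect, the statistic is centered on the effect size, so the
    # miss probability is the share of that distribution below the critical value.
    return STANDARD_NORMAL.cdf(critical - effect_in_standard_errors)


# Assumed effect size of 2.5 standard errors (illustrative only).
for alpha in (0.05, 0.005):
    rate = false_negative_rate(alpha, 2.5)
    print(f"threshold {alpha}: false negative rate {rate:.2f}")
```

Under these assumptions, moving the threshold from .05 to .005 more than doubles the chance of missing the real effect. The exact size of the penalty depends on the effect size and sample size in any given study.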

A second fix suggests not changing the P value significance threshold, but better explaining what a given P value means. One common misinterpretation is that a P value of .05 implies a 95 percent probability that the effect is real (or, in other words, that the chance of a false positive is only 5 percent). That’s baloney (and a logical fallacy as well). Gauging the likelihood of a real effect requires some knowledge of how likely such an effect was before conducting the experiment.

David Colquhoun, a retired pharmacologist and feisty tweeter on these issues, proposes that researchers should report not only the P value, but also how likely the hypothesis needed to be to assure only a 5 percent false positive risk. In a recent paper available online, Colquhoun argues that the terms “significant” or “nonsignificant” should never be used. Instead, he advises, “P values should be supplemented by specifying the prior probability” corresponding to a specific false positive risk.

For instance, suppose you’re testing a drug with a 50-50 chance of being effective — in other words, the prior probability of an effect is 0.5. If the data yield a P value of .05, the risk of a false positive is 26 percent, Colquhoun calculates. If you’re testing a long shot, say with a 10 percent chance of being effective, the false positive risk for a P value of .05 is 76 percent.
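The arithmetic behind figures like these is Bayes' rule applied to odds. The sketch below is a minimal version, not Colquhoun's own calculation: it assumes that a P value of .05 corresponds to data favoring a real effect over no effect by a likelihood ratio of about 2.8, a value chosen here because it is consistent with the percentages quoted above. The function name and that ratio are illustrative assumptions.

```python
def false_positive_risk(prior: float, likelihood_ratio: float) -> float:
    """Probability that there is no real effect, given the observed data."""
    # Convert the prior probability of a real effect into odds.
    prior_odds = prior / (1 - prior)
    # Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio.
    posterior_odds = prior_odds * likelihood_ratio
    # The false positive risk is the posterior probability of "no effect".
    return 1 / (1 + posterior_odds)


ASSUMED_LIKELIHOOD_RATIO = 2.8  # assumption, not a figure stated in the article

for prior in (0.5, 0.1):
    risk = false_positive_risk(prior, ASSUMED_LIKELIHOOD_RATIO)
    print(f"prior {prior}: false positive risk {risk:.0%}")
```

With that assumed ratio, the sketch gives about 26 percent for a 50-50 prior and about 76 percent for the long shot, in line with the figures above.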

Of course, as Colquhoun acknowledges, you never really know what the prior probability is. But you can calculate what the prior probability needs to be to give you confidence in your result. If your goal is a 5 percent risk of a false positive, you need a prior probability of 87 percent when the P value is .05. So you’d already have to be pretty sure that the effect was real, rendering the evidence provided by the actual experiment superfluous. So Colquhoun’s idea is more like a truth in labeling effort than a resolution of the problem. Its main advantage is making the problem more obvious, helping to avoid the common misinterpretations of P values.
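Running Bayes' rule in reverse shows where a required prior of that size can come from. The sketch below is a simplified illustration, not Colquhoun's own method: it assumes that a P value of .05 corresponds to a likelihood ratio of about 2.8 in favor of a real effect, a hypothetical value picked to be consistent with the figures in this column.

```python
def prior_needed(target_risk: float, likelihood_ratio: float) -> float:
    """Prior probability of a real effect required to reach the target false positive risk."""
    # Posterior odds of a real effect implied by the target risk.
    posterior_odds = (1 - target_risk) / target_risk
    # Undo Bayes' rule in odds form: prior odds = posterior odds / likelihood ratio.
    prior_odds = posterior_odds / likelihood_ratio
    # Convert odds back to a probability.
    return prior_odds / (1 + prior_odds)


ASSUMED_LIKELIHOOD_RATIO = 2.8  # assumption, not a figure stated in the article

required = prior_needed(0.05, ASSUMED_LIKELIHOOD_RATIO)
print(f"prior needed for a 5% false positive risk: {required:.0%}")
```

Under that assumption the required prior comes out at about 87 percent, which is why the experiment adds so little: most of the confidence has to be there before any data are collected.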

Many other solutions to the problem of P values have been proposed, including banning them. Recognition of the problem is so widespread that headlines proclaiming science to be broken should not come as a surprise. Nor should such headlines be condemned because they may give aid and comfort to science’s enemies. Science is about seeking the truth, and if the truth is that science’s methods aren’t succeeding at that, it’s every scientist’s duty to say so and take steps to do something about it.

Yet there is another side to the “science is broken” question. The answer depends not only on what you mean by “broken,” but also on what you mean by “science.” It’s beyond doubt that individual scientific studies, taken in isolation, are not a reliable way of drawing conclusions about reality. But somehow in the long run, science as a whole provides a dependable guide to the natural world, by far a better guide than any alternative way of knowing about nature. Sound science is the science established over decades of investigation by well-informed experts who take much more into account than just the evidence provided by statistical inference. Wisdom and judgment, not rote calculation, produce the depth of insight into reality that makes science the valuable and reliable enterprise it is. It’s the existence of thinkers with wisdom and judgment that makes science, in the biggest, truest sense, not really broken. We just need to get more of those thinkers to tweet.

Follow me on Twitter: @tom_siegfried
