Without scientific competition and open debate, much psychology research goes nowhere
In its idealized form, science resembles a championship boxing match. Theories square off, each vying for the gold belt engraved with “Truth.” Under the stern eyes of a host of referees, one theory triumphs by best explaining available evidence — at least until the next bout.
But in the real world, science sometimes works more like a fashion show. Researchers clothe plausible explanations of experimental findings in glittery statistical suits and gowns. These gussied-up hypotheses charm journal editors and attract media coverage with carefully orchestrated runway struts, never having to battle competitors.
Then there’s psychology. Even more than other social scientists, and certainly more than physical scientists, psychologists tend to overlook or dismiss hypotheses that might topple their own, says Klaus Fiedler of the University of Heidelberg in Germany. They explain experimental findings with ambiguous terms that make no testable predictions at all; they build careers on theories that have never bested a competitor in a fair scientific fight. In many cases, no one knows or bothers to check how much common ground one theory shares with others that address the same topic. Problems like these, Fiedler and his colleagues contended last November in Perspectives on Psychological Science, afflict sets of related theories about such psychological phenomena as memory and decision making. In the end, that affects how well these phenomena are understood.
Fiedler’s critique comes at a time when psychologists are making a well-publicized effort to clean up their research procedures, as described in several reports published alongside his paper. In fact, researchers generally concede that many published psychology studies have been conducted in ways that conceal their statistical frailty and thus call the validity of their conclusions into question. But Fiedler suspects the new push to sanitize psychology’s statistical house won’t make much difference in the long run. Findings published in big-time journals draw enough media coverage to bring the scrutiny of other researchers, who eventually expose bogus and overblown effects. “Advances in psychology will depend more on open-minded theoretical thinking than on better monitoring of statistical practices,” he says.
When Fiedler gives talks to groups of psychologists, he tries to identify open-minded theoretical thinkers by posing a couple of questions.
First, he asks audience members to name a published study in which investigators uncovered an interesting, statistically significant effect that vanished in later reports. In a seminar conducted last year by Fiedler at a major Dutch university, 38 research psychologists had no problem citing flash-in-the-pan findings. Many remembered a well-known but now contested report that college students react to subtle reminders of old age by walking more slowly, allegedly because healthy young people unconsciously act out prompted stereotypes of the elderly (SN: 5/19/12, p. 26).
In that experiment, student volunteers were timed walking down a corridor after unscrambling sentences that, for one group, contained senior citizen–related words such as wrinkle and Florida. Researchers who conducted the investigation concluded that students weren’t aware of having registered the stereotypical words, but still acted out an elderly stereotype by slowing their pace shortly after the reading exercise.
But the researchers did not consider the possibility that their own facial expressions or body language might subtly have encouraged the student volunteers to walk more slowly. They didn’t ask themselves whether some students noticed elder-related words while unscrambling sentences and supposed that experimenters wanted them to mimic seniors. They did not explore whether some students quickly drew conclusions about what was expected of them and how to behave, regardless of any unintended signals from experimenters. Nor did they examine whether reading words related to any upsetting or thought-provoking topic would make people walk more slowly.
Fiedler’s point: Blindness to additional, possibly superior, explanations for experimental results plagues even prominent psychological theories. “Psychologists too often fail to consider that the truth may be broader than their hypotheses,” says psychologist Barbara Spellman of the University of Virginia in Charlottesville. Spellman edits the journal Perspectives on Psychological Science, in which Fiedler’s article appears.
And indeed, as in other seminars Fiedler has run, only a few of the psychologists at the Dutch seminar came up with anything when they were asked to name an experiment that included a competing account for any set of results.
Null and void
Geoffrey Loftus, a psychologist at the University of Washington in Seattle, is an ally in Fiedler’s battle to broaden psychology’s perspectives. As editor of Memory & Cognition from 1993 to 1997, Loftus implored researchers to avoid a standard statistical practice in psychology known as null hypothesis significance testing that, in his view, perpetuates theoretical chaos. He continued to attack the practice in a talk last November at the Psychonomic Society’s annual meeting in Minneapolis.
Null hypothesis refers to a default position: that there is no relationship except chance between two measured phenomena in an experiment (for example, it’s only by chance that college students walk at different speeds after they’ve read words that refer to old age). Before researchers can conclude that a relationship exists between two phenomena, the null hypothesis must be rejected. The technique requires them to calculate whether the assumption that no experimental effect exists can be rejected as statistically unlikely, given the measured differences between groups.
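To make the logic of significance testing concrete, here is a minimal sketch in Python. It uses an exact permutation test, which is only one of several ways to test a null hypothesis and is not necessarily the method used in the walking-speed study. The walking times are invented for illustration, and the function name permutation_p_value is hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference of group means.

    Under the null hypothesis, group labels are arbitrary, so every way of
    splitting the pooled measurements into two groups is equally likely.
    The p-value is the share of splits whose mean difference is at least
    as large as the one actually observed.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)

    extreme = 0
    splits = 0
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            extreme += 1
        splits += 1
    return extreme / splits


# Invented corridor walking times in seconds (illustration only).
primed_group = [8.4, 8.1, 8.6, 8.3, 8.5]   # read age-related words
control_group = [7.3, 7.5, 7.2, 7.6, 7.4]  # read neutral words

p_value = permutation_p_value(primed_group, control_group)
print(f"p = {p_value:.4f}")
```

With these made-up numbers the two groups do not overlap at all, so only 2 of the 252 possible splits are as extreme as the observed one. A small p-value of this kind licenses rejecting the null hypothesis, but it says nothing about which explanation of the difference is correct.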
This is a statistical charade, Loftus contends, since measures taken before and after any test are virtually never the same. Rejecting a null hypothesis doesn’t tell a researcher anything new, even if the threat of finding an effect that doesn’t really exist has been eliminated. “Significance testing is all about how the world isn’t,” Loftus contends, “and says nothing about how the world is.”
The art of theory construction in psychology has withered during the field’s 50-year romance with null hypothesis significance testing, asserts psychologist Gerd Gigerenzer of the Max Planck Institute for Human Development in Berlin: “The problem is not that researchers think that theory is irrelevant, but that almost anything passes as a theory.”
Gigerenzer has identified three types of theory substitutes in psychology. Each surrogate for theory is so vague and prediction-free that it can’t be proven wrong.
First, Gigerenzer says, investigators sometimes explain their findings by using a term for a theory that can be construed to explain not only an observed effect but also its opposite. Consider “representativeness,” which many decision researchers use to explain gamblers’ frequent intuition that, after landing on a series of red spaces on a roulette table, they’re going to land on a black space. In this case, psychologists interpret representativeness to mean that people assume that random sequences of two outcomes are best represented by a short sequence containing both: reds and blacks when playing roulette, or heads and tails when flipping a coin.
Yet investigators have also used representativeness to explain the opposite intuition, in which people assume that a streak of outcomes is likely to continue. Sports fans demonstrate this kind of intuition when they attribute “hot hands” to basketball players who make several shots in a row (SN: 2/12/11, p. 26). The fans expect the players to sink their next try. In this case, representativeness is interpreted to mean that people regard a run of scores as characteristic of a larger random sequence containing streaks of scores and misses.
Another theory-avoiding tactic consists of describing a finding without trying to explain it, Gigerenzer says. The phrase “inequality aversion” has been applied in some studies to describe the willingness of subjects to divide a pot of money equally rather than to find some other way to divide it. Inequality aversion addresses how participants behaved, but it makes no prediction about why they behaved that way.
Perhaps the most popular theory surrogates are two-system theories. Many psychologists now assume that we make decisions using two mental systems: System 1, in which we make quick, intuitive decisions based on fallible rules of thumb, and System 2, in which we make logical, deliberate choices that require more time and brain power. Psychologist Daniel Kahneman of Princeton University, a Nobel laureate in economics, has done the most to popularize the System 1/System 2 distinction.
Gigerenzer contends that almost any behavior in a decision-making study can be attributed to either System 1 or System 2. In the January 2011 Psychological Review, he and psychologist Arie Kruglanski of the University of Maryland in College Park argued that intuitive and deliberate judgments alike are based on shared rules of thumb, or heuristics. Many parents intuitively allocate attention and love equally to all of their children, for instance, and many investors deliberately follow the same simple rule by allocating money equally to all of their chosen stocks to reduce risk (SN: 6/4/11, p. 26).
Dividing the mind into a nebulous split between intuitive heuristics and logical rule-following distracts scientists from exploring how heuristics operate in both intuitive and deliberative ways and in what situations heuristics work best, Gigerenzer argues.
None of this is to say that psychology has no genuine theories, but many of them exist in splendid isolation. Most psychologists work in narrow communities, such as developmental psychology and social psychology, where established theories are rarely challenged. As a quotation cited in 2008 by psychologist Walter Mischel of Columbia University in New York City puts it, “Psychologists treat other people’s theories like toothbrushes — no self-respecting person wants to use anyone else’s.” That kind of professional isolationism leads to “theoretical disorganization,” write Eli Finkel of Northwestern University in Evanston, Ill., and Paul Eastwick of the University of Texas at Austin.
In a chapter in an upcoming book, Finkel and Eastwick discuss theories about how men and women are attracted to each other. One popular theory holds that people are attracted to others who satisfy general needs for pleasure, belonging and a few other social prizes. A second approach posits that people have evolved certain types of mating strategies over the past few million years. A third perspective assumes that individuals form relationship styles early in life with parents and others that orchestrate choices of romantic partners decades later.
Finkel and Eastwick propose that all three approaches can be organized around a principle, developed in related research, that attraction depends on how well one person enables another to achieve urgent goals for pleasure, reproduction, a good relationship fit — or anything else. Research grounded in that principle has the potential to produce a unified theory of attraction.
Opportunities to unify related theories often arise when scientists from different disciplines collaborate on studies of broad topics such as decision making or moral behavior, Gigerenzer says. He heads a team of scientists with backgrounds ranging from ecology to economics that studies heuristic reasoning. Members of this group have found commonalities between a complex model of thinking and decision making developed by psychologist John Anderson of Carnegie Mellon University in Pittsburgh and a simple decision-making rule that is surprisingly effective in certain situations.
The rule goes like this: If an experimental subject is asked to make a choice where one of two options is recognized, the subject will pick the familiar item. In studies of German and U.S. students, each group did better at identifying the larger city from pairs of choices in foreign countries than from pairs in their homelands. Partial ignorance about foreign cities led the students to choose the most familiar city. Since better-known cities tend to be especially large ones, the students’ simple tactic worked surprisingly well. Recognition-guided choices weren’t an option for pairs of familiar cities in students’ native lands.
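The rule is simple enough to write down in a few lines. The sketch below is an illustration under stated assumptions, not the model used by Gigerenzer’s group: the function name recognition_choice and the student’s recognition set are hypothetical.

```python
from typing import Optional, Set


def recognition_choice(option_a: str, option_b: str,
                       recognized: Set[str]) -> Optional[str]:
    """Pick the recognized option when exactly one of the two is recognized.

    Returns None when the rule does not apply, that is, when both options
    or neither option is recognized.
    """
    knows_a = option_a in recognized
    knows_b = option_b in recognized
    if knows_a and not knows_b:
        return option_a
    if knows_b and not knows_a:
        return option_b
    return None


# Hypothetical set of German cities that one U.S. student recognizes.
recognized_german_cities = {"Berlin", "Munich"}

# One city recognized: the rule picks it.
print(recognition_choice("Munich", "Bielefeld", recognized_german_cities))
# Both cities recognized: the rule offers no guidance.
print(recognition_choice("Berlin", "Munich", recognized_german_cities))
```

The second call returns None, which mirrors the situation students faced with pairs of cities from their own countries: when both names are familiar, recognition alone cannot decide.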
For decades, popular research tools, from statistical methods to computers, have been proposed as models of how people think. Once a research tool gains traction as a theory of the mind — say, the notion of the mind as an information-processing computer — creative thinking about alternative theories becomes increasingly difficult, Gigerenzer says.
That may be so, but psychologist Uri Simonsohn of the University of Pennsylvania in Philadelphia believes that researchers’ efforts to upgrade statistical practices can coexist with hypothesis competition and theory integration.
In a 2011 paper in Psychological Science that has become a manifesto for those aiming to minimize published results that vanish on closer inspection, Simonsohn and his colleagues recommended ways to discourage researchers from cherry-picking data to include in final reports, altering experimental conditions that don’t work as planned and using other tactics that disguise statistical weakness.
Some researchers propose using a statistical technique known as Bayesian analysis that estimates which of several hypotheses best explains a set of results. But despite the strengths of Bayesian statistics, investigators can still exclude inconvenient data or hypotheses from this approach, Simonsohn holds.
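A toy calculation shows both the appeal of the Bayesian approach and Simonsohn’s caveat. The sketch below assumes three rival hypotheses about how often a roulette wheel lands on red, equal prior belief in each, and an invented record of 7 reds in 10 spins. The function name posterior_probabilities is hypothetical.

```python
from math import comb


def posterior_probabilities(red_rates, reds, spins):
    """Posterior probability of each candidate red rate, given equal priors.

    Each hypothesis is a proposed probability of landing on red. The
    likelihood of the data under a hypothesis is binomial, and the
    posteriors are the likelihoods rescaled to sum to 1.
    """
    likelihoods = {
        rate: comb(spins, reds) * rate**reds * (1 - rate)**(spins - reds)
        for rate in red_rates
    }
    total = sum(likelihoods.values())
    return {rate: value / total for rate, value in likelihoods.items()}


# Invented data: 7 reds in 10 spins.
all_rivals = posterior_probabilities([0.4, 0.5, 0.6], reds=7, spins=10)
best_of_all = max(all_rivals, key=all_rivals.get)

# Leave out an inconvenient hypothesis and the winner changes.
fewer_rivals = posterior_probabilities([0.4, 0.5], reds=7, spins=10)
best_of_fewer = max(fewer_rivals, key=fewer_rivals.get)

print(best_of_all, best_of_fewer)
```

With all three rivals in the ring, the 0.6 hypothesis earns the highest posterior probability. Drop it from the list and 0.5 wins by default, which is the kind of selective exclusion that the analysis itself cannot detect.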
In the end, no statistical procedure can thrust psychological research into the championship ring, where losses sting but unexpected wins reap big rewards, Fiedler says. In scientific cultures that encourage clear predictions and open debate, even vanquished predictions get respect for having helped to advance knowledge.
“It is a good morning exercise for a research scientist to discard a pet hypothesis every day before breakfast,” the late ethologist Konrad Lorenz wrote. “It keeps him young.”
The lesson of Clever Hans
Any scientist will admit that unconscious cuing by an experimenter can introduce bias into testing. A German named William von Osten and his horse Hans unwittingly demonstrated that — and inspired the term Clever Hans effect. Von Osten became famous in 1891 for public displays of Hans’ ability to perform mathematical calculations and other feats by tapping his hoof. No cheating was apparent, but in 1907 psychologist Oskar Pfungst investigated claims about Hans’ intelligence. Pfungst had different experimenters ask questions standing at varying distances from Hans. Sometimes Hans wore blinders; sometimes the experimenters knew the answers to their own questions and sometimes they didn’t. Pfungst discovered not only that Hans needed visual contact with the questioner but also that Hans couldn’t answer a question when the experimenter didn’t know the answer. Conclusion: Although questioners were not consciously cuing Hans to start or stop tapping, their facial expressions or involuntary movements were enough for Clever Hans to catch on. — Bruce Bower
K. Fiedler et al. The long way from α-error control to validity proper: Problems with a short-sighted false-positive debate. Perspectives on Psychological Science. Vol. 7, Nov. 2012, p. 661. doi:10.1177/1745691612462587.
E. Finkel and P. Eastwick. Interpersonal attraction: In search of a theoretical Rosetta Stone. In press, in J. Simpson and J. Dovidio, editors, Handbook of personality and social psychology: Interpersonal relations and group processes. American Psychological Association Press, 2013.
G. Gigerenzer. Personal reflections on theory and psychology. Theory & Psychology. Vol. 20, Dec. 2010, p. 733. doi:10.1177/0959354310378184.
L. John et al. Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science. Vol. 23, May 2012, p. 524. doi:10.1177/0956797611430953.
A. Kruglanski and G. Gigerenzer. Intuitive and deliberative judgments are based on common principles. Psychological Review. Vol. 118, Jan. 2011, p. 97. doi:10.1037/a0020762.
W. Mischel. The toothbrush problem. APS Observer. Vol. 21, Dec. 2008.
J. Simmons et al. False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science. Vol. 22, Nov. 2011, p. 1359. doi:10.1177/0956797611417632.
B. Bower. The hot and cold of priming. Science News. Vol. 181, May 19, 2012, p. 26.
B. Bower. Simple Heresy. Science News. Vol. 179, June 4, 2011, p. 26.
T. Siegfried. Odds are, it’s wrong. Science News. Vol. 177, March 27, 2010, p. 26.