
Scientists Perturbed by Loss of Stat Tools to Sift Research Fudge from Fact

The journal Basic and Applied Social Psychology recently banned the use of p-values and other statistical methods used to quantify significance and uncertainty in research results

April 16, 2015 | By Regina Nuzzo

Psychology researchers have recently found themselves engaged in a bout of statistical soul-searching. In apparently the first such move ever for a scientific journal, the editors of Basic and Applied Social Psychology announced in a February editorial that researchers who submit studies for publication would not be allowed to use a common suite of statistical methods, including a controversial measure called the p-value.
These methods, referred to as null hypothesis significance testing, or NHST, are deeply embedded in the modern scientific research process, and some researchers have been left wondering where to turn. "The p-value is the most widely known statistic," says biostatistician Jeff Leek of Johns Hopkins University. Leek has estimated that the p-value has been used in at least three million scientific papers. Significance testing is so popular that, as the journal editorial itself acknowledges, there are no widely accepted alternative ways to quantify the uncertainty in research results, and uncertainty is crucial for estimating how well a study's results generalize to the broader population.

[Image caption: Many researchers have labored under the misbelief that the p-value gives the probability that their study's results are just pure random chance. Credit: Lenilucho/Wikipedia]


Unfortunately, p-values are also widely misunderstood, often believed to furnish more information than they do. Many researchers have labored under the misbelief that the p-value gives the probability that their study's results are just pure random chance. But statisticians say the p-value's information is much more nonspecific and can be interpreted only in the context of hypothetical alternative scenarios: the p-value summarizes how often results at least as extreme as those observed would show up if the study were repeated an infinite number of times when in fact only pure random chance were at work.
This means that the p-value is a statement about imaginary data in hypothetical study replications, not a statement about actual conclusions in any given study. Instead of being a scientific lie detector that can get at the truth of a particular scientific finding, the p-value is more of an alternative reality machine that lets researchers compare their results with what random chance would hypothetically produce. "What p-values do is address the wrong questions, and this has caused widespread confusion," says psychologist Eric-Jan Wagenmakers at the University of Amsterdam.
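
To make that definition concrete, here is a minimal simulation sketch in Python (the sample sizes and observed difference are invented for illustration, not taken from the article) that replays a hypothetical two-group study many times under pure chance and counts how often a difference at least as extreme as the observed one appears:

```python
import numpy as np

rng = np.random.default_rng(42)
n_per_group, observed_diff = 30, 0.5   # hypothetical study: 30 subjects per arm
n_replications = 100_000               # stand-in for the "infinite" replications

# Replay the study under pure random chance: both groups are drawn from the
# same distribution, so any difference between their means is noise.
null_diffs = (rng.normal(0, 1, (n_replications, n_per_group)).mean(axis=1)
              - rng.normal(0, 1, (n_replications, n_per_group)).mean(axis=1))

# The (two-sided) p-value: how often chance alone produces a difference
# at least as extreme as the one actually observed.
p_value = np.mean(np.abs(null_diffs) >= observed_diff)
print(f"Simulated p-value: {p_value:.3f}")
```

Note what the number does and does not say: it describes the behavior of the imaginary replications, not the probability that this particular study's result is a fluke.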
Ostensibly, p-values allow researchers to draw nuanced, objective scientific conclusions, as long as they are used as part of a careful process of experimental design and analysis. But critics have complained that in practice the p-value in the context of significance testing has been bastardized into a sort of crude spam filter for scientific findings: if the p-value on a potentially interesting result is smaller than 0.05, the result is deemed "statistically significant" and passed on for publication, according to the recipe; anything with larger p-values is destined for the trash bin.
Quitting p-values cold turkey was a drastic step. "The null hypothesis significance testing procedure is logically invalid, and so it seems sensible to eliminate it from science," says psychologist David Trafimow of New Mexico State University in Las Cruces, editor of the journal. A strongly worded editorial discouraged significance testing in the journal last year. But after researchers failed to heed the warning, Trafimow says, he and associate editor Michael Marks decided this year to go ahead with the new diktat. "Statisticians have critiqued these concepts for many decades but no journal has had the guts to ban them outright," Wagenmakers says.
Significance testing became enshrined in textbooks in the 1940s when scientists, in desperate search of data-analysis recipes that were easy for nonspecialists to follow, ended up mashing together two incompatible statistical systems, p-values and hypothesis testing, into one rote procedure. "P-values were never meant to be used the way we're using them today," says biostatistician Steven Goodman of Stanford University.
Although the laundry list of gripes against significance testing is long and rather technical, the complaints center on a common theme: significance testing's scientific spam filter does a poor job of helping researchers separate the true and important effects from the lookalike ones. The implication is that scientific journals might be littered with claims and conclusions that are not likely to be true. "I believe that psychologists have woken up and come to the realization that some work published in high-impact journals is plain nonsense," Wagenmakers says.
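
A short sketch shows why the filter passes so many lookalikes. Assuming, purely for illustration, that only 10 percent of the effects researchers test are real and that studies detect a real effect half the time, a large share of what clears the p < 0.05 bar is noise:

```python
import numpy as np

rng = np.random.default_rng(0)
n_studies = 100_000
true_effect = rng.random(n_studies) < 0.10   # assume 10% of hypotheses are real

# Each study passes the filter either because a real effect was detected
# (assumed 50% power) or because noise alone dipped below p = 0.05.
significant = np.where(true_effect,
                       rng.random(n_studies) < 0.5,    # assumed power
                       rng.random(n_studies) < 0.05)   # false-positive rate

false_alarm_share = np.mean(~true_effect[significant])
print(f"Share of 'significant' findings that are false alarms: {false_alarm_share:.0%}")
```

Under these invented but not unrealistic assumptions, roughly half of the filter's "discoveries" are false alarms, even though every individual study followed the recipe.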
Not that psychology has a monopoly on publishing results that collapse on closer inspection. For example, gene-hunting researchers in
large-scale genomic studies used to be plagued by too many false-alarm results that flagged unimportant genes. But since the field
developed new statistical techniques and moved away from the automatic use of p-values, the reliability of results has improved, Leek
says.
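
The article does not name the techniques the genomics field adopted, but one widely used example of moving beyond a flat p < 0.05 cutoff is the Benjamini-Hochberg false-discovery-rate procedure, sketched here (offered as a representative method, not necessarily the one Leek has in mind):

```python
import numpy as np

def benjamini_hochberg(p_values, fdr=0.05):
    """Return a boolean mask of discoveries while controlling the false discovery rate."""
    p = np.asarray(p_values)
    m = len(p)
    order = np.argsort(p)
    thresholds = (np.arange(1, m + 1) / m) * fdr   # BH line: (k/m) * q
    below = p[order] <= thresholds
    discoveries = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()             # largest rank passing its threshold
        discoveries[order[:k + 1]] = True
    return discoveries

# Illustrative gene hunt: 9,900 null "genes" plus 100 with real effects.
rng = np.random.default_rng(1)
p = np.concatenate([rng.uniform(size=9_900),
                    rng.beta(0.1, 1.0, size=100)])  # real effects give small p-values
print(f"Naive p < 0.05 calls: {np.sum(p < 0.05)}")
print(f"BH (5% FDR) calls:    {np.sum(benjamini_hochberg(p))}")
```

The naive cutoff flags hundreds of genes, most of them noise; the FDR procedure trades a little sensitivity for a guarantee that only a small fraction of its calls are expected to be false alarms.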
Confusing as p-values are, however, not everyone is a fan of taking them from researchers' statistical tool kits. "This might be a case in which the cure is worse than the disease," Goodman says. "The goal should be the intelligent use of statistics. If the journal is going to take away a tool, however misused, they need to substitute it with something more meaningful."
One possible replacement that might fit the bill is a rival approach to data analysis called Bayesianism. (The journal said it will consider its use in submitted papers on a case-by-case basis.) Bayesianism starts from different principles altogether: rather than striving for scientifically objective conclusions, this statistical system embraces the subjective, allowing researchers to incorporate their own prior knowledge and beliefs. One obstacle to the widespread use of Bayesianism has been the lack of user-friendly statistical software. To this end Wagenmakers's team is working to develop a free, open-source statistical software package called JASP. It boasts the tagline "Bayesian statistics made accessible."
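
To give a flavor of the Bayesian approach, here is a generic textbook example in Python (it does not show JASP's internals, and the numbers are invented): a researcher's prior belief about a success rate is combined with observed data to yield an updated, posterior belief.

```python
from scipy import stats

prior_a, prior_b = 5, 5          # prior belief: the rate is probably near 50%
successes, trials = 14, 20       # observed data: 14 successes in 20 trials

# Conjugate beta-binomial update: posterior is Beta(a + successes, b + failures).
posterior = stats.beta(prior_a + successes, prior_b + trials - successes)

lo, hi = posterior.ppf(0.025), posterior.ppf(0.975)
print(f"Posterior mean rate: {posterior.mean():.2f}")
print(f"95% credible interval: ({lo:.2f}, {hi:.2f})")
```

Unlike a p-value, the output is a direct statement of belief about the quantity of interest, with the prior made explicit, which is exactly the subjectivity that both attracts and worries researchers.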
Other solutions attack the problem from a different angle: human nature. Because researchers in modern science face stiff competition and need to churn out enough statistically significant results for publication, and therefore promotion, it is no surprise that research groups somehow manage to find significant p-values more often than would be expected, a phenomenon dubbed "p-hacking" in 2011 by psychologist Uri Simonsohn at the University of Pennsylvania.
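
One common form of p-hacking is "peeking": checking the p-value as data accumulate and stopping as soon as it dips below 0.05. This illustrative simulation (parameters are my own, not from the article) shows that even with no real effect, peeking pushes the false-positive rate well above the nominal 5 percent:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_experiments, max_n, peek_every = 2_000, 100, 10

false_positives = 0
for _ in range(n_experiments):
    a = rng.normal(size=max_n)   # two groups with NO true difference
    b = rng.normal(size=max_n)
    # Peek at the t-test every 10 subjects and stop at the first "significant" result.
    for n in range(peek_every, max_n + 1, peek_every):
        if stats.ttest_ind(a[:n], b[:n]).pvalue < 0.05:
            false_positives += 1
            break

print(f"False-positive rate with peeking: {false_positives / n_experiments:.1%}")
```

With ten looks at the data, the rate typically lands in the 15 to 20 percent range, triple or quadruple what the p < 0.05 recipe promises when the sample size is fixed in advance.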

Several journals are trying a new approach, spearheaded by psychologist Christopher Chambers of Cardiff University in Wales, in which researchers publicly preregister all their study analysis plans in advance. This gives them less wiggle room to engage in the sort of unconscious, or even deliberate, p-hacking that happens when researchers change their analyses in midstream to yield results that are more statistically significant than they would be otherwise. In exchange, researchers get priority for publishing the results of these preregistered studies, even if they end up with a p-value that falls short of the normal publishable standard.
Finally, some statisticians are banking on education being the answer. "P-values are complicated and require training to understand," Leek says. Science education has yet to fully adapt to a world in which data are both plentiful and unavoidable, without enough statistical consultants to go around, he says, so most researchers are stuck analyzing their own data with only a couple of stats courses under their belts. Most researchers do not care about the details of statistical methods, Wagenmakers says. They use them only to support their claims in a general sense, to be able to tell their colleagues, "See, I am allowed to make this claim, because p is less than .05; now stop questioning my result."
A new, online nine-course data science specialization for professionals with very little background in statistics might change that. Leek and his colleagues at Johns Hopkins rolled out the free courses last year, available via the popular Coursera online continuing-education platform, and two million students have already registered. As part of the sequence, Leek says, a full monthlong course will be devoted specifically to understanding methods that allow researchers to convey uncertainty and generalizability of study findings, including, yes, p-values.

Scientific American is a trademark of Scientific American, Inc., used with permission.

© 2015 Scientific American, a Division of Nature America, Inc. All Rights Reserved.
