4/22/15, 1:45 PM
Unfortunately, p-values are also widely misunderstood, often believed to furnish more information than they do. Many researchers have
labored under the misbelief that the p-value gives the probability that their study's results are just pure random chance. But statisticians
say the p-value's information is much less specific, and can be interpreted only in the context of hypothetical alternative scenarios:
The p-value summarizes how often results at least as extreme as those observed would show up if the study were repeated an infinite
number of times when in fact only pure random chance were at work.
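To make that definition concrete, here is a minimal Python sketch. It is an illustration only, under assumptions that are not in the article: the "study" is 100 coin flips that came up heads 60 times, and "pure random chance" means a fair coin. The function names are made up for this example.

```python
import random
from math import comb

def simulated_p_value(observed_heads, n_flips, n_repeats=20_000, seed=0):
    """Fraction of chance-only replications at least as extreme as observed."""
    rng = random.Random(seed)
    expected = n_flips / 2
    observed_gap = abs(observed_heads - expected)
    extreme = 0
    for _ in range(n_repeats):
        # One hypothetical replication in which only chance is at work.
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        if abs(heads - expected) >= observed_gap:
            extreme += 1
    return extreme / n_repeats

def exact_p_value(observed_heads, n_flips):
    """The same quantity computed exactly from the binomial distribution."""
    expected = n_flips / 2
    observed_gap = abs(observed_heads - expected)
    extreme_outcomes = sum(
        comb(n_flips, k)
        for k in range(n_flips + 1)
        if abs(k - expected) >= observed_gap
    )
    return extreme_outcomes / 2 ** n_flips

print("simulated:", simulated_p_value(60, 100))
print("exact:    ", exact_p_value(60, 100))  # about 0.057
```

Both numbers describe the imaginary fair-coin replications, not the coin that was actually flipped: a value near 0.057 says that chance alone would produce a result this lopsided in roughly one replication out of 18, and nothing more.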
This means that the p-value is a statement about imaginary data in hypothetical study replications, not a statement about actual
conclusions in any given study. Instead of being a scientific lie detector that can get at the truth of a particular scientific finding, the
p-value is more of an alternative reality machine that lets researchers compare their results with what random chance would
hypothetically produce. "What p-values do is address the wrong questions, and this has caused widespread confusion," says
psychologist Eric-Jan Wagenmakers at the University of Amsterdam.
Scientists Perturbed by Loss of Stat Tools to Sift Research Fudge from Fact - Scientific American
http://www.scientificamerican.com/article/scientists-perturbed-by-loss-of-stat-tools-to-sift-research-fudge-from-fact/?print=true
Ostensibly, p-values allow researchers to draw nuanced, objective scientific conclusions as long as they are part of a careful process of
experimental design and analysis. But critics have complained that in practice the p-value in the context of significance testing has been
bastardized into a sort of crude spam filter for scientific findings: If the p-value on a potentially interesting result is smaller than 0.05,
the result is deemed statistically significant and passed on for publication, according to the recipe; anything with larger p-values is
destined for the trash bin.
Quitting p-values cold turkey was a drastic step. "The null hypothesis significance testing procedure is logically invalid, and so it seems
sensible to eliminate it from science," says psychologist David Trafimow of New Mexico State University in Las Cruces, editor of the
journal. A strongly worded editorial discouraged significance testing in the journal last year. But after researchers failed to heed the
warning, Trafimow says, he and associate editor Michael Marks decided this year to go ahead with the new diktat. "Statisticians have
critiqued these concepts for many decades but no journal has had the guts to ban them outright," Wagenmakers says.
Significance testing became enshrined in textbooks in the 1940s when scientists, in desperate search of data-analysis recipes that were
easy for nonspecialists to follow, ended up mashing together two incompatible statistical systems (p-values and hypothesis testing)
into one rote procedure. "P-values were never meant to be used the way we're using them today," says biostatistician Steven Goodman
of Stanford University.
Although the laundry list of gripes against significance testing is long and rather technical, the complaints center around a common
theme: Significance testing's scientific spam filter does a poor job of helping researchers separate the true and important effects from
the lookalike ones. The implication is that scientific journals might be littered with claims and conclusions that are not likely to be true.
"I believe that psychologists have woken up and come to the realization that some work published in high-impact journals is plain
nonsense," Wagenmakers says.
Not that psychology has a monopoly on publishing results that collapse on closer inspection. For example, gene-hunting researchers in
large-scale genomic studies used to be plagued by too many false-alarm results that flagged unimportant genes. But since the field
developed new statistical techniques and moved away from the automatic use of p-values, the reliability of results has improved, Leek
says.
Confusing as p-values are, however, not everyone is a fan of taking them from researchers' statistical tool kits. "This might be a case in
which the cure is worse than the disease," Goodman says. "The goal should be the intelligent use of statistics. If the journal is going to
take away a tool, however misused, they need to substitute it with something more meaningful."
One possible replacement that might fit the bill is a rival approach to data analysis called Bayesianism. (The journal said it will consider
its use in submitted papers on a case-by-case basis.) Bayesianism starts from different principles altogether: Rather than striving for
scientifically objective conclusions, this statistical system embraces the subjective, allowing researchers to incorporate their own prior
knowledge and beliefs. One obstacle to the widespread use of Bayesianism has been the lack of user-friendly statistical software. To
address this, Wagenmakers's team is working to develop a free, open-source statistical software package called JASP. It boasts the tagline:
"Bayesian statistics made accessible."
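How prior beliefs enter a Bayesian analysis can be shown with a small Python sketch. It is a textbook-style illustration (a Beta prior updated with yes/no outcomes), not a description of how JASP works; the function names, the two priors, and the data (60 successes in 100 trials) are all assumptions made up for this example.

```python
def update_beta(prior_a, prior_b, successes, failures):
    """Combine a Beta(prior_a, prior_b) prior with observed yes/no outcomes.

    For this kind of data the posterior is again a Beta distribution,
    so updating amounts to adding the observed counts to the prior's.
    """
    return prior_a + successes, prior_b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution: the estimated success rate."""
    return a / (a + b)

# Hypothetical data: 60 successes and 40 failures.
successes, failures = 60, 40

# Two hypothetical researchers who start from different beliefs.
skeptic = update_beta(50, 50, successes, failures)     # strong prior belief in 50%
open_minded = update_beta(1, 1, successes, failures)   # flat prior, no strong view

print("skeptic:    ", beta_mean(*skeptic))
print("open-minded:", beta_mean(*open_minded))
```

The same data leave the skeptic with an estimate of 0.55 and the open-minded researcher with about 0.60. That dependence on the starting point is the subjectivity the approach embraces, and it is stated openly in the prior rather than hidden.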
Other solutions attack the problem from a different angle: human nature. Because researchers in modern science face stiff competition
and need to churn out enough statistically significant results for publication, and therefore promotion, it is no surprise that research
groups somehow manage to find significant p-values more often than would be expected, a phenomenon dubbed "p-hacking" in 2011 by
psychologist Uri Simonsohn at the University of Pennsylvania.
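A short simulation shows how easily this can happen. The Python sketch below covers just one flavor of p-hacking, chosen for illustration and not taken from Simonsohn's work: a study with no real effect measures several independent outcomes and reports whichever p-value is smallest. The function names, the sample sizes, and the simple z-test with known variance are all assumptions of the example.

```python
import math
import random

def z_test_p_value(sample):
    """Two-sided p-value for 'the true mean is 0', assuming unit variance."""
    z = sum(sample) / math.sqrt(len(sample))
    return math.erfc(abs(z) / math.sqrt(2))

def false_positive_rate(n_outcomes, n_studies=4000, n_subjects=20, seed=1):
    """Share of no-effect studies whose best outcome reaches p < 0.05."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Every outcome is pure noise, so any "significant" one is a false alarm.
        p_values = [
            z_test_p_value([rng.gauss(0, 1) for _ in range(n_subjects)])
            for _ in range(n_outcomes)
        ]
        if min(p_values) < 0.05:
            hits += 1
    return hits / n_studies

print("one planned outcome:  ", false_positive_rate(1))
print("best of five outcomes:", false_positive_rate(5))
```

With a single planned outcome, about 5 percent of these no-effect studies come out significant, as the 0.05 threshold promises. Picking the best of five pushes the rate toward 1 - 0.95^5, or roughly 23 percent, even though no individual test was run incorrectly.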
Several journals are trying a new approach, spearheaded by psychologist Christopher Chambers of Cardiff University in Wales, in which
researchers publicly preregister all their study analysis plans in advance. This gives them less wiggle room to engage in the sort of
unconscious (or even deliberate) p-hacking that happens when researchers change their analyses in midstream to yield results that are
more statistically significant than they would be otherwise. In exchange, researchers get priority for publishing the results of these
preregistered studies, even if they end up with a p-value that falls short of the normal publishable standard.
Finally, some statisticians are banking on education being the answer. "P-values are complicated and require training to understand,"
Leek says. Science education has yet to fully adapt to a world in which data are both plentiful and unavoidable, without enough
statistical consultants to go around, he says, so most researchers are stuck analyzing their own data with only a couple of stats courses
under their belts. "Most researchers do not care about the details of statistical methods," Wagenmakers says. "They use them only to
support their claims in a general sense, to be able to tell their colleagues, 'see, I am allowed to make this claim, because p is less than
.05, now stop questioning my result.'"
A new, online nine-course data science specialization for professionals with very little background in statistics might change that. Leek
and his colleagues at Johns Hopkins rolled out the free courses last year, available via the popular Coursera online continuing education
platform, and two million students have already registered. As part of the sequence, Leek says, a full monthlong course will be
devoted specifically to understanding methods that allow researchers to convey uncertainty and generalizability of study findings,
including, yes, p-values.