Richard Gonzalez

Center Director, Research Center for Group Dynamics, Institute for Social Research
Director, BioSocial Methods Collaborative, RCGD
Amos N Tversky Collegiate Professor, Psychology and Statistics, LSA
Professor of Marketing, Stephen M Ross School of Business
Professor of Integrative Systems and Design, College of Engineering


Address: Research Center for Group Dynamics
Institute for Social Research
University of Michigan
426 Thompson Street
Ann Arbor, Michigan 48106
Phone: 734-647-6785

The important role of replication in research

Sep 8, 2012 | Psychology, Statistics/Methods

This paper is attracting renewed interest now that social psychology is paying close attention to the issue of replication. When we wrote it, the field was debating the use of null hypothesis testing, and we argued that replication needed to be emphasized as well. The point was not new, however. Fisher had already written:

To demonstrate that a natural phenomenon is experimentally demonstrable, we need, not an isolated record, but a reliable method of procedure. In relation to the test of significance, we may say that a phenomenon is experimentally demonstrable when we know how to conduct an experiment [that] will rarely fail to give us a statistically significant result. (Fisher, 1951, p. 14)

Greenwald, A. G., Gonzalez, R., Harris, R. J., & Guthrie, D. (1996). Effect sizes and p-values: What should be reported and what should be replicated? Psychophysiology, 33, 175-183. doi:10.1111/j.1469-8986.1996.tb02121.x. PMID: 8851245.


Despite publication of many well-argued critiques of null hypothesis testing (NHT), behavioral science researchers continue to rely heavily on this set of practices. Although we agree with most critics’ catalogs of NHT’s flaws, this article also takes the unusual stance of identifying virtues that may explain why NHT continues to be so extensively used. These virtues include providing results in the form of a dichotomous (yes/no) hypothesis evaluation and providing an index (p value) that has a justifiable mapping onto confidence in repeatability of a null hypothesis rejection. The most-criticized flaws of NHT can be avoided when the importance of a hypothesis, rather than the p value of its test, is used to determine that a finding is worthy of report, and when p approximately equal to .05 is treated as insufficient basis for confidence in the replicability of an isolated non-null finding. Together with many recent critics of NHT, we also urge reporting of important hypothesis tests in enough descriptive detail to permit secondary uses such as meta-analysis.
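A back-of-the-envelope calculation helps show why p near .05 gives little assurance that an isolated finding will replicate. The sketch below is an illustration under stated assumptions, not necessarily the computation used in the paper: it assumes a two-tailed z test, an exact replication with the same sample size, and a true effect equal to the one observed in the original study. The function name replication_probability and its parameters are hypothetical, chosen only for this example.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def replication_probability(p_observed: float, alpha: float = 0.05) -> float:
    """Chance that an exact replication rejects the null in the same direction.

    Assumptions (illustrative only): two-tailed z test, same sample size,
    and a true effect equal to the effect observed in the original study.
    """
    if not 0.0 < p_observed < 1.0:
        raise ValueError("p_observed must lie strictly between 0 and 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")

    # z statistic implied by the original two-tailed p value
    z_observed = _STD_NORMAL.inv_cdf(1.0 - p_observed / 2.0)
    # critical z the replication must exceed to reach significance
    z_critical = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)

    # Under the assumptions, the replication's z is normal with mean
    # z_observed and sd 1, so this is the upper-tail area beyond z_critical.
    return 1.0 - _STD_NORMAL.cdf(z_critical - z_observed)


if __name__ == "__main__":
    for p in (0.05, 0.01, 0.005, 0.001):
        print(f"original p = {p}: replication probability = "
              f"{replication_probability(p):.2f}")
```

Under these assumptions, an original result at exactly p = .05 corresponds to a replication probability of one half, because the replication's test statistic is as likely to fall below the critical value as above it. Smaller original p values push that probability higher.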