In recent years, experiments—in laboratories, but especially in the field—have come into their own in political science and development economics, often accompanied by the argument that these methods and the evidence they produce are superior to others or, in the increasingly tired metaphor, that they constitute a “gold standard” to which all researchers should aspire. The implication is that researchers using other methods are doing “second-best” research and that funders and policymakers should not waste their money or attention on studies conducted using “nonexperimental” methods. If accepted, such arguments directly contradict the value of methodological pluralism embraced by many political scientists post-Perestroika. This volume, edited by Dawn Langan Teele, speaks to whether field experiments themselves meet the “gold standard” and whether dominance by one method is preferable to methodological diversity.
As Teele writes in the preface, the book grew out of a debate she organized at Yale in 2009 in which Donald Green defended field experiments against criticisms by political scientists Susan Stokes and Ian Shapiro and development economist Angus Deaton. Teele represents the debate as one about the comparative value of experimental and “observational” inquiry in producing the best evidence for identification of causes in the social world. (In this context, “observational inquiry” means the application of statistical or econometric methods to quantitative indicators or data sets, and this is its meaning here unless otherwise noted. This usage comes out of the research practices of epidemiology and public health; the work of statistician Paul Rosenbaum [Observational Studies, 2002] is the cited source for it.) The virtue of the book is that it gathers in one volume the arguments of some of the strongest proponents of field experiments in political science and development economics, alongside respectful assessments of this trend as well as some pointed critiques of it. Its Achilles’ heel is that only three of the chapters are original contributions, making it difficult to capture in print the responsiveness of a debate when six chapters are reprints of already published work.
Why field experiments? The conventional wisdom is that laboratory experiments are strong on the criterion of internal validity (the identification of causal effects, i.e., whether and to what extent manipulation of the independent variable, the “treatment,” produces changes in the dependent variable, the effect) but weak on external validity (generalizability of findings to settings outside the lab). In contrast, observational methods are weak on internal validity (due to the inability to manipulate the independent variable and measure its average effects using equivalent experimental and control groups) but strong on external validity (since the observations are measures of the world). It follows, in this view, that field experiments are strong on both criteria, employing the power of random assignment to produce equivalent control and experimental groups for the identification of causes, yet more generalizable because “the field” is not artificial like “the lab.”
Field Experiments and Their Critics opens with an example from political science, a reprint of Alan Gerber, Donald Green, and Edward Kaplan’s 2004 “The Illusion of Learning from Observational Research,” in which they offer a Research Allocation Theorem for deciding when to conduct experimental versus observational research. As the title implies, they argue, based on their theorem, that experiments are usually superior and, moreover, constitute a standard in and of themselves: “The test of whether methodological inquiry succeeds is its ability to correctly anticipate experimental results because experiments produce unbiased estimates regardless of whether the confounders [i.e., potential control variables] are known or unknown” (p. 25). Methodologists of other stripes might well question this circular logic and point to understanding or prediction of substantive political events as a better test.
The second chapter, by Susan Stokes, is “A Defense of Observational Research.” Although the problem of “omitted variable bias” is well recognized in the statistical approaches used in observational research, Stokes argues that the claim that only random assignment controls for both known and unknown factors amounts to “radical skepticism” that caricatures observational researchers as able to do “nothing more than ‘assume nonconfoundedness’” (p. 38). One difficulty she identifies in the argument of Gerber, Green, and Kaplan is that claims about methods in the abstract often look quite different when substantive issues are brought into play. In observational research, there is often a limited number of alternative explanations (of the outcome of interest), many of which can be eliminated using substantive knowledge and theory. Stokes appears to have persuaded economists Christopher Barrett and Michael Carter, who conclude in the next chapter, “A Retreat From Radical Skepticism,” that development economists should “more creatively balance observation and experimental data” (p. 72). Describing weaknesses of field experiments—from ethical concerns to the ways in which participants may actively resist random assignment to control or experimental groups—they write: “[L]imits to RCTs [randomized controlled trials] … by themselves mandate a return to methodological pluralism if we are [to] continue to answer the important questions” (p. 74).
Chapter 4 is a reprint of Abhijit Banerjee and Esther Duflo’s 2009 “Experiments in Development,” which recounts the successes of field experiments in Africa, Mexico, and India. Teele includes development economics in the book because of the “experimental revolution” in that field (p. 5), which has garnered much academic and public attention, specifically through the work of this duo who wrote the 2011 prize-winning book Poor Economics: A Radical Rethinking of the Way to Fight Global Poverty. Although their essay responds to some “concerns” about experiments—randomization bias and compliance issues, among others—they do not believe these problems are unique to field experiments (p. 78). They end the essay by opining that economists’ insights “should guide policy making” and even “midwife the process of policy discovery” (p. 113).
Teele’s own chapter, which follows, examines the ethics of field experiments. Unlike other methods, field experiments are akin to “social engineering,” in her view, because they require not only observations of people’s daily lives but also purposeful interventions that can alter individuals’ “life chances” and even undermine the “social fabric” of communities (pp. 135, 115, 129). Examining the work of Banerjee and Duflo and others in development economics, Teele argues that too many field experiments have violated the principles of the Belmont Report (1978), the foundational document on human-subjects research ethics referenced by U.S. institutional review boards.
Chapter 6, another reprint, is Angus Deaton’s detailed critique of field experiments as a primary method for understanding economic development. He is skeptical that field experiments as actually implemented are superior to the observational methods of econometrics. Although parts of his analysis are specific to technical debates in development economics, what is most germane to the book’s larger themes is that Deaton sees a pernicious effect from the dominance of field experiments: a focus on the “what” to the neglect of the “why,” which undermines the explanatory value of development economics. Like Stokes, Deaton is skeptical of the general proposition that RCTs “automatically trump other evidence” (p. 143), averring that there is “no substitute for careful evaluation of the chain of evidence and reasoning by people who have the experience and expertise in the field” (pp. 179–80). Another vote for moderation comes in Chapter 7 from Andrew Gelman, who, although he repeats the gold-standard metaphor, finds space for historical and qualitative methods as well as experimental approaches other than RCTs.
The penultimate chapter, “Misunderstandings Between Experimentalists and Observationalists About Causal Inference,” a reprint of Kosuke Imai, Gary King, and Elizabeth Stuart’s 2008 article in the Journal of the Royal Statistical Society (Series A, vol. 171, no. 2, pp. 481–502), has a title that appears on point but is written for a technically proficient audience. Like the authors of the opening chapter, these scholars put their energy into a “general framework for understanding causal inference” (p. 196), illustrated through a comparison of two studies of the survival of women with breast cancer. Although the general framework may correct “misunderstandings,” it is not clear whether other substantive disagreements between the two communities remain.
The final chapter, by Ian Shapiro, echoes the views of Deaton in its criticism of the disciplinary effects of too much emphasis on field experiments. Shapiro criticizes the effort to mimic practices from medicine, emphasizes the kinds of questions of interest to political scientists that are not amenable to investigation via field experiments, and worries that research agendas focused on the latter method mean that “scholars will be learning more and more about less and less” (p. 233). In a way, the volume itself is testimony to his concerns, as there is relatively little attention to the substantive politics of, for example, breast cancer research. Banerjee and Duflo write—naively, in my view—that the nonparticipation that can undermine the logic of field experiments will become less common in developing countries as randomized evaluation of development programs comes “to be recommended by most donors” (p. 97), ignoring the politics of international aid that scholars of international relations (nowhere mentioned in the volume) point to as important for understanding development.
Shapiro is now well known for his statement that became the title for his chapter in this volume: “Methods Are Like People: If You Focus on What They Can’t Do, You Will Always Be Disappointed.” Turning that part of his title around and focusing on what methods can do, those endorsing methodological pluralism will recognize that field experiments are good for some research questions—identifying and providing precise evidence for particular “treatment effects” of interest to policymakers. But consistent with this position, other methods have their own strengths, and it would have been nice to see some recognition of that in this volume. Contemplating the entirety of Field Experiments and Their Critics, many readers will likely demur at Teele’s assessment that “the jury is still out as to whether experiments are the only way or the best way to tell us all we need to know about a policy intervention” (p. 116). Her volume provides the arguments to send that jury home.