Introduction
Once the hallmark of the experimental approach (e.g., Cox and Reid, Reference Cox and Reid2000), ‘design’ has come to refer to any research planning deployed to ensure sound results (Creswell and Creswell, Reference Creswell and Creswell2017). In political science, the term gained traction after King et al. (Reference King, Keohane and Verba1994) proposed a template of reference for comparative politics and international relations with the explicit motivation to fill a gap. At the time, they found that young scholars were ‘highly sophisticated qualitative and quantitative data collectors, interviewers, soakers and pokers, theorists, philosophers, formal modelers, and advanced statistical analysts,’ yet often ‘had trouble defining a research question and designing the empirical research to answer it.’ On top of that, students ‘proposed impossible fieldwork to answer unanswerable questions’ (King et al., Reference King, Keohane, Verba, Brady and Collier2004: 112). Their proposal conveyed the image of a discipline in which causation was the driving concern, experiments provided the standard of reference, and the statistical treatment of large sets of data offered the viable approximation worth pursuing.
This strategy for methodological consolidation is far from uncommon in the social sciences and seldom goes unchallenged. A few decades before, the experimental design had been set as the benchmark for program evaluation (Campbell and Stanley, Reference Campbell and Stanley1963; Cook and Campbell, Reference Cook and Campbell1979). In response, the unsympathetic methodological quarters entrenched themselves behind paradigmatic lines, and causal analysis became a contested practice (e.g., Guba and Lincoln, Reference Guba and Lincoln1989). Political science tells a different story. Despite heterogeneity (e.g., Mahoney and Goertz, Reference Mahoney and Goertz2006; Della Porta and Keating, Reference Della Porta, Keating, Della Porta and Keating2008), the fields of comparative politics and international relations aimed for common ground. Later influential contributions agreed – with some reasonable caveats – that credible causal inference provides a common motivation in political science and bridges research across techniques and traditions (e.g., Gerring, Reference Gerring2001; Collier et al., Reference Collier, Seawright, Munck, Brady and Collier2004: 57 ff; Ragin, Reference Ragin2008; Rohlfing, Reference Rohlfing2012; Blatter and Haverland, Reference Blatter and Haverland2012; Bennett, Reference Bennett, Bennett and Checkel2015; Fairfield and Charman, Reference Fairfield and Charman2017; also Levy, Reference Levy2007). A major consequence of such focused pluralism lies in the emerging practices of mixing (e.g., Brannen, Reference Brannen2005; Onwuegbuzie et al., Reference Onwuegbuzie, Johnson and Collins2009), nesting (e.g., Lieberman, Reference Lieberman2005; Rohlfing, Reference Rohlfing2008), and triangulating (e.g., Hammersley, Reference Hammersley and Bergman2008) techniques from different traditions into the same study. Meanwhile, the call for replicability and transparency has enhanced the relevance of design choices in establishing the credibility of research findings (King, Reference King1995; Blair et al., Reference Blair, Cooper, Coppock and Humphreys2019).
Since then, the methodological discourse has been compelled to provide meaningful guidance on conducting sound causal research and on preserving the dialogue across studies. This special issue follows from the seminar that, with this aim, the Standing Group in Research Method in Political Science – MetRiSP held in 2019 at the University of Catania under the auspices of the Italian Society of Political Science – SISP. The seminar shared the tenet that, on pain of demoting research to homiletics, credible explanations of political and social phenomena entail the empirical testing of causal claims (see Ermakoff, Reference Ermakoff2019) – although, in turn, testing produces a new crop of problems. Some of them have a technical nature and depend on the toolbox of choice: frequentists are concerned with statistical power and significance; Bayesians, with decisions about priors; set algebraists, with calibration settings. Other problems are common conceptual issues – such as deciding how to define the phenomena of interest or which manifestations capture them without stretching the definition. In between lies the problem of establishing the causal role of a factor. Here, design enters as the default warrant of credible ascription.
Research design is the rationale for connecting evidence to a causal model in order to establish its tenability; across traditions, the major difference lies in how the model is specified. Designs, therefore, meet the pluralist desiderata: they define traditions of causal research while avoiding estrangement between them, easing the cumulation of substantive causal knowledge.
How political science understands research design
Across the discipline, the methodological discourse offers four major definitions of design, as follows.
Design as a stage in the research cycle
Schmitter (Reference Schmitter, Della Porta and Keating2008: 294) makes sense of research design as the solution to an implementation problem. He portrays the research process as a cycle of strategic decisions, clustered into 11 themes and four stages. The stage of ‘discovery’ entails decisions on the topic, its conceptualization, and the generation of a hypothesis as an ‘if-then’ conditional. The hypothesis initiates the second stage of ‘explication,’ where decisions are made on selecting cases and operationalizing variables – and the proposal is written. The operationalization opens the third stage of ‘accuracy,’ in which measures are decided and tested as needed. To Schmitter, this is also the time for serendipitous reconsiderations, as measurement may retrofit concepts. The last stage, of ‘proof,’ provides the technique of choice with the empirical basis for establishing the tenability of the ‘if-then’ claim.
The issue of design arises during explication as the question of which observations provide a proper source of evidence (ivi, 276). The answer rests on the degree of control that the researcher can exert over the variation of key factors. Given fully manipulable variables, the selection can pursue a pure experimental design. Without manipulability, but with the possibility of simulating it, the selection should embark on quasi-experiments. Absent that possibility, researchers who can afford reliable cross-unit observations can opt for comparisons and case studies. Given no empirical basis, the researcher can still resort to thought experiments – either rhetorical or counterfactual, as in the analytical narrative, the spatial theory of voting, or agent-based modeling approaches, to mention just three possibilities (Austen-Smith and Banks, Reference Austen-Smith and Banks2005; McCarty and Meirowitz, Reference McCarty and Meirowitz2007; Martelli, Reference Martelli2009; Laver and Sergenti, Reference Laver and Sergenti2011).
Design as a defining feature of empirical studies
Toshkov (Reference Toshkov2016) arranges many well-known distinctions from the methodological literature into a single taxonomy of research. He identifies ‘descriptive,’ ‘predictive,’ and ‘explanatory’ as the three genera within the family of empirical research that, together with the theoretical family, composes the ‘positive’ research order as distinct from the ‘normative’ one. He then introduces further distinctions between species of positive studies: experimental and observational, statistical large-N and comparative small-N, cross-case and within-case.
As in Schmitter's portrayal, positive studies are the branch of the taxonomy where the actual issue of designing arises (ivi, 168). In a partial departure from the previous proposal, however, here the relevant diversity depends on both case selection and the manipulability of the quantity of interest, as the two are set by independent design choices. Case selection is understood as the possibility of randomly assigning the units of observation to a state of the variable of interest, independent of the possibility of experimental control over that state.
Thus, the ‘gold standard’ of randomized controlled trials results when both random assignment and experimental control are employed in the same design. Experimental control alone allows for quasi-experiments, while random assignment alone makes natural experiments possible. When neither is applied, the study is observational.
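The typology can be rendered schematically. The following minimal Python sketch is our illustration, not part of Toshkov's text: it treats the two independent design choices as Boolean flags and returns the corresponding study type.

```python
# A schematic rendering of the typology above (our illustration, not Toshkov's text).

def study_type(random_assignment: bool, experimental_control: bool) -> str:
    """Map the two independent design choices onto the four study types."""
    if random_assignment and experimental_control:
        return "randomized controlled trial"
    if experimental_control:   # control without random assignment
        return "quasi-experiment"
    if random_assignment:      # (as-if) random assignment without control
        return "natural experiment"
    return "observational study"

for assignment in (True, False):
    for control in (True, False):
        print(f"assignment={assignment!s:<5} control={control!s:<5} -> "
              f"{study_type(assignment, control)}")
```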
Design as an approach to causal inquiry
The third perspective comes from Gschwend and Schimmelfennig (Reference Gschwend, Schimmelfennig, Gschwend and Schimmelfennig2007), who connect designs to types of causal inference.
Their typology derives designs from the number of observations, opposing large-N to small-N studies. Variation in numerosity indicates alternative renderings of causation – deterministic or probabilistic – and of challenging observations – interesting ‘deviant cases’ or noisy ‘outliers’ (ivi, 13). The second dimension contrasts ‘factor-centric’ and ‘outcome-centric’ designs. The authors borrow the distinction from George and Bennett (Reference George and Bennett2004) and the case study tradition but assume its general applicability (indeed: Holland, Reference Holland1986). The distinction points to two kinds of questions to which research can respond – namely, whether a single factor of interest can be proven relevant to an outcome, or which bundle of factors accounts for the outcome of interest. They consider that picking one question has practical consequences for design. Factor-centric research seeks to insulate the relationship of interest so as to magnify its signal against others that may interfere with it. Outcome-centric research, instead, aims to identify the complex of relationships relevant to some actual outcome and widens the analysis to selected interplays. Thus, they reason, the former strategy establishes the predictive capacity of a factor; the latter, its explanatory value.
The combination of the two dimensions – numerosity and research question – yields four types of designs (ivi, 14). Pure or natural experiments, and studies affording statistical control, illustrate the factor-centric large-N type, while cross-case comparisons and quasi-experiments are factor-centric small-N. On the outcome-centric side, forecasting and Qualitative Comparative Analysis exemplify the large-N sort, while case studies exemplify the small-N one.
Design as a diagnostic device
The proposal by Blair et al. (Reference Blair, Cooper, Coppock and Humphreys2019) responds to the problem of how to ‘declare’ one's research design so that the information conveys the merits of the researcher's choice against alternative specifications.
The framework serves the purpose of easing peer scrutiny and, retrospectively, guiding design decisions. Their examples show that the scheme applies to descriptive designs (e.g., surveys, Bayesian inference), causal designs (Process Tracing, Qualitative Comparative Analysis, Nested Analysis with qualitative confirmation, Matching on Observables, Regression Discontinuity, and experimental designs), and discovery-oriented designs (latent allocation). However, the authors acknowledge that a ‘design declaration’ may become complete at different stages of the research process, depending on the motivation of the study. They reason that early and complete declarations follow from testing hypotheses about single factors on secondary data. When inference serves discovery, the details of a design may become fully clear only ex post; thus, they may not allow alternative specifications unless the researcher commits to a hypothesis.
Following Geddes (Reference Geddes1990), they hold that a complete design consists of four elements: a causal model of the world (${\cal M}$), an inquiry (${\cal I}$), a data strategy (${\cal D}$), and an answer strategy (${\cal A}$). ${\cal M}$ includes the factors considered for the analysis and the assumptions about the shape and direction of the relationships they entertain. In short, ${\cal M}$ is the data generation model meant to capture a pattern in the real world. ${\cal I}$ details what the researcher wants to learn about ${\cal M}$, reduced to two options: the conditional values of a special factor Y, or the values that the factor would take under intervention. ${\cal D}$ refers to the strategies employed to construe evidence – such as data collection, sampling, assignment, casing, and the mapping of latent variables onto observable ones. The answer strategy ${\cal A}$ is declared through the techniques deployed to turn data from ${\cal D}$ into evidence responding to ${\cal I}$.
These elements, and their content, convey a broader understanding of research design as the set of decisions made along the whole research process: ${\cal M}$ captures theory formulation, ${\cal I}$ corresponds to the research question, ${\cal D}$ indicates the criteria for case selection and coding, while ${\cal A}$ stands for the actual strategy for inference (ivi, 842). Although analytically discrete, these elements, the authors reason, are connected by a relationship of dependence: the data strategy ${\cal D}$ and the answer strategy ${\cal A}$ follow from the interplay between a model ${\cal M}$ and an inquiry ${\cal I}$ (ivi, 852). This priority of ${\cal M}$ and ${\cal I}$, they maintain, holds even in discovery-oriented studies: although discovery may proceed from a less structured starting point, no empirical strategy can venture into the field without some sense-making device.
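To fix ideas, consider a minimal Python sketch of a declared design and its diagnosis. This is our illustration, not the authors' accompanying software; the two-arm setting, the constant treatment effect of 0.5, and all numbers are invented assumptions. It declares ${\cal M}$, ${\cal I}$, ${\cal D}$, and ${\cal A}$ and then simulates the design to check how well the answer strategy recovers the inquiry.

```python
import numpy as np

rng = np.random.default_rng(7)

# M: a toy model of the world -- potential outcomes with an assumed effect of 0.5
def model(n=200):
    noise = rng.normal(0, 1, n)
    y0 = noise              # outcome if untreated
    y1 = noise + 0.5        # outcome if treated (assumed effect: 0.5)
    return y0, y1

# I: the inquiry -- the average treatment effect implied by M
def inquiry(y0, y1):
    return np.mean(y1 - y0)

# D: the data strategy -- sample the units and randomly assign treatment
def data_strategy(y0, y1):
    z = rng.integers(0, 2, size=y0.size)   # random assignment
    y = np.where(z == 1, y1, y0)           # only one potential outcome is observed
    return z, y

# A: the answer strategy -- difference in means between the assigned groups
def answer_strategy(z, y):
    return y[z == 1].mean() - y[z == 0].mean()

# Diagnose the declared design by simulation: how close does A get to I's target?
estimates, targets = [], []
for _ in range(1000):
    y0, y1 = model()
    targets.append(inquiry(y0, y1))
    z, y = data_strategy(y0, y1)
    estimates.append(answer_strategy(z, y))

print("estimand:     ", round(np.mean(targets), 3))
print("mean estimate:", round(np.mean(estimates), 3))
print("bias:         ", round(np.mean(np.array(estimates) - np.array(targets)), 3))
```

Re-running the loop with a flawed data or answer strategy (e.g., non-random assignment) would surface the resulting bias before any fieldwork, which is the diagnostic point of declaring a design.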
The hallmark of the pluralist approach
Although far from overlapping, the four definitions just discussed agree on the tenet that causal claims can be established credibly despite the fact that our ascriptions can be biased. Research design serves to keep those biases under control while gathering the evidence that decides the claims' tenability. Legitimate causal claims, importantly, may narrow to single causal factors or widen to causal structures.
The merit of the latter point becomes especially clear against the ongoing debate on the limits of the methodological consolidation project. The point of contention concerns the core belief that manipulation and random assignment provide superior causal evidence, as they warrant the independence of conclusions from theoretical assumptions about causal structures (e.g., Imbens, Reference Imbens2020). The limit consists in reducing the data generation model to some minimal stimulus-response mechanism, and causal analysis to an estimate of the difference between the aggregate response of the units exposed to the stimulus and that of their unexposed statistical twins. This strategy may provide a first answer to whether the stimulus works; however, it dismisses the related questions of how it works and under which conditions it succeeds or fails as mere disturbances of an information signal. These questions are far from ancillary. Even when understood as the simplest elicited response, success or failure is always a local matter of the stimulus meeting the right conditions and no obstructions (e.g., Pemberton and Cartwright, Reference Pemberton and Cartwright2014; Kaaber, Reference Kaaber2020). At the same time, the ascription of the size of the response to the stimulus remains credible only insofar as confounders are ruled out. When such relevant conditions and confounders are not explicitly considered, they may bias the estimate of the stimulus's net effect in unknown directions (e.g., Pearl, Reference Pearl2009).
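The point can be made concrete with a small simulation of ours, with invented numbers: when an unconsidered background condition drives both exposure and response, the naive difference in means drifts away from the effect built into the simulation, whereas random assignment of the exposure removes the drift.

```python
import numpy as np

rng = np.random.default_rng(3)
n, true_effect = 10_000, 0.5

# an unobserved background condition that raises both exposure and response
confounder = rng.normal(0, 1, n)

# exposure depends on the confounder (observational setting) ...
exposed_obs = (confounder + rng.normal(0, 1, n)) > 0
# ... or is assigned at random (experimental setting)
exposed_rnd = rng.integers(0, 2, n).astype(bool)

def response(exposed):
    """Simulated response: true effect of exposure plus the confounder's effect."""
    return true_effect * exposed + confounder + rng.normal(0, 1, n)

for label, exposed in [("observational", exposed_obs), ("randomized", exposed_rnd)]:
    y = response(exposed)
    diff = y[exposed].mean() - y[~exposed].mean()
    print(f"{label:13} difference in means: {diff:.2f}  (simulated effect: {true_effect})")
```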
Moreover, the four definitions discussed above show that the pluralist stance can place political science beyond the debate. They implicitly acknowledge that causal analysis always requires models as the selection of meaningful connections in a tangled world. Besides, relevant background conditions and structural assumptions enter either as controls in the estimation of the single relationship of interest or as compound factors before an outcome (e.g., Franzese, Reference Franzese, Boix and Stokes2007; Ragin, Reference Ragin2008; Bennett, Reference Bennett, Bennett and Checkel2015). Designs warrant the tenability of any causal model. An ‘if-then’ relationship hardly holds unless some ‘if-else’ counterfactual evidence is available showing that, had the factor been different under the circumstances, the response would have changed, too. Design tackles the problem of finding or construing the proper ‘if-then-else’ empirics to dispel the doubt that the relationship only lives in our imagination.
Along this line, in this issue, Valentim et al. (Reference Valentim, Ruipérez Núñez and Dinas2021) offer a blueprint of the quasi-experimental design based on a discontinuity in the value of a factor. The strategy assumes that the units close to the threshold approximate experimental twins and allow estimating the average treatment effect of the factor, albeit with local validity. Valentim et al. (Reference Valentim, Ruipérez Núñez and Dinas2021) show how theory is still required to establish the plausible functional form of the effect and to generalize the findings.
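As a toy rendering of the rationale – a sketch of ours, not the authors' material; the cutoff, bandwidth, and simulated jump are invented – units within a narrow bandwidth around the threshold are compared, and the contrast between the two local fits at the cutoff approximates the local treatment effect.

```python
import numpy as np

rng = np.random.default_rng(11)
n, cutoff, bandwidth = 5000, 0.0, 0.25

running = rng.uniform(-1, 1, n)   # assignment (running) variable
treated = running >= cutoff       # deterministic rule at the threshold
# outcome: smooth in the running variable, plus a jump of 0.4 at the cutoff
y = 1.0 + 0.8 * running + 0.4 * treated + rng.normal(0, 0.3, n)

# keep only the units within the bandwidth around the cutoff
near = np.abs(running - cutoff) <= bandwidth
x, d, yy = running[near], treated[near], y[near]

def intercept_at_cutoff(mask):
    """Local linear fit on one side; the intercept is the limit at the cutoff."""
    X = np.column_stack([np.ones(mask.sum()), x[mask] - cutoff])
    beta, *_ = np.linalg.lstsq(X, yy[mask], rcond=None)
    return beta[0]

rdd_estimate = intercept_at_cutoff(d) - intercept_at_cutoff(~d)
print(f"local effect at the threshold: {rdd_estimate:.2f}  (simulated jump: 0.4)")
```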
In a similar vein, Costalli and Negri (Reference Costalli and Negri2021) introduce matching techniques to identify those units in the field that make the counterfactual claim observable. They discuss how these techniques find the twins among the units that display similar selected background features and how such similarity can be gauged in different ways. In a partial departure from estimates that rely on mapping the manifold background features onto a unidimensional propensity score, they illustrate so-called coarsened matching, in which the relevant background features are turned into meaningful classes. Here again, theory is implicitly required to establish which background features are relevant and which classes are meaningful.
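A minimal sketch of ours conveys the logic; the two background features, the binning choices, and the effect sizes are illustrative assumptions, not the authors'. Features are coarsened into classes, treated and untreated units are compared only within identical classes, and strata without twins drop out.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2000

# two background features and a treatment whose uptake depends on age
age = rng.uniform(18, 80, n)
income = rng.lognormal(10, 0.5, n)
treated = (rng.random(n) < 1 / (1 + np.exp(-(age - 50) / 10)))
y = 0.5 * treated + 0.02 * age + rng.normal(0, 1, n)   # simulated effect: 0.5

# coarsen the features into classes (in real research, theory sets the cut points)
age_class = np.digitize(age, [25, 30, 35, 40, 45, 50, 55, 60, 65, 70])
income_class = np.digitize(income, np.quantile(income, [0.25, 0.5, 0.75]))
stratum = list(zip(age_class, income_class))

# within each class containing both treated and untreated units, take the
# difference in means; average across strata, weighted by treated counts
effects, weights = [], []
for s in set(stratum):
    idx = np.array([st == s for st in stratum])
    t, c = idx & treated, idx & ~treated
    if t.any() and c.any():
        effects.append(y[t].mean() - y[c].mean())
        weights.append(t.sum())

print(f"matched estimate: {np.average(effects, weights=weights):.2f}  (simulated effect: 0.5)")
```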
Di Salvatore and Ruggeri (Reference Di Salvatore and Ruggeri2021) make the role of theory and models explicit in addressing a special source of threat to the credible estimate of the net effect, usually discounted through the error term. Against the standard assumption that the response follows from the direct exposure of a unit to the causal factor of interest, they contend that, in real settings, causation can also proceed indirectly from exposed units to their neighbors through ‘transfer’ mechanisms such as spillover or mimesis. Hence, they show how these spatial effects can be modeled and gauged to improve causal estimates.
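A stylized sketch of ours – the ring neighborhood, the clustered assignment, and the effect sizes are invented assumptions, not the authors' model – shows the gain: when the outcome of a unit also responds to its neighbors' exposure, a naive specification folds the spillover into the direct effect, whereas adding the neighbors' exposure as a regressor gauges the two components separately.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 2000  # units sit on a ring; each unit's neighbors are the two adjacent units

# clustered assignment (blocks of 10 units share the same status), so that
# neighbors' exposure correlates with a unit's own exposure
treat = np.repeat(rng.integers(0, 2, n // 10), 10).astype(float)
neighbor_treat = (np.roll(treat, 1) + np.roll(treat, -1)) / 2  # share of treated neighbors

direct, spill = 1.0, 0.6   # effects built into the simulation
y = direct * treat + spill * neighbor_treat + rng.normal(0, 0.5, n)

def ols(X, y):
    """Ordinary least squares coefficients via a least-squares solve."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

naive = ols(np.column_stack([np.ones(n), treat]), y)
spatial = ols(np.column_stack([np.ones(n), treat, neighbor_treat]), y)

print("naive estimate of the direct effect:", round(float(naive[1]), 2))  # absorbs spillover
print("spatial model (direct, spillover):  ",
      [round(float(b), 2) for b in spatial[1:]])
```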
Martini and Olmastroni (Reference Martini and Olmastroni2021) go further in the direction of model specification as theory testing. They discuss the application of the experimental rationale to surveys, to test theories about people's preferences. They show how, in factorial and conjoint designs, an eliciting factor can be modeled as a special configuration of key features expected to interact with special traits of the respondents, which they recognize as moderating factors. The random assignment of actual respondents to the eliciting factor's components allows testing the hypothesis while freeing responses from social desirability bias.
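A minimal sketch of ours conveys the logic; the candidate attributes, the moderating trait, and the effect sizes are invented for illustration and do not come from the article. Respondents see randomly composed profiles, and each attribute's average effect on support is recovered as a simple difference in means, overall and by the moderating trait.

```python
import numpy as np

rng = np.random.default_rng(21)
n = 4000  # respondent-profile observations

# randomly assigned attribute levels of a hypothetical candidate profile
experience = rng.integers(0, 2, n)   # 0 = newcomer, 1 = experienced
local = rng.integers(0, 2, n)        # 0 = outsider,  1 = local candidate

# moderating trait of the respondent (observed, not assigned)
partisan = rng.integers(0, 2, n)

# simulated support: experience matters; being local matters mainly for partisans
support = (0.10 * experience + 0.05 * local + 0.15 * local * partisan
           + rng.normal(0, 0.2, n))

def amce(attribute):
    """Average marginal component effect: difference in mean support."""
    return support[attribute == 1].mean() - support[attribute == 0].mean()

def effect_by_group(attribute, group_mask):
    """Conditional effect of an attribute within a subgroup of respondents."""
    return (support[(attribute == 1) & group_mask].mean()
            - support[(attribute == 0) & group_mask].mean())

print(f"AMCE of experience: {amce(experience):.2f}")
print(f"AMCE of local:      {amce(local):.2f}")
print(f"  effect of local among partisans:     {effect_by_group(local, partisan == 1):.2f}")
print(f"  effect of local among non-partisans: {effect_by_group(local, partisan == 0):.2f}")
```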
Damonte (Reference Damonte2021) again applies the configurational understanding of the causal factor in a quasi-experimental design, although under the assumption that the causal factor is the whole compound of right conditions. The causal analysis is geared toward pinpointing the bundle of theoretically meaningful factors that supports an ‘if-then-else’ claim about an outcome beyond the rationale of the net effect. She shows that set-theoretic techniques and logical pruning operations can identify the relevant compounds beneath each state of the outcome, and that the compounds retrieved under alternative counterfactual assumptions can constitute mediated causal structures.
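A toy sketch of ours – the crisp-set data and the candidate compounds are invented – illustrates the set-theoretic test behind such claims: a compound of conditions is sufficient for the outcome to the extent that the cases displaying the compound also display the outcome (consistency), and relevant to the extent that it covers the outcome's instances (coverage).

```python
import numpy as np

# invented crisp-set data: 1 = condition/outcome present, 0 = absent
#                 A  B  C  Y
cases = np.array([[1, 1, 0, 1],
                  [1, 1, 1, 1],
                  [1, 0, 1, 0],
                  [0, 1, 1, 1],
                  [0, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 1, 0, 0],
                  [0, 0, 0, 0]])
A, B, C, Y = cases.T

def sufficiency(compound, outcome):
    """Consistency and coverage of the claim 'compound is sufficient for outcome'."""
    consistency = (compound & outcome).sum() / compound.sum()
    coverage = (compound & outcome).sum() / outcome.sum()
    return consistency, coverage

for label, compound in [("A*B", A & B), ("B*~C", B & (1 - C)), ("C", C)]:
    cons, cov = sufficiency(compound, Y)
    print(f"{label:5} -> consistency {cons:.2f}, coverage {cov:.2f}")
```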
Ruffa and Evangelista (Reference Ruffa and Evangelista2021) discuss the process-tracing approach, in which the data-generation model is a theory-driven definition of the causal situation together with a hypothesis about the special unfolding of a chain of events before an outcome. This turns hypothesis testing into the retrieval of the marks, or fingerprints, that the chain of events would have left in the cases, were the hypothesis true. They recognize that the challenge of the technique lies in identifying meaningful marks, establishing the evidential value that each can bear, and converting that value into a weight for the Bayesian update of our beliefs in favor of or against the tenability of the hypothesis in the case, relative to its alternatives.
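A minimal sketch of the updating step – ours, with an invented prior and invented likelihoods – shows the arithmetic: a mark shifts belief in the hypothesis according to how much more likely it is to be found if the hypothesis is true than if its rival is.

```python
def bayes_update(prior, p_e_given_h, p_e_given_rival):
    """Posterior belief in the hypothesis after observing a piece of evidence."""
    return (p_e_given_h * prior) / (p_e_given_h * prior + p_e_given_rival * (1 - prior))

prior = 0.5  # initial belief in the hypothesized causal chain

# a 'smoking gun' mark: rarely found, but very unlikely under the rival account
after_smoking_gun = bayes_update(prior, p_e_given_h=0.40, p_e_given_rival=0.05)

# a missed 'hoop' mark: its absence is unlikely if the hypothesis were true
after_failed_hoop = bayes_update(prior, p_e_given_h=0.05, p_e_given_rival=0.70)

print(f"after finding a smoking-gun mark: {after_smoking_gun:.2f}")  # belief rises
print(f"after missing a hoop-test mark:   {after_failed_hoop:.2f}")  # belief drops
```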
Together, the articles attest that the ‘if-then-else’ understanding of causation embodied in the quasi-experimental rationale affords plural renderings, each suitable for responding to specific questions. The contributions, moreover, suggest dismissing the debate over the primacy of model vs. design, as the one cannot really dispense with the other. The ultimate rift in research may run less between techniques, languages, or questions than between quarters – those rejecting causation as a legitimate and fruitful object of the discipline and those embracing the challenge of better causal knowledge.
Funding
This research received no specific grant from any public or private funding agency.
Conflict of interest
None.