Case selection is a crucial component of qualitative or mixed-methods research design. Prescriptions for case-selection techniques abound and often give contradictory advice. For example, researchers are sometimes advised to select on the dependent variable and at other times encouraged to select on the independent variable (Gerring 2007a, 101–104). Researchers are told not to select cases simply because they find them interesting (George and Bennett 2005, 83), yet they are also told that they should be able to explain substantively significant events (Goertz and Mahoney 2012, 184–5). This variety of case-selection techniques, however, may simply reflect the many goals that researchers want to accomplish with case studies. A case study, for instance, could be used to confirm a statistical relationship, identify a causal mechanism, or discover new causal conditions to test.
Does the practice of case selection, however, fit the prescription? This article addresses whether case-study practitioners adopt the case-selection practices advised by methodologists. We conducted a meta-analysis of a sample of peer-reviewed journal articles that use case studies in either a qualitative or a mixed-methods research design. We report three main findings: some case-selection strategies are used frequently whereas others are comparatively underutilized; practitioners often combine techniques in an ad hoc manner; and researchers rarely identify the population from which they select their cases.
We also propose an additional set of considerations: the logistical constraints that researchers face when conducting case-study research, such as limitations on funding, language skills, local networks, and access to government documents and officials. These limitations may accrue disproportionately to scholars based on gender, race, ethnicity, or career tenure, and they likely affect case selection because researchers choose cases based on whether they perceive the research to be feasible. We argue that an exclusive emphasis on methodological rigor in case selection overlooks the human element in social science research and thereby diminishes transparency. A mindful discussion of research goals, weighed against case-selection constraints, ultimately will clarify the considerations that went into case selection and encourage greater openness on the part of researchers. We do not suggest that logistical constraints should outweigh methodological rigor but rather that fully addressing the former can complement the latter. Disclosing these constraints identifies a universe of logistically feasible cases, and case selection should be discussed within this context. We propose that researchers be more transparent about research goals and case selection, acknowledging the influence of practical considerations and the tradeoffs they require.
CASE-SELECTION STRATEGIES
In qualitative and mixed-methods approaches, case selection is thought to mitigate many methodological problems related to confounders and generalization. In the mixed-methods approach specifically, case selection is the crucial link between the distinct methodological approaches (Goertz 2017) and, for the qualitative component of a mixed-methods project, often is the only guidance given on how to proceed. In both fully qualitative and mixed-methods studies in which case studies make up the qualitative component, there are many case-selection strategies. As a heuristic, we outline a typology of four case-selection types: characteristics of the case, relationship to a small number of other cases, relationship to the posited X/Y association, and relationship to a large-N sample. Each of these four types contains multiple subtypes.
First, a case can be selected based on its inherent characteristics. It could be chosen because it exhibits a high score on the independent variable of interest, which should make the causal mechanism easier to observe (Seawright 2016, 85–97), or because it has an extreme score on the dependent variable (Gerring 2007a, 101–104; Rogowski 2010, 91–6). It also could be chosen because it is substantively significant (Beach and Pedersen 2013, 145; Goertz and Mahoney 2012, 184–5). It can be difficult to generalize from single case studies selected for their characteristics, although this is less of a problem if the researcher’s intent is explanation rather than generalization.
Second, a small number of cases can be selected based on their relationship to one another. This strategy follows the Millean methods of difference and agreement, also called most- and least-similar designs, respectively (George and Bennett 2005, 50–51; Goertz and Mahoney 2012, 195).[1] Least-similar cases are different in every way except the explanatory variable and the outcome, thus isolating the causal condition. Most-similar cases are similar in every way except the explanatory variable; they should also differ on the outcome, which ostensibly controls for confounders. The potential drawback of these selection techniques is that it may be difficult to find cases that fit these strict criteria.
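The pairing logic of the two Millean designs can be made concrete with a short sketch. It assumes, purely for illustration, that each case can be summarized by a vector of control variables on a common scale, a binary explanatory variable x, and a binary outcome y; the case names and values are hypothetical, and Euclidean distance on the controls is only one of many possible similarity measures.

```python
from itertools import combinations
from math import dist

# Hypothetical cases: (control-variable profile, explanatory variable x, outcome y).
# Controls are assumed to be measured on a common 0-1 scale; all values are invented.
CASES = {
    "A": ((0.20, 0.50), 1, 1),
    "B": ((0.25, 0.45), 0, 0),
    "C": ((0.90, 0.10), 0, 0),
    "D": ((0.80, 0.20), 1, 0),
}

def most_similar_pairs(cases):
    """Method of difference: pairs that differ on x and y, most alike on controls first."""
    scored = []
    for a, b in combinations(sorted(cases), 2):
        (ctrl_a, x_a, y_a), (ctrl_b, x_b, y_b) = cases[a], cases[b]
        if x_a != x_b and y_a != y_b:
            scored.append((dist(ctrl_a, ctrl_b), a, b))
    return [(a, b) for _, a, b in sorted(scored)]

def least_similar_pairs(cases):
    """Method of agreement: pairs that share x and y, most different on controls first."""
    scored = []
    for a, b in combinations(sorted(cases), 2):
        (ctrl_a, x_a, y_a), (ctrl_b, x_b, y_b) = cases[a], cases[b]
        if x_a == x_b and y_a == y_b:
            scored.append((dist(ctrl_a, ctrl_b), a, b))
    return [(a, b) for _, a, b in sorted(scored, reverse=True)]
```

With these invented data, `most_similar_pairs(CASES)` returns `[("A", "B"), ("A", "C")]` and `least_similar_pairs(CASES)` returns `[("B", "C")]`. The short candidate lists also illustrate the drawback noted in the text: even in a small pool, few pairs satisfy the strict criteria.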
Third, cases can be selected based on their connection to the posited X/Y relationship. This takes the form of most-likely, least-likely, deviant, typological, and crucial cases (Beach and Pedersen 2013; George and Bennett 2005; Goertz and Mahoney 2012; Seawright 2016; Seawright and Gerring 2008). A most-likely case is one that should conform to the theory: an easy test in which passing does not confirm the theory but failing could disconfirm it. This technique often is used to test, and eliminate, rival hypotheses. A least-likely case is one that should not conform to the theory: failing the test does not necessarily disconfirm the theory, but passing provides strong support for it.[2] Deviant cases are those that diverge from the expected outcome. Typological cases represent types of a conceptual or causal typology. Crucial cases are doubly decisive tests, in which passing provides strong support for the theory and failing strongly impugns it. This method was advocated by Eckstein (1975), but its viability was questioned by Gerring (2007b), who noted that a single case should not be used to build or dismiss generalizable theories. Additionally, researchers can choose cases based on a theoretical sampling method (Eisenhardt 1989), in which cases are selected because they are underexplored types of a phenomenon.
Fourth, cases can be selected based on their relationship to the results of a statistical model. Typical and deviant cases are those that lie on and off the regression line, respectively (Lieberman 2005), and Seawright and Gerring (2008) identified diverse, extreme, influential, most-similar, and most-different cases using statistical techniques. Fearon and Laitin (2008) argued that cases should be chosen randomly, stratified on some variable of interest or control variable. For a qualitative or mixed-methods researcher, then, there are many case-selection strategies from which to choose.
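Two of these large-N strategies can be sketched in a few lines. The sketch below is a simplified illustration, not the exact procedures of Lieberman, Seawright and Gerring, or Fearon and Laitin: it assumes a single predictor, fits a bivariate OLS line by hand, treats the case with the smallest absolute residual as typical and the one with the largest as deviant, and then draws cases at random within strata. The data, the stratifying rule, and the seed are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical large-N data: case name -> (x, y). Values are invented for illustration.
DATA = {
    "A": (1.0, 2.0),
    "B": (2.0, 4.0),
    "C": (3.0, 9.0),
    "D": (4.0, 8.0),
    "E": (5.0, 10.5),
}

def fit_ols(points):
    """Bivariate OLS: return (intercept, slope) for y = a + b*x."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def rank_by_residual(data):
    """Return case names ordered from smallest to largest absolute residual."""
    intercept, slope = fit_ols(list(data.values()))
    residuals = {name: y - (intercept + slope * x) for name, (x, y) in data.items()}
    return sorted(residuals, key=lambda name: abs(residuals[name]))

def typical_and_deviant(data, k=1):
    """Typical cases lie closest to the fitted line; deviant cases lie farthest from it."""
    ranked = rank_by_residual(data)
    return ranked[:k], ranked[::-1][:k]

def stratified_random(cases, stratum_of, k, seed=0):
    """Draw up to k cases at random from each stratum defined by stratum_of(name)."""
    strata = defaultdict(list)
    for name in sorted(cases):
        strata[stratum_of(name)].append(name)
    rng = random.Random(seed)  # fixed seed so the draw can be reported and reproduced
    return {s: rng.sample(members, min(k, len(members))) for s, members in sorted(strata.items())}
```

Here the fitted line is y = 0.4 + 2.1x, so `typical_and_deviant(DATA)` returns `(["E"], ["C"])`. A stratified draw might look like `stratified_random(DATA, lambda n: "high_x" if DATA[n][0] >= 3 else "low_x", k=1)`, which returns one randomly chosen case per stratum. Reporting the seed is one way to make a random selection transparent.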
DATA AND METHODS
We compiled a sample of articles from eight journals: American Journal of Political Science, American Political Science Review, Comparative Political Studies, Comparative Politics, International Organization, Journal of Politics, Political Research Quarterly, and Studies in Comparative International Development. These were chosen to represent a range of journals that publish comparative and international case-based research. We conducted full-text searches of articles published from 2010 to 2015 for the keywords “mixed methods,” “multi-methods,” “case selection,” “case study,” “nested analysis,” and “triangulation,” discarding articles that did not contain at least one qualitative case study. This procedure yielded 79 articles, which were then coded according to the typology described above (see table 1).[3] Whenever possible, we coded the case-selection technique from the authors’ own descriptions; where these descriptions were ambiguous, we inferred the strategy with reference to the typology. The following section describes the data.
Table 1 Meta-Analysis of Journal Articles
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20171011143332-55324-mediumThumb-S1049096517001214_tab1.jpg?pub-status=live)
Note: *These percentages are based on a total count of how often the technique was used. Case-selection strategies often are combined; therefore, one case could have several techniques attached to it.
FINDINGS
We have three main findings. First, the dominant case-selection strategy is the most-similar-cases design, or the Millean method of difference. The justification often given is that this research design controls for confounders. However, the frequent use of most-similar designs suggests another, unarticulated factor in case selection: researchers tend to be familiar with similar cases within a region and thus focus on explaining instances in which those similar cases have different outcomes. Although the case-selection procedure fits the research goal, the goal itself may be driven by logistical considerations. This suggests that other case-selection procedures are underutilized not so much because they serve undesirable research goals but rather because of practical constraints.
Additionally, in mixed-methods projects, the dominant approach is to select cases that span a range of values on X, on Y, or on the X/Y relationship. This approach places considerable faith in the prior statistical analysis but accomplishes little as a robustness check. It suggests that mixed-methods researchers are not using triangulation (Denzin 1978; Jick 1979; Morse 1991; Tarrow 1995) or integrative mixed-methods approaches (Seawright 2016) but rather what Greene, Caracelli and Graham (1989, 258) referred to as complementarity, in which case studies are used not as a check on the quantitative analysis but as a way to explore additional aspects of the posited causal relationship. Together with the underuse of several mixed-methods case-selection techniques, this indicates that many of the goals that mixed-methods research can pursue remain underexplored. Scholars should be more transparent about their specific research goals in order to explain the rationale for their case selection.
Second, researchers often combine strategies in an ad hoc manner. The literature on case selection does not specifically address combining case-selection techniques but instead treats each technique separately. Yet researchers combined two or more strategies in 48% of the articles in our sample. This is another instance of theory diverging from practice, but one in which theorists should follow the lead of practitioners. Methodologists should undertake a sustained discussion of the appropriate ways to combine case-selection techniques and of the advantages and drawbacks of particular combinations.
Third, we found that researchers almost always explained why they chose their cases but rarely identified the population of relevant cases from which they drew them. In other words, there was little discussion of cases that could have been chosen but were not. We suggest that this is where logistical considerations, rather than methodological concerns, come into play. Logistical considerations include factors such as language skills, availability of funding, and in-country networks. This organic, human element of research often goes undiscussed, yet it has real implications for the constraints that researchers face and the choices they make.
THE PATHOLOGY OF TRANSPARENCY AND THE THRESHOLD RULE
In his discussion of bureaucratic pathology, Merton (1940) argued that bureaucratization necessitates the identification of organizational goals and the establishment of benchmarks to measure whether those goals are met. However, organizational goals often represent values that are difficult to measure; therefore, benchmarks are only proxies. Over time, reaching these performance benchmarks not only replaces the original goal but also may undermine it. For example, the goal of the American legal system is justice, but justice is difficult to measure. One way that the performance of prosecutors is measured is by conviction rates. High conviction rates speak well for a prosecutor but may not reflect whether the prosecutor is achieving “justice.” In fact, focusing purely on conviction rates rather than on whether the outcome of a case is fitting can undermine justice. According to Merton, this is a pathology of bureaucracy.
The underlying logic of this pathology also may be present in the recent push for transparency, because transparency is similarly difficult to measure. One performance benchmark that is likely to result is methodological rigor, or adherence to methodological dicta. Methodological rigor, however, cannot incorporate the human element of research. The result is that researchers may be hesitant to discuss the logistical reasons for their selected cases, thereby undermining transparency.
The previous section highlights instances of divergence between the theory and the practice of case selection. However, we note an additional area of divergence that is not easily detected through bibliometric analysis. Sometimes researchers select cases based on logistics, that is, on their language skills, familiarity with the region, and in-country networks. This aspect of case selection is rarely discussed, except perhaps to disqualify it as a valid means of selecting case studies, although in practice logistical concerns can matter at least as much as methodological ones. This is not to suggest legerdemain on the part of researchers but rather to argue that, in the interest of transparency, we should be more open to discussing all considerations related to case selection.
Researchers face a wide array of logistical constraints, especially when conducting field research but also when planning their research before entering the field, and these constraints inevitably affect their case selection. The constraints may disproportionately affect women and minorities, who tend to be underrepresented among tenured faculty and to have less access to funding. These researchers therefore face particularly tight constraints on their budgets and field-research plans. At the same time, standards of methodological rigor are set largely by the groups most strongly represented in the field, who have more resources at their disposal. We contend that rather than conceal the practical considerations that often constrain case selection and qualitative research, we should acknowledge human constraints and incorporate them into our methodological prescriptions and conceptual frameworks. After all, these constraints reflect the realities of qualitative research in the field, as well as the properties and hierarchies within the academic field as such, and therefore should be considered and discussed at the research-design stage.
We propose the following threshold rule: researchers should state their research goals and delineate the criteria that, given those goals, constitute a “good” case.[4] They then should identify the cases that meet these criteria. This list need not be exhaustive, because the full set of qualifying cases can be difficult to determine before extensive research. Any case that belongs to this set of “good” cases should be considered appropriate for selection. We also suggest the following caveat: if the researcher’s goal is generalization and a case was used to develop a theory or model inductively, then that case should not be selected for a formal case study. However, if the researcher’s goal is explanation of a particular event, even this caveat can be relaxed.
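Because the threshold rule is essentially a set-membership test, it can be written down as one. The sketch below is a hypothetical encoding, not a procedure from the article: criteria are represented as yes/no functions derived from the stated goal, the goal is reduced to the two labels "generalization" and "explanation," and the candidate cases and criteria are invented.

```python
from typing import Callable, Dict, Iterable, List

# A criterion is a yes/no test derived from the stated research goal (hypothetical representation).
Criterion = Callable[[dict], bool]

def threshold_set(
    cases: Dict[str, dict],
    criteria: List[Criterion],
    goal: str,
    theory_building_cases: Iterable[str] = (),
) -> List[str]:
    """Return every case that meets all criteria; each member is an acceptable selection.

    If the goal is generalization, drop cases used to develop the theory inductively.
    If the goal is explanation of a particular event, that caveat is relaxed.
    """
    excluded = set(theory_building_cases) if goal == "generalization" else set()
    return [
        name
        for name, attrs in sorted(cases.items())
        if all(criterion(attrs) for criterion in criteria) and name not in excluded
    ]

# Hypothetical candidates and criteria for a goal of tracing a mechanism where X is high.
CANDIDATES = {
    "Alpha": {"x": 0.90, "outcome": True},
    "Beta": {"x": 0.80, "outcome": True},
    "Gamma": {"x": 0.30, "outcome": True},
    "Delta": {"x": 0.75, "outcome": False},
}
CRITERIA = [lambda c: c["x"] >= 0.7, lambda c: c["outcome"]]
```

If "Alpha" was used to build the theory, `threshold_set(CANDIDATES, CRITERIA, "generalization", {"Alpha"})` returns `["Beta"]`, while the same call with `"explanation"` returns `["Alpha", "Beta"]`. Under the rule as described, choosing among the returned members on logistical grounds such as language skills or access would be acceptable, provided the set itself is reported.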
CONCLUSION
We conducted a meta-analysis of journal articles to determine whether theory meets practice in case selection. We found that the dominant case-selection technique is the most-similar research design, that researchers combine case-selection techniques ad hoc, and that the population of cases from which cases are drawn is rarely identified.
We draw three conclusions from these findings. First, case-study practitioners should be more explicit and expansive about the intended research goals of their case studies and should choose case-selection techniques accordingly. This, in turn, could inform other researchers and build methodological sophistication. Second, methodologists should address the issue of combining case-selection techniques in order to establish best practices. Third, researchers should be more explicit about the population of relevant cases from which they draw particular cases and should acknowledge the logistical considerations that may have been involved in that selection. We propose a threshold rule as a way of achieving methodological rigor while also acknowledging the human constraints of research that may accrue disproportionately across the discipline.
SUPPLEMENTARY MATERIAL
To view supplementary material for this article, please visit https://doi.org/10.1017/S1049096517001214