Qualitative research is plagued by two unresolved debates. First is the unresolved question of what process tracing is in the first place—whether analytic narratives suffice or we should use evidentiary tests, whether it is rooted in formal logic or Bayesian logic, and the debates go on.Footnote 1 Second is the debate over what scholars need to do for their qualitative work to be deemed rigorous—and whether any approach can be trusted without submitting to the rigid transparency guidelines spearheaded by the Data Access and Research Transparency (DA-RT) initiative.Footnote 2 The ontological, epistemological, and ethical cleavages of these disputes run wide and deep. With scholars boycotting journals, and—effectively—journals boycotting scholars over transparency issues, the stakes of resolving these debates are high. As such, any approach that aims to resolve both issues within a unified framework deserves close attention. Proponents of Bayesian process tracing claim to do just that.
The Bayesian approach provides an analytic template for asking (and answering) a central research question: What is the probability that our main hypothesis $(H_{M})$ is correct, given that we searched for and found evidence $E_{i}$? Using Bayes’ rule, researchers are encouraged to answer this question by explicitly specifying quantities such as the prior probability their hypothesis is correct given only their background information, and the likelihood of observing each piece of evidence in a world where their main hypothesis is true. Together, these quantities allow researchers to update their confidence in a given theory as they move through the evidence. Proponents argue that explicitly specifying probabilities for the quantities in Bayes’ rule not only makes process tracing more transparent,Footnote 3 but for some, it makes process tracing sufficiently transparent as to obviate some of the burdensome recommendations of the DA-RT initiative.Footnote 4
While the tradition of using Bayesian inference as a metaphorical frame for qualitative work dates back nearly forty years,Footnote 5 a recent wave of work in this vein has laid out a set of research practices and a corresponding set of strong claims about what explicit Bayesianism (i.e., specifying the probabilities in Bayes’ rule and updating mathematically) is capable of achieving.Footnote 6 Beyond asserting its foundational role in process tracing, four claims stand out: (1) the Bayesian approach enables causal inference from iterative research, (2) Bayesian logic makes the sequence in which we evaluate evidence irrelevant to inference, (3) the approach enables scholars to more fully engage rival explanations, and (4) Bayesian logic guards against ad hoc hypothesizing and confirmation bias.Footnote 7 If these assertions hold, the Bayesian approach will constitute a revolutionary advancement in process tracing.
Given the increasing quantity and impressive placement of work in this vein, Bayesian approaches represent a new frontier of qualitative research methods. Moreover, they have made a substantial impact in the discipline beyond the press. Since 2014, the Process Tracing modules at the Institute for Qualitative and Multi-Method Research (IQMR) and the American Political Science Association (APSA) Short Courses have focused primarily on the Bayesian approach, which means that for many students, this approach is the first and last word they receive in the way of qualitative training. Additionally, the memorandum put forth by the process-tracing subcommittee of the Qualitative Transparency Deliberations frames process tracing as though it is primarily a Bayesian method.Footnote 8 Finally, Bennett argues that the Bayesian approach can be adapted to policy analysis to improve projections and decision-making.Footnote 9 As yet, however, no one has conducted a systematic evaluation of the promises, trade-offs, and limitations of the Bayesian approach in practice. In light of both its growing footprint and ambitious claims, I take a step back to critically evaluate whether and to what extent the method lives up to the mission—and what happens when it comes up short.
This article proceeds as follows. I begin with an overview of Bayesian process tracing. The following four sections correspond to the claims outlined above. I lay out the stakes of each claim and evaluate the extent to which Bayesianism is the best tool for the job. I demonstrate that Bayesian tools add value to some areas of the research process, but they come with severe and often unacknowledged shortcomings that demand resolution before this approach can be widely adopted. I conclude by evaluating what I call the implicit fifth claim: that the analytic benefits of investing in the Bayesian approach outweigh the opportunity costs of dedicating methodological training time to other endeavors. Finally, I enumerate potential avenues for future development.
1 Overview of Qualitative Bayesian Research
This section gives an overview of Bayesian inference to contextualize the discussion. As Fairfield and Charman succinctly note, “Bayesian reasoning is simply a process of updating our views about which hypothesis best explains the...outcome of interest as we learn additional information.”Footnote 10 To anyone new to Bayesian reasoning, defining the process of continuous updating in light of evidence may seem so obvious as to not need a name beyond research. For proponents of Bayesian methods, the obviousness is a feature, not a bug, since a core goal of the Bayesian approach is to formalize the approach researchers use intuitively.Footnote 11
Given the central role of examining alternative hypotheses in addition to our own, Bennett as well as Fairfield and Charman advocate using the odds-ratio form of Bayes’ rule, which assesses the strength of one hypothesis $(H_{M})$ relative to an alternative $(H_{A})$:

$$\frac{P(H_{M}|E_{i},I)}{P(H_{A}|E_{i},I)} = \frac{P(H_{M}|I)}{P(H_{A}|I)} \times \frac{P(E_{i}|H_{M},I)}{P(E_{i}|H_{A},I)} \qquad (1)$$
Beginning with the right-hand side of the equation, the numerator, $P(H_{M}|I)$, represents the prior probability on our main hypothesis: our degree of belief that $H_{M}$ describes the world given only our background information, $I$. The denominator represents our prior on the alternative hypothesis that we are testing against the main.
The next term in the equation represents the likelihood ratio: the probability of observing a given piece of evidence in a world where $H_{M}$ is true over the probability of observing the same piece of evidence in a world where $H_{A}$ is true. The goal of the likelihood ratio is to formalize our assessment of how likely we are to find a given piece of evidence under competing hypotheses.
Finally, the prior and likelihood ratios are multiplied together to compute the posterior probability. The posterior gives us our updated confidence in each hypothesis given that we have observed evidence $E_{i}$. If, for example, $E_{i}$ is more plausible in the world of $H_{M}$ than the world of $H_{A}$ (as assessed in the likelihood ratio), then our confidence in $H_{M}$ relative to $H_{A}$ will increase. Posterior probabilities (or ratios) are then used to inform the priors for analyzing subsequent pieces of evidence.
While some qualitative methodologists encourage scholars to use the Bayesian framework primarily as a metaphorical tool for updating their confidence in hypotheses in light of new evidence,Footnote 12 others—and the work that has made the largest footprint in the discipline—encourage an explicit mathematical implementation of Bayes’ rule.Footnote 13 In the latter approach, scholars are encouraged to conjure and justify numerical quantities for the prior and the likelihood to mathematically compute our updated confidence in one hypothesis relative to another.Footnote 14 Bennett neatly captures the consensus among qualitative Bayesians, arguing that “explicitly assigning priors and likelihood ratios and using Bayesian mathematics can make process tracing more rigorous and transparent.”Footnote 15 Scholars in this camp argue that while the probabilities themselves might be incorrect, the act of explicitly justifying them and subjecting both the probabilities and justifications to scrutiny (via peer review) contributes to greater transparency of the assumptions researchers make implicitly (even without a Bayesian approach).Footnote 16
Due in part to the strength of the claims in the explicit Bayesian literature, and in part to the impact this approach has made in the discipline—both in the press and in methodological training modules—the remainder of the article focuses primarily on the explicit (mathematical) Bayesian approach. Explicit Bayesian process tracing requires the most time dedicated to learning the method, yet, according to its proponents, the investment is justified because it is capable of delivering the greatest returns in terms of rigor, transparency, and the quality and scope of inferences.
2 Claim 1: Bayesianism Allows us to Revalue and Implement Iterative Research
Bayesians note that the research process often involves a messy “dialogue with the data,”Footnote 17 yet the print version more closely resembles a monologue that begins in an armchair and ends with a light bulb. By formalizing guidelines for iteratively updating our confidence in hypotheses in light of additional evidence, the Bayesian approach may foster a more systematic and transparent account of how our research proceeds in practice. However, the attempt to move beyond the metaphor of Bayesian updating raises three questions and corresponding issues that must be resolved before the method is widely adopted.
First, the Bayesian process-tracing literature exhibits an unacknowledged debate about what “iterative research” refers to. While all qualitative Bayesian scholars agree that one aspect of iterative research is the process of updating our confidence in theories as we move through evidence, some also take the view that iterative research entails updating the theories as well.Footnote 18 For example, where Bennett argues that Bayesianism is not suited to theory generation,Footnote 19 Beach and Pedersen contend that “in most realistic research situations, theory-building and theory-testing are actually used in an iterative fashion.”Footnote 20 Thus, for some, Bayesian logic provides a template for the inductive side of research.
If one of the core goals of bringing explicit Bayesian computation to process tracing is to provide a formal template for the processes scholars tend to use implicitly, acknowledging that scholars often update theories as they analyze evidence works in service of this goal. However, if iteration does involve an inductive component, Bayesian methodologists must provide clear guidelines for devising and altering priors on hypotheses as the hypotheses themselves change. Additionally, the literature must instruct scholars on how to deal with evidence that has already been analyzed before the hypothesis was altered. Currently, however, the literature is silent on these crucial matters. Thus, for those who claim that Bayesian iteration works in service of theory refinement, there is more work to do before the method can deliver on this goal.
Second, current formulations of the procedure lack clear and consistent guidelines for implementing a systematic, iterative process in practice. The classical formulation of Bayes’ theorem (which tests a hypothesis against its logical negation) enables researchers to update the probability that a single hypothesis is true given a piece of evidence.Footnote 21 It provides a clear path forward for iteration by using the updated posterior as the new prior on that hypothesis for analyzing the next piece of evidence, and so forth. Yet, this form of Bayes’ theorem is problematic because of the assumptions underlying the equationFootnote 22 and because it does not adjudicate among rival hypotheses, which is a central goal of process tracing.Footnote 23
Fairfield and Charman offer a corrective via the odds-ratio form of Bayes’ rule (Equation (1)). While this equation allows for adjudication by returning an update on the relative odds of two competing hypotheses given a piece of evidence, the process of iteration becomes elusive. Instinctively, the next step should be to use the posterior odds as the updated prior to analyze the next piece of evidence, $E_{2}$, but in the example given, the authors instead revert to placing equal odds on the two hypotheses despite having previously found support for only one of them.Footnote 24 This choice calls into question the purpose of computing the posterior and the claim that Bayesian logic provides a formal template for iterative research. Moreover, the problem of how to conduct systematic iterative research becomes orders of magnitude more complex when researchers have more than two competing hypotheses.Footnote 25
If the Bayesian method does not establish clear guidelines for updating our confidence in hypotheses as we move through evidence, its utility as a tool for systematic iterative research becomes questionable.
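The inconsistency is easy to see in miniature. The sketch below chains posterior odds into the next prior, as iterative updating would seem to require, and contrasts that with resetting to even odds before each piece of evidence; the likelihood ratios are hypothetical.

```python
# Hypothetical likelihood ratios P(E_i|H_M,I) / P(E_i|H_A,I) for three pieces of evidence.
likelihood_ratios = [3.0, 2.0, 1.5]

# Chained updating: each posterior becomes the prior for the next piece of evidence.
odds = 1.0  # even prior odds on H_M vs. H_A
for lr in likelihood_ratios:
    odds *= lr
print(odds)  # 9.0 -- support accumulates across E_1, E_2, E_3

# Resetting to even odds before each piece of evidence, as in the example discussed above:
for lr in likelihood_ratios:
    odds = 1.0 * lr  # prior support discarded each round
print(odds)  # 1.5 -- only the last piece of evidence counts
```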
Third, this recommendation raises questions of how narratives of iterative testing and refinement are expected to fit within the standard word limits of academic journals. Current disciplinary conventions already place space constraints on qualitative scholars; it would be harder still to also document every hypothesis we initially thought might explain the case. Though some rightly note that scholars can sidestep rigid word limits with online appendices or supplementary material,Footnote 26 this recommendation does not acknowledge the additional burden placed on qualitative scholars to produce “article-length” manuscripts that may effectively contain one and a half or two articles’ worth of material for each publication. Of course, if Bayesian process tracing proves superior to other methods on dimensions of analytic rigor, inferential nuance, or transparency, this extra work may be deemed worthwhile, but the value added must be clear and measurable before enjoining scholars to take on that burden.
To be sure, qualitative Bayesian scholars are correct in noting that the pressure for flawless causal identification, proof of exogeneity, and preregistration of research designs belies the process by which a lot of research unfolds. Moreover, structuring analytic narratives around an iterative process has additional benefits beyond what Bayesian proponents acknowledge. Specifically, by omitting reference to how our theories developed over the course of a project, we lose a codified record of the ideas that seemed good on paper, but failed to pan out in testing. The discipline would benefit from embracing a model of research that not only accounts for the evolution of the research process, but also prevents other researchers from traversing the same (often well-trod) path to a dead end.
Across the qualitative Bayesian literature, scholars converge on the core benefit of this approach: Bayesian logic provides an explicit procedure for iterative updating as researchers move through evidence.Footnote 27 Although Bayesian updating provides a useful metaphorical tool for conducting iterative research, current formulations of the procedure lack the necessary guidelines for executing a systematic, iterative process in practice. Thus, in its current state, Bayesian process tracing needs to be refined further to ensure that this method is the best path for revaluing and implementing iterative research.
3 Claim 2: Timing of Evidence is Irrelevant to Inference
Qualitative Bayesians’ second core claim is that the order of evaluating evidence and “keeping track of what we knew when” is “logically irrelevant” to inference.Footnote 28 This assertion aims to break down what Bayesians view as the hard—yet artificial—line drawn between exploratory (hypothesis-generating) and confirmatory (hypothesis-testing) stages of research.Footnote 29 Based on the mathematical irrelevance of time in probability theory, Fairfield and Charman push further than others in the Bayesian camp and articulate three strong claims about the unimportance of timing across the research process, enumerated below. While not all qualitative Bayesians echo these arguments, the implications that follow affect the entire research process, and thus, are substantial enough to warrant special attention.
- (1) “New evidence has no special status relative to old evidence,”Footnote 30
- (2) “Learning the same pieces of information in different orders must produce identical results,”Footnote 31
- (3) “There are few analytical benefits to reporting temporal details about how the research process unfolded.”Footnote 32
The stakes of these arguments are quite high. If, indeed, logical Bayesianism enables scholars to test an inductively inspired hypothesis with the evidence that inspired it, this method not only calls into question the utility of preregistration and time-stamping evidence, but also suggests a much wider scope of inferential validity than the discipline currently embraces. If, however, the method falls short in practice, it could lead scholars to draw hasty or biased conclusions based on insufficient evidence.
The claims about the irrelevance of timing have implications for every stage of the research process, which I examine in turn: (1) hypothesis generation, (2) hypothesis refinement, (3) hypothesis testing, and (4) how researchers report results. While some of the arguments raise important points, I demonstrate that the focus on timing’s mathematical irrelevance will cause those who fully subscribe to them to hastily disregard the cognitive, inferential, and practical relevance of timing considerations in conducting and documenting research. The corresponding problems cascade through all stages of research and have deleterious effects on inference.
3.1 Hypothesis Generation
Beginning with hypothesis generation, I demonstrate that for timing to be truly irrelevant, research cannot be costly (which it is) and the distribution of support for a given hypothesis must be uniform across any subset of evidence (which it is likely not). Although Bayesian process tracing does not have an explicit method for theory generation via “soaking and poking,”Footnote 33 the push to break down the line between inductive and deductive reasoning combined with the claim that new evidence is not uniquely valuable has clear implications for this aspect of research.Footnote 34
Say a researcher stumbles upon a new puzzle. At the outset, the number of hypotheses is effectively infinite. Given the temporal, financial, and cognitive costs of reading the full set of relevant evidence $\mathbb{E}$, she will begin by reading some subset of $\mathbb{E}$ to get a sense of what accounts for the outcome. At a certain point, she has to start placing bets on some hypotheses over others based on what she has seen initially. For example, among $E_{1-10}$, perhaps five pieces of evidence inspire $H_{a}$, another three inspire $H_{b}$, and two inspire $H_{c}$.
Thus, the first problem with asserting the irrelevance of timing arises as a function of costly search. The evidence researchers encounter initially will affect which hypotheses they devise, refine, and test. Scholars effectively face the multi-armed bandit problem in the context of hypothesis generation.Footnote 35 Given limited resources, they are forced to optimize between exploration (reading more material to devise additional hypotheses) and exploitation (settling on a few hypotheses based on initial material and following the corresponding leads). As such, “what we knew and when”Footnote 36 plays a role in research from the outset.
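The bandit analogy can be dramatized with a small simulation. The sketch below uses an epsilon-greedy rule, one standard bandit heuristic chosen here purely for illustration; the payoff numbers are hypothetical stand-ins for how often pursuing a hypothesis yields supportive evidence.

```python
import random

random.seed(1)

# Hypothetical chance that pursuing each hypothesis yields further supportive evidence.
payoff = {"H_a": 0.5, "H_b": 0.3, "H_c": 0.7}

# Initial "training data": H_a looked best in the first pieces of evidence read,
# even though H_c (barely glimpsed so far) fits the case better.
counts = {"H_a": 5, "H_b": 3, "H_c": 2}   # pieces of evidence examined per hypothesis
wins = {"H_a": 4, "H_b": 1, "H_c": 0}     # pieces that proved supportive

def pursue(epsilon, rounds=200):
    c, w = dict(counts), dict(wins)
    for _ in range(rounds):
        if random.random() < epsilon:
            h = random.choice(list(payoff))              # explore: read something new
        else:
            h = max(payoff, key=lambda k: w[k] / c[k])   # exploit: follow the best lead so far
        c[h] += 1
        w[h] += random.random() < payoff[h]
    return c

print(pursue(epsilon=0.0))  # pure exploitation: effort locks onto H_a, the early favorite
print(pursue(epsilon=0.3))  # exploration gives the under-sampled H_c a chance to overtake
```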
In the current example, the researcher will assign relatively high priors to $H_{a}$, slightly lower priors to $H_{b}$, and even lower ones to $H_{c}$. In short, what scholars choose as their “training data” (i.e., the initial set of evidence examined—and thus, the order in which scholars observe evidence) will affect the probability of choosing a given hypothesis, the prior probability assigned to it, and the priors assigned to any hypothesis devised later (since ad hoc hypotheses should be penalized).Footnote 37 A different sampling of evidence may have altered the priors substantially, and while Bayesians argue that prior probability functions will converge in the long run,Footnote 38 Putnam reminds us that for our results to be valid, convergence must “be reasonably rapid,”Footnote 39 since “in the long run, we’ll all be dead.”Footnote 40 To embrace a method that relies on convergence across scholars, methodologists must examine when and whether convergence happens, and what goes wrong if it does not.Footnote 41
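Convergence itself is easy to demonstrate; the open question is its speed. The sketch below tracks two researchers who start from very different priors and update on the same evidence stream, reporting how many pieces of evidence it takes for their posteriors to agree. Every probability in it is a hypothetical illustration.

```python
def update(prior, p_e_given_h, p_e_given_not_h):
    """Classical Bayes' rule for one hypothesis against its negation."""
    numerator = p_e_given_h * prior
    return numerator / (numerator + p_e_given_not_h * (1 - prior))

# Two researchers with sharply different priors on the same hypothesis.
posteriors = {"optimist": 0.9, "skeptic": 0.1}

# A hypothetical evidence stream that mildly favors the hypothesis.
evidence = [(0.7, 0.4)] * 30  # (P(E|H), P(E|not H)) for each piece

for i, (p_h, p_not_h) in enumerate(evidence, start=1):
    for who in posteriors:
        posteriors[who] = update(posteriors[who], p_h, p_not_h)
    if abs(posteriors["optimist"] - posteriors["skeptic"]) < 0.05:
        print(f"Beliefs within 0.05 of each other after {i} pieces of evidence")
        break
```

With these numbers agreement takes ten pieces of evidence; weaker likelihood ratios or more extreme priors stretch that horizon considerably, which is precisely Putnam’s point about speed.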
3.2 Hypothesis Refinement
One of the most touted benefits of the Bayesian approach is that it provides a systematic framework for not just updating confidence in hypotheses, but also refining the hypotheses themselves as researchers “move back and forth between theory development and data.”Footnote 42 However, valuing iterative refinement appears to directly contradict the global irrelevance of timing. Iteration, by definition, is about revising and updating our beliefs as we move through additional evidence over time. Yet, arguments that degrade “new” evidence—claiming it has no analytically distinct benefits over what researchers have already seen—undermine the value and process of iterative refinement, thereby creating logical inconsistencies in the method.
In the context of hypothesis refinement, disregarding the role of new evidence gives rise to what is known mathematically as a “stopping problem.” To illustrate, say we start with a hypothesis, $H$. After analyzing evidence $E_{1}$, we slightly modify our hypothesis to $H^{\prime}$. Then, we move on to evidence $E_{2}$ and modify our hypothesis again to $H^{\prime\prime}$. If indeed we can use $E_{1,2}$ to test $H^{\prime\prime}$, how do we know when to stop analyzing data? While $E_{1}$ and $E_{2}$ certainly support $H^{\prime\prime}$, it seems only sensible to continue testing with additional evidence to assess not only its accuracy, but its stability as well. New evidence can establish whether we have arrived at a stable hypothesis, or whether further refinements are warranted.Footnote 43 If, alternatively, the same piece of evidence can inspire, refine, and test, researchers are susceptible to reporting unrefined hypotheses by stopping their analyses too soon.Footnote 44
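A minimal sketch with assumed toy data makes the stopping problem tangible. Here a “hypothesis” is caricatured as a line fitted exactly to the first two observations: retesting on those observations can never signal that refinement should continue, whereas held-out evidence can.

```python
# Toy evidence: each observation is (x, y); the underlying relationship is noisy.
evidence = [(1, 2.0), (2, 4.3), (3, 5.1), (4, 8.9), (5, 9.7)]

# "Refine" a hypothesis on the first two observations: fit y = a*x + b exactly.
(x1, y1), (x2, y2) = evidence[:2]
a = (y2 - y1) / (x2 - x1)
b = y1 - a * x1

def error(h_a, h_b, data):
    """Worst-case gap between the hypothesis and the data."""
    return max(abs(h_a * x + h_b - y) for x, y in data)

print(error(a, b, evidence[:2]))  # 0.0 -- inspiring evidence cannot strain the hypothesis
print(error(a, b, evidence[2:]))  # 1.5 -- held-out evidence reveals the fit is unstable
```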
3.3 Hypothesis Testing
Turning to hypothesis testing, I evaluate how arguments 1 and 2 affect inference. In light of the claim that new evidence has no distinct value, it follows that analysts can test a hypothesis using the evidence that inspired it.Footnote 45 On the one hand, this assertion forces us to think more critically about the inferential value of “inspiring” evidence. On the other hand, it downplays and misrepresents the value of “new” or additional evidence in Bayesian analysis. Ultimately, the attempt to break down the line between inductive and deductive stages of research is taken too far.
Providing a framework for incorporating inspirational evidence into our analyses is a positive contribution. Falling in line with Lieberman’s recommendation to move toward disciplinary norms that value (and publish) descriptive and inductive research,Footnote 46 the push to recognize the value of “old evidence” calls into question the value of time stamping and pre-registration. As Lieberman argues, “if we take the idea of pre-registration too far...we will surely crowd out the important inductive work upon which scientific discovery depends.”Footnote 47 For example, the best piece of evidence (in terms of its clarity, specificity, or sheer interest) might be the one that inspired the final version of a hypothesis or theory. While the researcher should likely conduct additional research to ensure the hypothesis’s stability, pretending that old evidence has no analytic value does little to improve our inferential validity.Footnote 48
However, this contribution is overshadowed by multiple problems. First, proponents of using inspirational evidence to evaluate hypotheses fail to distinguish between supporting a hypothesis (which inspirational evidence can do) and testing a hypothesis (which it cannot). To test something is, by definition, to expose it to strain—and inspirational evidence will not strain the hypothesis derived from it. Returning to the hypothesis-generation example, the scholar may treat $E_{1-10}$ as sufficient to conclude that $H_{a}$ outperforms $H_{b}$ and $H_{c}$. While this conclusion may be true, the tactic of using old evidence to support and test it is only reliable if the distribution of support for all hypotheses in any subset of the evidence is equivalent to the distribution of support across the full set of evidence, $\mathbb{E}$. In reality, support for $H_{a}$ may be systematically clustered in $E_{1-10}$.Footnote 49 In mathematical terms, using the same subset of evidence to inspire and test a hypothesis may lead researchers to choose a hypothesis that represents a local maximum, but report it as though it constitutes a global one.
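A small simulation with assumed evidence counts shows how clustering misleads: in the first ten pieces of evidence $H_{a}$ dominates, but over the full set $H_{c}$ explains far more.

```python
# Hypothetical full evidence set: each entry names the hypothesis a piece supports.
# Support for H_a happens to be clustered at the start (e.g., in one archive read first).
full_set = ["H_a"] * 5 + ["H_b"] * 3 + ["H_c"] * 2 + ["H_b"] * 5 + ["H_c"] * 15

def support_shares(evidence):
    return {h: round(evidence.count(h) / len(evidence), 2) for h in ("H_a", "H_b", "H_c")}

print(support_shares(full_set[:10]))  # {'H_a': 0.5, 'H_b': 0.3, 'H_c': 0.2} -- training subset
print(support_shares(full_set))       # {'H_a': 0.17, 'H_b': 0.27, 'H_c': 0.57} -- full set
```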
Another problem is that these recommendations imply a false trade-off between revaluing old evidence and devaluing new evidence. Yet, this is not a zero-sum game. Contrary to the argument underlying this claim,Footnote 50 a distinct theory-testing stage is not solely a frequentist notion, and “new” evidence plays a core role in Bayesian inference. Even in the optimal research setting, examining additional evidence beyond the “training set” narrows the band of uncertainty around a researcher’s beliefs. In less ideal situations, examining additional evidence can alert the researcher to counterevidence or necessary modifications to their hypotheses. By downplaying the value of new evidence, those recommending this approach undermine the process and contribution of iterative research in the first place.
The second claim pertinent to hypothesis testing is that the results should be insensitive to the order in which the evidence is examined.Footnote 51 If this assertion holds in practice, then not only does the Bayesian approach obviate the timing considerations that underlie the DA-RT recommendations, but it also constitutes a remarkable tool for inference by ostensibly shielding researchers from temporal biases like serial-position effects.Footnote 52
The outstanding problem with this claim (that the order of evidence does not matter) and the previous one (that new evidence has no unique benefits) is that a significant portion of Bayesianism’s added value hinges on researchers’ abilities to overcome known cognitive limitations. Advocates of the method instruct us to put inspirational evidence out of our minds to derive priorsFootnote 53 and to ensure that the order in which we examine evidence does not affect our final probabilities.Footnote 54 While this advice is well intentioned, researchers should be wary of the method until Bayesian methodologists provide evidence that the recommendations can be reliably implemented. Psychological research suggests that the reality of human cognition is one in which order matters.Footnote 55 It is at best naïve—and, at worst, an invitation to bias—to argue that the order in which we incorporate evidence should not matter because “the rules of conditional probability mandate [it].”Footnote 56 The question is not whether we should be able to disregard sequence; the question is whether we can. Thus, to the extent that the main contributions of the Bayesian approach require us to overcome these biases, advocates of the method must demonstrate that researchers are capable of arriving at both consistent and accurate conclusions irrespective of the sequence in which they see evidence.Footnote 57
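The mathematical half of the claim is not in dispute and is easy to verify: because sequential updating multiplies likelihood ratios, and multiplication commutes, any ordering of the same evidence yields the same posterior. The sketch below confirms this with hypothetical likelihood ratios; the concern raised here is whether human analysts, unlike the arithmetic, are order-invariant.

```python
from itertools import permutations
from math import prod

# Hypothetical likelihood ratios for four pieces of evidence.
likelihood_ratios = [3.0, 0.5, 2.0, 1.2]
prior_odds = 1.0

# Collect the posterior odds from every possible ordering of the evidence
# (rounding absorbs floating-point noise from different multiplication orders).
posteriors = {
    round(prior_odds * prod(order), 10)
    for order in permutations(likelihood_ratios)
}
print(posteriors)  # {3.6} -- all 24 orderings give identical posterior odds
```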
3.4 Reporting Hypotheses and Results
Finally, I evaluate how claims about the irrelevance of timing affect the final product: that is, how researchers structure and report their findings. Breaking from the intuitive appeal of the Bayesian approach, Fairfield and Charman argue that “reporting temporal details about how the research process unfolded” affords “few analytical benefits.”Footnote 58 Instructing scholars to eschew reporting “what they learned and when” will likely create two problematic ambiguities in published research.
First, this recommendation needlessly reduces transparency and thwarts readers’ chances of catching reasoning errors. Proponents of the method argue that the utility of Bayesian process tracing is to formalize iterative research while subjecting the process of iteration and choices of probabilities to scrutiny via peer review.Footnote 59 Indeed, the crux of Bayesian transparency is that reviewers and future researchers can challenge and amend poorly chosen or biased priors. Yet, if researchers succumb to serial-position effects or make other temporally based errors without accounting for what they learned and when they learned it, their readers will lack the information needed to assess whether the chosen probabilities were influenced more by timing than reality.
The second issue is one of practicality and readability. Even if the order in which a researcher analyzes her evidence proves entirely irrelevant to the probabilities she assigns and conclusions she draws, why encourage her to disregard the sequence in her narrative? If we do not structure our write-ups to broadly map onto the sequence of iterative updating, what is the alternative way of structuring them? While it is worth noting that the order in which researchers analyze evidence should not affect their conclusions, enjoining researchers not to report that order only further reduces the transparency of the process and leaves them without an alternative structure that improves on using analytic sequence as a narrative anchor.
4 Claim 3: The Bayesian Approach Fully Engages Rival Explanations
The third claim I address—and the motivation for using the odds-ratio form of Bayes’ rule—is that Bayesian process tracing enables scholars to more explicitly engage rival explanations.Footnote 60 Fully engaging alternative explanations is crucial for making valid inferences and is a hallmark of process tracing in all its forms.Footnote 61 While the recommendation to engage alternative hypotheses is well-received, the Bayesian method for inference relies on the often incorrect assumption that all rival hypotheses are mutually exclusive: that is, they cannot simultaneously be true.Footnote 62 Mutual exclusivity is a core modeling assumption in the Bayesian framework, and the question I ask here is, what goes wrong if it does not hold?
4.1 Treatment of Rival Hypotheses under Bayesianism
In current formulations of the method, Bayesian scholars propose two hypotheses, parenthetically note that they assume mutual exclusivity among them, and proceed with the analysis.Footnote 63 For example, when Fairfield and Charman apply the framework to Kurtz’s state-building research, the authors write: “We wish to ascertain whether the resource-curse hypothesis, or the welfare hypothesis (assumed mutually exclusive), better explains institutional development in Peru.”Footnote 64 While proponents of the method are consistent in noting this assumption, they do not instruct scholars on how to assess whether mutual exclusivity holds or what goes wrong when multiple hypotheses may work simultaneously (but are analyzed as though they do not). Despite an established typology of relationships among alternative explanations and a corresponding expansion to Bayes’ rule to accommodate nonexclusivity,Footnote 65 none of the proponents of Bayesian process tracing has incorporated these insights into the method.
4.2 Problems and Implications
The Bayesian approach is not equipped to handle the range of forms hypotheses and evidence tend to take. The problems are both substantive and mathematical, which together result in a method that is more limited than any of its proponents acknowledge. Substantively, the method encourages an oversimplification of the world by sidestepping the frequency with which two causal factors together bring about an outcome. Indeed, Bayesian methodologists frequently use examples that likely violate the mutual exclusivity assumption.Footnote 66 By pitting nonexclusive explanations against one another, the Bayesian approach forces hypotheses to take a strong form in which evidence supporting one hypothesis is interpreted to necessarily undermine any other. In reality, many hypotheses implicitly take the form “this matters, too,” but the literature is unclear on how to proceed in those instances, beyond acting as though they are exclusive anyway.
Mathematically, when two nonexclusive hypotheses are treated as though exclusivity holds, nearly every term in the equation is inaccurate. To motivate this discussion, imagine comparing two hypotheses that could jointly be true: whether greed $(H_{\$})$ or grievance $(H_{G})$ motivates participation in rebellion (given by Equation (2)). Nothing about being upset with the government precludes greediness, and rebels may easily exhibit both traits.

$$\frac{P(H_{\$}|E_{i},I)}{P(H_{G}|E_{i},I)} = \frac{P(H_{\$}|I)}{P(H_{G}|I)} \times \frac{P(E_{i}|H_{\$},I)}{P(E_{i}|H_{G},I)} \qquad (2)$$
The first problem lies in the formulation of the prior: the probability that greed $(H_{\$})$ explains participation over the probability that grievance $(H_{G})$ explains participation. Since greed and grievance can operate together, the prior disregards any extent to which the world is represented by the quantity $P(H_{\$} \cap H_{G}|I)$.Footnote 67 If researchers proceed as though mutual exclusivity holds, they must first justify the decision theoretically (since “only greed” is a very different hypothesis from “greed affects participation”), and compute the numerator and denominator as $P(H_{\$} \cap \lnot H_{G}|I)$ and $P(H_{G} \cap \lnot H_{\$}|I)$, respectively. This approach raises questions about how the quantity $P(H_{\$} \cap H_{G}|I)$ is assessed and how researchers can identify whether they are incorrect.
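A worked example with assumed numbers illustrates the distortion. Suppose a researcher’s background information implies

$$P(H_{\$}|I) = 0.6,\qquad P(H_{G}|I) = 0.7,\qquad P(H_{\$}\cap H_{G}|I) = 0.4.$$

The naive prior ratio is $0.6/0.7 \approx 0.86$, whereas the ratio of the genuinely exclusive hypotheses is

$$\frac{P(H_{\$}\cap\lnot H_{G}|I)}{P(H_{G}\cap\lnot H_{\$}|I)} = \frac{0.6-0.4}{0.7-0.4} \approx 0.67,$$

and either choice silently discards the 40% of prior belief in which both mechanisms operate together.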
The mutual exclusivity assumption also plagues the likelihood ratio (the odds of observing a piece of evidence in world $H_{\$}$ versus world $H_{G}$). Researchers must do extra legwork to justify that evidence of greed is evidence in favor of only greed (rather than greed and grievance together). This task may simply involve finding separate, but disconfirming, evidence for the alternative hypothesis—though the method’s proponents do not mention this requirement. Without acknowledging and subtracting the possibility of being in the overlap space, estimates for the numerator are likely to be artificially inflated, and estimates for the denominator are likely to be artificially attenuated, thereby introducing confirmation bias.
The final problem lies in the denominator of the likelihood ratio and derives from a related, yet unacknowledged, assumption about the nature of evidence. Specifically, this quantity only returns valid results when every piece of evidence is relevant to all hypotheses. In the context of nonexclusivity, however, evidence that supports one hypothesis is not likely to affect the plausibility of the other. For example, an interview with a former rebel who expressed a desire to profit from illicit diamond mining $(E_{\diamond})$ does not tell us anything about grievance. If a piece of evidence is only relevant to greed $(H_{\$})$, how can scholars derive or interpret $P(E_{\diamond}|H_{G},I)$? Mathematically, $E_{\diamond}$ and $H_{G}$ are independent events; consequently, the denominator reduces to $P(E_{\diamond})$ (i.e., the probability of finding this piece of evidence anywhere).Footnote 68 Even if we have the ability to assess the overall probability of observing a piece of evidence unconditional on any hypothesis, the likelihood ratio is no longer comparing what it purports to compare.
The implications of this problem extend beyond a misused conditional probability. If the evidence supports one hypothesis but is unrelated to the other, this equation introduces disconfirmation bias—in which neutral evidence acts as an undue penalty against the unrelated theory. For example, since $E_{\diamond}$ supports $H_{\$}$, then the probability of observing $E_{\diamond}$ conditional on $H_{\$}$ is necessarily greater than $P(E_{\diamond})$ overall. Consequently, evidence that is substantively uninformative about grievances will cause researchers to conclude not only that greed is the more plausible explanation (which is fine because we found evidence supporting it), but also that grievance is less plausible despite not having found evidence against it. In reality, both hypotheses still may be true.
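The penalty is mechanical, as a sketch with hypothetical numbers shows: if greed-specific evidence is twice as likely under $H_{\$}$ as in general, each such piece halves the posterior odds on grievance even though nothing disconfirms it.

```python
# Hypothetical probabilities for greed-specific evidence E_diamond.
p_e_given_greed = 0.6   # P(E|H_$): likely in a world where greed operates
p_e_overall = 0.3       # P(E): what P(E|H_G) reduces to when E is irrelevant to grievance

odds_greed_vs_grievance = 1.0  # even prior odds
for _ in range(3):  # three pieces of greed-specific, grievance-irrelevant evidence
    odds_greed_vs_grievance *= p_e_given_greed / p_e_overall

print(odds_greed_vs_grievance)  # 8.0 -- grievance now looks eight times less plausible,
                                # though no evidence against it was ever found
```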
All analytic techniques have assumptions, limitations, and trade-offs. The responsibility falls to the methodologist to make known the technique’s limits and the consequences of proceeding beyond them. These guidelines for appropriate use are absent in the current literature. The infrequency with which mutual exclusivity holds in practice belies qualitative Bayesians’ call for wide adoption of the method. Moreover, the breadth of traction qualitative Bayesian analysis has gotten in the discipline in the absence of solutions for dealing with nonexclusive hypotheses puts scholars at risk for adopting a method that is inappropriate for the majority of our questions.
5 Claim 4: The Bayesian Approach Counteracts Biases
The final claim I evaluate is that Bayesian reasoning helps prevent two cognitive biases plaguing qualitative research: confirmation bias and ad hoc hypothesizing.Footnote 69 Guarding against confirmation bias and ad hoc hypothesizing is crucial to ensure a theory’s validity beyond evidence examined. The question I raise here is whether the Bayesian approach is well-suited to being the shield.
5.1 Reducing Confirmation Bias
According to its proponents, Bayesian process tracing reduces confirmation bias in two ways: using likelihood ratios to ensure researchers attend to competing hypotheses and using conditional probabilities.Footnote 70 Likelihood ratios are supposed to reduce confirmation bias by “precluding the pitfall of restricting attention to a single hypothesis.”Footnote 71 As I demonstrate in the previous section, however, their utility is limited to testing mutually exclusive hypotheses. In short, while good inference may “always involve comparing hypotheses,”Footnote 72 it does not always involve competing hypotheses. Without the latter, Bayesian analysis will induce, rather than correct, confirmation bias.
The other mechanism aimed at reducing confirmation bias is the prescription to condition probabilities on “all relevant information available without presuming anything beyond what is known or bringing mere opinions or desires into evaluation.”Footnote 73 Notwithstanding the intuitive framework for thinking about the impact of evidence on our hypothesis, the prescription to condition probabilities only on what we know objectively is not sufficient to ensure proper execution. In reality, this prescription is no different than telling people not to cherry pick evidence or allow fondness for our pet hypothesis to color our evaluation of it. Beyond the argument that using explicit probabilities forces us to justify our choices and allows other scholars to evaluate them in the review process, the approach lacks any mechanism to enforce the recommendations.Footnote 74
5.2 Correcting for Ad Hoc Hypothesizing
In a research context where evidence can both inspire and test hypotheses, preventing the tendency to “over-tailor an explanation to fit a particular...set of observations” is a critical and valuable safeguard.Footnote 75 The logical Bayesian corrective to ad hoc hypothesizing is to “penalize [the prior on] complex hypotheses if they do not provide enough additional explanatory power relative to simpler rivals.”Footnote 76 Initially, this approach seems reasonable: allow a high likelihood for the evidence that inspired the hypothesis, but assign a low prior, since we devised the hypothesis post hoc. Upon further scrutiny, however, this recommendation exposes fundamental problems that compromise both its implementation and effectiveness.
The first problem is that the recommendation conflates “ad hoc” and “complex” without justifying the equivalence or providing guidelines to assess the severity of the infraction. Literally meaning “to this,” ad hoc hypotheses are constructed to fit particular pieces of evidence. They become problematic when they are tailored to idiosyncratic observations, thereby lacking generalizability. While an ad hoc hypothesis may be “arbitrary or overly complex,” focusing exclusively on the latter sidesteps the corrective most needed: testing the ad hoc hypothesis with new evidence to assess whether it holds up to further scrutiny. However, since neither arbitrariness nor excessive complexity is a desirable trait for a theory, I engage this recommendation on its own terms and evaluate how Bayesian logic addresses these problems.
While the Bayesian recommendation to penalize the priors on complex hypotheses has intuitive appeal, the literature lacks concrete guidelines for three corresponding tasks: how to evaluate relative complexity, how to scale penalties accordingly, and how to assess trade-offs. Fairfield and Charman define a complex hypothesis as one that “invokes many more causal factors or elaborate conjunctions of causal factors.”Footnote 77 But is a hypothesis with four causal factors so much more complex than a hypothesis with three that it deserves a penalty, and if so, how large a penalty? How do scholars adjudicate between a simple, yet ad hoc hypothesis and a complex, but more general alternative devised a priori? For this method to be as useful a practical tool as it is a metaphorical one, researchers need a concrete guide to answer these questions.
The value placed on simplicity forces us to ask just how simple our explanations have to be to pass muster, and what simplicity buys us in terms of explanatory power. Take, for example, the democratic peace proposition, which Bruce Russett called “one of the strongest nontrivial and nontautological generalizations that can be made about international relations.”Footnote 78 On the surface, the hypothesis is simple and elegant: democracies do not go to war with one another. But under the hood, the moving parts are countless: what it takes to be a democracy, what aspects of democracy are salient, how normative and institutional mediators matter for preventing conflict, and so on. The truth of the matter is that Occam’s Razor is infrequently well-suited to social phenomena—and until someone demonstrates that the simpler explanation tends to be the right one where politics is concerned, penalizing marginal complexity is unsubstantiated.
Returning to the main prescription—to penalize priors on ad hoc hypotheses—the second problem comes to light when one tries to assess whether an ad hoc hypothesis “provides enough additional explanatory power relative to a simpler rival.”Footnote 79 This recommendation lacks pragmatic guidelines for evaluating what constitutes enough explanatory power, and it highlights an inherent contradiction in the method. Since ad hoc hypotheses are problematic precisely because they are tailored to fit singular observations, “new” evidence is—contrary to the claim that new evidence has no special status over old evidence—uniquely valuable for evaluating the scope of the hypothesis’s explanatory power.Footnote 80 Without new evidence, the Bayesian approach is unable to resolve the fundamental problem with ad hoc hypotheses.
The final problem is that the penalties on ad hoc hypotheses are assigned “relative to simpler rivals.”Footnote 81 Per the previous section, political phenomena are often (if not always) driven by multiple causal factors.Footnote 82 As such, by pitting hypotheses against one another by default—especially without justifying the superiority of simpler explanations—the Bayesian approach runs the risk of pushing scholars away from complete and accurate descriptions of the phenomena they are investigating.
6 Discussion
I conclude with a discussion of the implicit fifth claim of Bayesian process tracing: that the technique’s benefits justify the costs of adoption. To be sure, this claim is inherent to all methodological advancements, and it warrants careful examination given that researchers face trade-offs between breadth and mastery in their training. I evaluate the utility and opportunity costs of Bayesian process tracing (or any methodology) on three dimensions: (1) the extent to which the method delivers on its claims, (2) the extent to which it constitutes an improvement over existing techniques, and (3) the extent to which it minimizes opportunity costs to researchers.
First, to justify adoption at the scale to which qualitative Bayesian methodologists aspire, Bayesian process tracing must live up to its central claims. In their current state, however, the techniques are insufficiently developed to achieve most of the stated goals. While the narrative (as opposed to the explicit mathematical) form of Bayesian analysis provides an intuitive framework for updating our beliefs as we move through evidence, the method (particularly the mathematical version) exhibits major shortcomings vis-à-vis its other objectives. Although it appears to provide a rigorous template for iterative research, methodologists implementing the technique exhibit contradictory and counterintuitive practices when it comes to updating priors as they examine additional evidence (as demonstrated in Section 2, above). Similarly, while the claim that “timing is irrelevant” highlights the importance of taking inspirational evidence into account, taking the claim as far as its proponents advocate introduces more problems than it corrects for. Next, per Section 4 above, Bayesian techniques remain incapable of dealing with nonexclusive “rival” hypotheses, despite claiming to offer more critical adjudication of alternatives. Finally, the problems associated with incomplete treatment of timing and rival hypotheses introduce, rather than correct, cognitive biases. Taken together, these shortcomings are likely to compromise the analytic transparency that the method is purported to contribute. Thus, while the goals of Bayesian process tracing are important, much work remains to be done before the method lives up to the mission.
The second requirement to justify adopting a new method is that it should improve upon existing techniques.Footnote 83 Qualitative Bayesian methodologists have gone to great lengths to replicate seminal studies—demonstrating that we can reach the same conclusions within a Bayesian framework.Footnote 84 Yet, the value added remains elusive.Footnote 85 The explicit Bayesian approach, in particular, requires marked start-up costs in the way of training. While extensive methodological training is not inherently problematic, we should only embrace it with the promise that at the end of the tunnel we will be able to improve—rather than just replicate—what came before.
However, the studies consistently referenced and upheld as the most exemplary use only the most basic form of process tracing—the analytic narrative—and none of those who replicate them demonstrates where the Bayesian approach improves the conclusions. Indeed, two of the three cited do not even refer explicitly to “process tracing” in their books. The fact of the matter is that Schultz, Wood, and Tannenwald are excellent data collectors, analyzers, and writers—skills that consistently prove to be the most central assets to good (and transparent) process tracing. Until Bayesian proponents can demonstrate where their method reveals new conclusions or more nuanced inferences, the costs of adoption will continue to outweigh the benefits.
The third and related requirement to justify wide adoption is that Bayesian methodologists must demonstrate not only the analytic purchase of the technique, but moreover, they must justify the opportunity costs of investing in it. As I mention above, the start-up costs are significant: one must learn the basics of process tracing, learn how to use Bayesian math, and—according to Fairfield and Charman—develop an intuition for using logarithmic scales to think about how to assign and scale probabilities.Footnote 86 While many analytic techniques demand researchers’ time to learn both the tools and intuition behind them, we must ask whether the quality of the research that results would be worse, equal, or better than the research that would result if researchers had other training in its place.
In the context of qualitative research, scholars have a lot more access to training in the analysis of data than they do in the research processes that get them the data in the first place. But the process of research and the processes we are researching are inextricable. Researchers would likely yield greater benefits from intensive training in ethnographic, interview, and sampling techniques; understanding the politics and biases associated with archival work; or even just additional and specialized language training needed to conduct research on a specific topic.Footnote 87 As I mention above, the qualitative work upheld as exemplary is consistently that which exhibits great skill in gathering evidence, rather than cutting-edge techniques in analyzing it. The vast array of critical skills to which researchers could dedicate their finite time and resources raises additional questions about whether the investment in learning Bayesian process tracing is worth the costs.
6.1 The Future of Bayesian Process Tracing
I conclude with a discussion of future directions of Bayesian process tracing. This article has demonstrated that, in their current state, qualitative Bayesian techniques are insufficient to achieve most of the method’s goals. Beyond providing better justification of the opportunity costs of training, four areas demand further research and refinement before the method can be considered viable.
First, qualitative Bayesian methodologists must provide more extensive and concrete guidelines for how to proceed with iterative updating. The primary motivation of adopting an explicit Bayesian frame for process tracing is that it provides a systematic method of updating confidence in hypotheses as researchers move through evidence. In its current state, however, the literature both lacks guidelines on key questions (e.g., how to proceed with analyses when testing more than two hypotheses) and exhibits contradictory practices (e.g., not using updated probabilities when analyzing subsequent pieces of evidence). As a result, researchers looking to put this method into practice lack a comprehensive and logically consistent template for executing the core task of Bayesian analysis.
The second point to reconcile is the contradictory role of timing. The assertion that “old evidence has no special status relative to new evidence” is effectively the chorus of Fairfield and Charman’s most recent article.Footnote 88 Unfortunately, the authors go so far in dismissing any value of timing that they create contradictions and missed opportunities. Advocates of the method would do well to walk this dismissal back and critically examine the utility of new evidence for assessing the stability of evolving theories and overcoming problems with ad hoc hypotheses. Furthermore, the case for the Bayesian approach would be more convincing to the extent that its proponents could demonstrate that researchers are generally capable of not making timing errors, and—when these errors do occur—that reviewers are generally capable of catching them.
The third area demanding further research is the process of assigning probabilities in explicit Bayesian analyses. If indeed Bayesianism guards against cognitive biases, advocates of the method have the responsibility to test whether and to what extent biases arise in practice when conjuring and evaluating quantities like the prior and likelihood functions. The method’s utility would be considerably more convincing if proponents could show (perhaps experimentally) that researchers with similar background information arrive at similar probability assessments and similar evaluations of others’ assessments. In other words, Bayesian methodologists should test whether the rules and recommendations they enjoin researchers to follow in theory are realistic in practice.
Fourth and most pressing, Bayesian methodologists must address the method’s limited capacity to deal with alternative hypotheses. Merely stating that one “assumes mutual exclusivity” without explicitly evaluating how hypotheses relate to one another downplays the severity of this modeling assumption and exaggerates the scope of research questions to which this approach is applicable.Footnote 89 The bias resulting from inappropriately assuming mutual exclusivity compromises not only the validity of inferences that follow from a Bayesian analysis, but also the broader claims about its capacity to increase transparency and reduce cognitive biases in qualitative research. Qualitative Bayesian methodologists must decide whether the method is best limited to testing mutually exclusive hypotheses, in which case they must instruct scholars how to assess whether mutual exclusivity holds in a given case. Alternatively, they must adopt (or derive new) expansions to Bayes’ rule to accommodate the wide scope of relationships among rival hypotheses we encounter.Footnote 90
To conclude, the principles motivating qualitative Bayesian process tracing have undeniable value. As a discipline, we should value iterative research, acknowledge the inferential contribution of “old” evidence, fully engage alternative explanations, and guard against confirmation bias and ad hoc hypotheses. Moreover, in the context of ongoing transparency debates, Bayesian scholars have raised important questions about the utility of time-stamping evidence and pre-registering research designs—yet they have not sufficiently established that the Bayesian approach is the best way forward. Ultimately, I argue that while the Bayesian approach is a useful metaphor for conducting transparent, iterative research, the specific techniques proposed as the way forward are cloudier than their proponents currently acknowledge. As such, before we continue the multiyear legacy of training researchers in Bayesian process tracing, it is time for the method to take two steps back before taking another step forward.
Acknowledgements
Sherry Zaks is an assistant professor of political science at the University of Southern California. Her core research interests include rebel-to-party transformation, the organizational sociology of militant groups, research design, and process-tracing methods. Many thanks to Melissa Carlson, Hilary Matfess, Evan Ramzipoor, my reviewers, and, especially, Kenneth Schultz for their incisive comments on earlier drafts. The Center for International Security and Cooperation (CISAC) at Stanford University provided financial support for this research.
Supplementary material
For supplementary material accompanying this paper, please visit https://doi.org/10.1017/pan.2020.10.