1. Introduction
Diagnosing whether a patient has a disease, predicting whether a defendant is guilty of a crime, and other everyday as well as life-changing decisions reflect, in part, the decision-maker's subjective degree of belief in uncertain events. Intuitions about probability frequently deviate dramatically from the dictates of probability theory (e.g., Gilovich et al. 2002). One form of deviation is notorious: people's tendency to neglect base-rates in favor of specific case data. A number of theorists (e.g., Brase 2002a; Cosmides & Tooby 1996; Gigerenzer & Hoffrage 1995) have argued that such neglect reveals little more than experimenters' failure to ask about uncertainty in a form that naïve respondents can understand – specifically, in the form of a question about natural frequencies. The thrust of our argument in this target article is that this perspective is far too narrow. After surveying the theoretical perspectives on the issue, we show that both data and conceptual considerations demand that judgment be understood in terms of dual processing systems: one that is responsible for systematic error and another that is capable of reasoning not just about natural frequencies, but about relations among any kind of set representation.
Base-rate neglect has been extensively studied in the context of Bayes' theorem, which provides a normative standard for updating the probability of a hypothesis in light of new evidence. Research has evaluated the extent to which intuitive probability judgment conforms to the theorem by employing a Bayesian inference task in which the respondent is presented with a word problem and must infer the probability of a hypothesis (e.g., the presence versus absence of breast cancer) on the basis of an observation (e.g., a positive mammography). Consider the following Bayesian inference problem presented by Gigerenzer and Hoffrage (1995; adapted from Eddy 1982):
The probability of breast cancer is 1% for a woman at age forty who participates in routine screening [base-rate]. If a woman has breast cancer, the probability is 80% that she will get a positive mammography [hit-rate]. If a woman does not have breast cancer, the probability is 9.6% that she will also get a positive mammography [false-alarm rate]. A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer? _%. (Gigerenzer & Hoffrage 1995, p. 685)
According to Bayes' theorem, the probability that the patient has breast cancer given that she has a positive mammography is 7.8%. Evidence that people's judgments on this problem accord with Bayes' theorem would be consistent with the claim that the mind embodies a calculus of probability, whereas the lack of such a correspondence would demonstrate that people's judgments can be at variance with sound probabilistic principles and, as a consequence, that people can be led to make incoherent decisions (Ramsey 1964; Savage 1954). Thus, the extent to which intuitive probability judgment conforms to the normative prescriptions of Bayes' theorem has implications for the nature of human judgment (for a review of the theoretical debate on human rationality, see Stanovich 1999). In the case of Eddy's study, fewer than 5% of the respondents generated the Bayesian solution.
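For concreteness, here is the calculation behind the 7.8% figure (our restatement; the problem's three stated quantities are simply plugged into Bayes' theorem):

$$p(\text{cancer} \mid \text{positive}) = \frac{p(\text{cancer})\,p(\text{positive} \mid \text{cancer})}{p(\text{cancer})\,p(\text{positive} \mid \text{cancer}) + p(\neg\text{cancer})\,p(\text{positive} \mid \neg\text{cancer})} = \frac{.01 \times .80}{.01 \times .80 + .99 \times .096} \approx .078$$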
Early studies evaluating Bayesian inference under single-event probabilities also showed systematic deviations from Bayes' theorem. Hammerton (1973), for example, found that only 10% of the physicians tested generated the Bayesian solution, with the median response approximating the hit-rate of the test. Similarly, Casscells et al. (1978) and Eddy (1982) found that a low proportion of respondents generated the Bayesian solution: 18% in the former and 5% in the latter, with the modal response in each study corresponding to the hit-rate of the test. All of this suggests that the mind does not normally reason in a way consistent with the laws of probability theory.
1.1. Base-rate facilitation
However, this conclusion has not been drawn universally. Eddy's (1982) problem concerned a single event, the probability that a particular woman has breast cancer. In some problems, when probabilities that refer to the chances of a single event occurring (e.g., 1%) are reformulated and presented in terms of natural frequency formats (e.g., 10 out of 1,000), people more often produce probability estimates that conform to Bayes' theorem. Consider the following mammography problem presented in a natural frequency format by Gigerenzer and Hoffrage (1995):
10 out of every 1,000 women at age forty who participate in routine screening have breast cancer [base-rate]. 8 out of every 10 women with breast cancer will get a positive mammography [hit-rate]. 95 out of every 990 women without breast cancer will also get a positive mammography [false-alarm rate]. Here is a new representative sample of women at age forty who got a positive mammography in routine screening. How many of these women do you expect to actually have breast cancer? __ out of __ . (Gigerenzer & Hoffrage 1995, p. 688)
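Note that under this format the Bayesian answer can be read directly off the stated counts, with no explicit use of the base-rate (our restatement):

$$p(\text{cancer} \mid \text{positive}) = \frac{8}{8 + 95} = \frac{8}{103} \approx 7.8\%$$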
The proportion of responses conforming to Bayes' theorem increased by a factor of about three: 46% under the natural frequency format versus 16% under the single-event probability format. The observed facilitation has motivated researchers to argue that coherent probability judgment depends on representing events in the form of natural frequencies (e.g., Brase 2002a; Cosmides & Tooby 1996; Gigerenzer & Hoffrage 1995).
Cosmides and Tooby (1996) also conducted a series of experiments that employed Bayesian inference problems that had previously elicited judgmental errors under single-event probability formats. In Experiment 1, they replicated Casscells et al. (1978), demonstrating that only 12% of their respondents produced the Bayesian answer when presented with single-event probabilities. Cosmides and Tooby then transformed the single-event probabilities into natural frequencies, resulting in a remarkably high proportion of Bayesian responses: 72% of respondents generated the Bayesian solution, supporting the authors' conclusion that Bayesian inference depends on the use of natural frequencies.
Gigerenzer (1996) explored whether physicians, who frequently assess and diagnose medical illness, would demonstrate the same pattern of judgments as clinically untrained college undergraduates. Consistent with the judgments of college students (e.g., Gigerenzer & Hoffrage 1995), Gigerenzer found that the sample of 48 physicians tested generated the Bayesian solution in only 10% of the cases under single-event probability formats, whereas 46% did so with natural frequency formats. Physicians spent about 25% more time on the single-event probability problems, which suggests that they found these problems more difficult to solve than problems presented in a natural frequency format. Thus, the physicians' judgments were consistent with those of non-physicians, suggesting that formal training in medical diagnosis does not lead to more accurate Bayesian reasoning and that natural frequencies facilitate probabilistic inference across populations.
Further studies have demonstrated that the facilitatory effect of natural frequencies on Bayesian inference observed in the laboratory has the potential to improve the predictive accuracy of professionals in important real-world settings. Gigerenzer and his colleagues have shown, for example, that natural frequencies facilitate Bayesian inference in AIDS counseling (Gigerenzer et al. 1998), in the assessment of statistical information by judges (Lindsey et al. 2003), and in teaching Bayesian reasoning to college undergraduates (Kurzenhäuser & Hoffrage 2002; Sedlmeier & Gigerenzer 2001). In summary, the reviewed findings demonstrate facilitation in Bayesian inference when single-event probabilities are translated into natural frequencies, consistent with the view that coherent probability judgment depends on natural frequency representations.
1.2. Theoretical accounts
Explanations of facilitation in Bayesian inference can be grouped into five types that can be arrayed along a continuum of cognitive control, from accounts that ascribe facilitation to processes that have little to do with strategic cognitive processing to those that appeal to general-purpose reasoning procedures. The five accounts we discuss can be contrasted at the coarsest level on five dimensions (see Table 1). We do not claim that theorists have consistently made these distinctions in the past, only that these distinctions are in fact appropriate ones.
Table 1. Prerequisites for the reduction of base-rate neglect according to five theoretical frameworks
Note. The prerequisites of each theory are indicated by an ‘X’.
A parallel taxonomy for theories of categorization can be found in Sloman et al. (in press). We briefly introduce the theoretical frameworks here; in the sections that follow, we elaborate on each as needed to reveal its assumptions and derive predictions, so that the frameworks can be compared and contrasted.
1.2.1. Mind as Swiss army knife
Several theorists have argued that the human mind consists of a number of specialized modules (Cosmides & Tooby 1996; Gigerenzer & Selten 2001). Each module is assumed to be unavailable to conscious awareness or deliberate control (i.e., cognitively impenetrable), and also assumed to be able to process only a specific type of information (i.e., informationally encapsulated; see Fodor 1983). One module in particular is designed to process natural frequencies. This module is thought to have evolved because natural frequency information is what was available to our ancestors in the environment of evolutionary adaptedness. On this view, facilitation occurs because natural frequency data are processed by a computationally effective processing module.
Two arguments have been advanced in support of the ecological validity of natural frequency data. First, as natural frequency information is acquired, it can be "easily, immediately, and usefully incorporated with past frequency information via the use of natural sampling, which is the method of counting occurrences of events as they are encountered and storing the resulting knowledge base for possible use later" (Brase 2002b, p. 384). Second, information stored in a natural frequency format preserves the sample size of the reference class (e.g., 10 out of 1,000 women have breast cancer) and is arranged into subset relations (e.g., of the 10 women who have breast cancer, 8 are positively diagnosed) that indicate how many cases of the total sample fall in each subcategory (i.e., the base-rate, the hit-rate, and the false-alarm rate). Because natural frequency formats entail the sample and effect sizes, posterior probabilities consistent with Bayes' theorem can be calculated without explicitly incorporating base-rates, thereby allowing simple calculations (Kleiter 1994). Thus, proponents of this view argue that the mind has evolved to process natural frequency formats over single-event probabilities, and that, in particular, it includes a cognitive module that "maps frequentist representations of prior probabilities and likelihoods onto a frequentist representation of a posterior probability in a way that satisfies the constraints of Bayes' theorem" (Cosmides & Tooby 1996, p. 60).
Theorists who take this position uniformly motivate their hypothesis via a process of natural selection. However, the cognitive and evolutionary claims are in fact conceptually independent. The mind could consist of cognitively impenetrable and informationally encapsulated modules whether or not any or all of those modules evolved for the specific reasons offered.
1.2.2. Natural frequency algorithm
A weaker claim is that the mind includes a specific algorithm for effectively processing natural frequency information (Gigerenzer & Hoffrage 1995). Unlike the mind-as-Swiss-army-knife view, this hypothesis makes no general claim about the architecture of mind. Despite their difference in scope, however, these two theories adopt the same computational and evolutionary commitments.
Consistent with the mind-as-Swiss-army-knife view, the algorithm approach proposes that coherent probability judgment derives from a simplified form of Bayes' theorem. The proposed algorithm computes the number of cases where the hypothesis and observation co-occur, N(H and D), out of the total number of cases where the observation occurs, N(H and D) + N(not-H and D) = N(D) (Gigerenzer & Hoffrage 1995; Kleiter 1994). Because this form of Bayes' theorem expresses a simple ratio of frequencies, we refer to it as "the Ratio."
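Restated in display form, the Ratio computes the posterior directly from the frequency counts:

$$p(H \mid D) = \frac{N(H \text{ and } D)}{N(H \text{ and } D) + N(\text{not-}H \text{ and } D)} = \frac{N(H \text{ and } D)}{N(D)}$$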
Following the mind-as-Swiss-army-knife view, proponents of this approach have ascribed the origin of the Bayesian ratio to evolution. Gigerenzer and Hoffrage (1995, p. 686), for example, state:
The evolutionary argument that cognitive algorithms were designed for frequency information, acquired through natural sampling, has implications for the computations an organism needs to perform when making Bayesian inferences …. Bayesian algorithms are computationally simpler when information is encoded in a frequency format rather than a standard probability format.
As a consequence, the algorithm view predicts that "Performance on frequentist problems will satisfy some of the constraints that a calculus of probability specifies, such as Bayes' rule. This would occur because some inductive reasoning mechanisms in our cognitive architecture embody aspects of a calculus of probability" (Cosmides & Tooby 1996, p. 17).
The proposed algorithm is necessarily informationally encapsulated, as it operates on a specific information format – natural frequencies; but it is not necessarily cognitively impenetrable, as no one has claimed that other cognitive processes cannot affect or cannot use the algorithm's computations. The primary motivation for the existence of this algorithm has been computational (Gigerenzer & Hoffrage 1995; Kleiter 1994). As reviewed above, the value of natural frequencies is that these formats entail the sample and effect sizes and, as a consequence, simplify the calculation of Bayes' theorem: Probability judgments are coherent with Bayesian prescriptions even without explicit consideration of base-rates.
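To make the computational contrast concrete, the following minimal sketch (ours, not code proposed by any of these theorists) shows that the Ratio applied to the natural frequency counts and the full theorem applied to single-event probabilities return the same posterior for the mammography problem of section 1:

```python
# Minimal sketch: the Ratio over natural frequency counts versus the full
# Bayes' theorem over single-event probabilities (mammography problem).

def posterior_from_probabilities(base_rate, hit_rate, false_alarm_rate):
    """Full Bayes' theorem over single-event probabilities."""
    numerator = base_rate * hit_rate
    return numerator / (numerator + (1 - base_rate) * false_alarm_rate)

def posterior_from_frequencies(n_h_and_d, n_noth_and_d):
    """The Ratio: N(H and D) / N(D), with no explicit base-rate term."""
    return n_h_and_d / (n_h_and_d + n_noth_and_d)

print(posterior_from_probabilities(0.01, 0.80, 0.096))  # ~0.0776
print(posterior_from_frequencies(8, 95))                # 8/103 ~ 0.0777
# The tiny discrepancy arises because 95 out of 990 is ~9.596%, not exactly 9.6%.
```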
1.2.3. Natural frequency heuristic
A claim which puts facilitation under more cognitive control is that people use heuristics to make judgments (Gigerenzer & Selten 2001; Tversky & Kahneman 1974) and that the Ratio is one such heuristic (Gigerenzer et al. 1999). According to this view, "heuristics can perform as well, or better, than algorithms that involve complex computations …. The astonishingly high accuracy of these heuristics indicates their ecological rationality; fast and frugal heuristics exploit the statistical structure of the environment, and they are adapted to this structure" (Gigerenzer 2006). Advocates of this approach motivate the proposed heuristic by pointing to the ecological validity of natural frequency formats, as Gigerenzer further states (p. 52):
To evaluate the performance of the human mind, one needs to look at its environment and, in particular, the external representation of the information. For most of the time during which the human mind evolved, information was encountered in the form of natural frequencies …
Thus, this view proposes that the mind evolved to process natural frequencies and that this evolutionary adaptation gave rise to the proposed heuristic that computes the Bayesian Ratio from natural frequencies.
1.2.4. Non-evolutionary natural frequency heuristic
Evolutionary arguments about the ecological validity of natural frequency representations provide part of the motivation for the preceding theories. In particular, proponents of the theories argue that throughout the course of human evolution natural frequencies were acquired via natural sampling (i.e., encoding event frequencies as they are encountered, and storing them in the appropriate reference class).
In contrast, the non-evolutionary natural frequency theory proposes that natural sampling is not necessarily an evolved procedure for encoding statistical regularities in the environment, but rather, a useful sampling method that, one way or another, people can appreciate and use. The natural frequency representations that result from natural sampling, on this view, simplify the calculation of Bayes' theorem and, as a consequence, facilitate Bayesian inference (Kleiter 1994). Thus, this non-evolutionary view differs from the preceding accounts by resting on a purely computational argument that is independent of any commitments as to which cognitive processes have been selected for by evolution.
This theory proposes that the computational simplicity afforded by natural frequencies gives rise to a heuristic that computes the Bayesian Ratio from natural frequencies. The proposed heuristic implies a higher degree of cognitive control than the preceding modular algorithms.
1.2.5. Nested sets and dual processes
The most extreme departure from the modular view claims that facilitation is a product of general-purpose reasoning processes (Evans et al. 2000; Fox & Levav 2004; Girotto & Gonzalez 2001; Johnson-Laird et al. 1999; Kahneman & Frederick 2002; 2005; Over 2003; Reyna 1991; Sloman et al. 2003). On this view, people use two systems to reason (Evans & Over 1996; Kahneman & Frederick 2002; 2005; Reyna & Brainerd 1994; Sloman 1996a; Stanovich & West 2000), often called Systems 1 and 2. But in an effort to use more expressive labels, we will employ Sloman's terms "associative" and "rule-based."
The dual-process model attributes responses based on associative principles like similarity or retrieval from memory to a primitive associative judgment system. It attributes responses based on more deliberative processing that involves working memory, such as the elementary set operations that respect the logic of set inclusion and facilitate Bayesian inference, to a second rule-based system. Judgmental errors produced by cognitive heuristics are generated by associative processes, whereas the induction of a representation of category instances that makes nested set relations transparent also induces use of rules about elementary set operations – operations of the sort perhaps described by Fox and Levav (2004) or Johnson-Laird et al. (1999).
According to this theory, base-rate neglect results from associative responding, and facilitation occurs when people correctly use rules to make the inference. Rule-based inference is more cognitively demanding than associative inference, and is therefore more likely to occur when participants have more time, more incentives, or more external aids to make a judgment and are under fewer other demands at the moment of judgment. It is also more likely for people who have greater skill in employing the relevant rules. This last prediction is supported by Stanovich and West (2000), who found correlations between intelligence and use of base-rates.
Rules are effective devices for solving a problem to the extent that the problem is represented in a way compatible with the rules. For example, long division is an effective method for solving division problems, but only if numbers are represented using Arabic numerals; division with Roman numerals requires different rules. By analogy, this view proposes that natural frequencies facilitate use of base-rates because the rules people have access to and are able to use to solve the specific kind of problem studied in the base-rate neglect literature are more compatible with natural frequency formats than single-event probability formats.
Specifically, people are adept at using rules consisting of simple elementary set operations. But these operations are only applicable when problems are represented in terms of sets, as opposed to single events (Reyna 1991; Reyna & Brainerd 1995). According to this view, facilitation in Bayesian inference occurs under natural frequencies because these formats are an effective cue to the representation of the set structure underlying a Bayesian inference problem. This is the nested sets hypothesis of Tversky and Kahneman (1983). In this framework, natural frequency formats prompt the respondent to adopt an outside view by inducing a representation of category instances (e.g., 10 out of 1,000 women have breast cancer) that reveals the set structure of the problem and makes the nested set relations transparent for problem solving. We refer to this hypothesis as the nested sets theory (Ayton & Wright 1994; Evans et al. 2000; Fox & Levav 2004; Girotto & Gonzalez 2001; 2002; Johnson-Laird et al. 1999; Macchi 2000; Mellers & McGraw 1999; Reyna 1991; Sloman et al. 2003; Tversky & Kahneman 1983). Unlike the other theories, it predicts that facilitation should be observable in a variety of different tasks, not just posterior probability problems, when nested set relations are made transparent.
2. Overview of empirical and conceptual issues reviewed
We now turn to an evaluation of these five theoretical frameworks. We evaluate a range of empirical and conceptual issues that bear on the validity of these frameworks.
2.1. Review of empirical literature
The theories are evaluated with respect to the empirical predictions summarized in Table 2. The predictions of each theory derive from (1) the degree of cognitive control attributed to probability judgment (see Table 1), and (2) the proposed cognitive operations that underlie estimates of probability.
Table 2. Empirical predictions of the five theoretical frameworks
Note. The predictions of each theory are indicated by an ‘X.’
Theories that adopt a low degree of cognitive control – proposing cognitively impenetrable modules or informationally encapsulated algorithms – restrict Bayesian inference to contexts that satisfy the assumptions of the processing module or algorithm. In contrast, theories that adopt a high degree of cognitive control – appealing to a natural frequency heuristic or a domain general capacity to perform set operations – predict Bayesian inference in a wider range of contexts. The latter theories are distinguished from one another in terms of the cognitive operations they propose: The evolutionary and non-evolutionary natural frequency heuristics depend on structural features of the problem, such as question form and reference class. They imply the accurate encoding and comprehension of natural frequencies and an accurate weighting of the encoded event frequencies to calculate the Bayesian ratio. In contrast, the nested sets theory does not rely on natural frequencies and, instead, predicts facilitation in Bayesian inference, and in a range of other deductive and inductive reasoning tasks, when the set structure of the problem is made transparent, thereby promoting use of elementary set operations and inferences about the logical (i.e., extensional) properties they entail.
2.2. Information format and judgment domain
The preceding review of the literature found that natural frequency formats consistently reduced base-rate neglect relative to probability formats. However, the size of this effect varied considerably across studies (see Table 3).
Table 3. Percent correct for Bayesian inference problems reported in the literature (sample sizes in parentheses)
Note. Probability problems require that the respondent compute a conditional-event probability from data presented in a non-partitive form, whereas frequency problems include questions that prompt the respondent to evaluate the two terms of the Bayesian ratio and present data that is partitioned into these components.
‡ p > 0.05
Cosmides and Tooby (1996), for example, observed a 60-percentage-point difference between the proportions of Bayesian responses under natural frequencies versus single-event probabilities, whereas Gigerenzer and Hoffrage (1995) reported a difference only half that size. The wide variability in the size of the effects makes it clear that natural frequencies in no sense eliminate base-rate neglect, though they do reduce it.
Sloman et al. (2003) conducted a series of experiments that attempted to replicate the effect sizes observed in previous studies (e.g., Cosmides & Tooby 1996, Experiment 2, Condition 1). Although Sloman et al. found facilitation with natural frequencies, the size of the effect was smaller than that observed by Cosmides and Tooby: The percentage of Bayesian solutions generated under single-event probabilities (20%) was comparable to Cosmides and Tooby's (12%), but the percentage of Bayesian answers generated under natural frequencies was smaller (51%, versus Cosmides and Tooby's 72%). In a further replication, Sloman et al. found that only 31% of their respondents generated the Bayesian solution, a statistically non-significant advantage for natural frequencies.
Evans et al. (2000, Experiment 1) similarly found only a small effect of information format. They report 24% Bayesian solutions under single-event probabilities and 35% under natural frequencies, a difference that was not reliable.
Brase et al. (2006) examined whether methodological factors contribute to the observed variability in effect size. They identified two factors that modulate the facilitatory effect of natural frequencies in Bayesian inference: (1) the academic selectivity of the university the participants attend, and (2) whether or not the experiment offered a monetary incentive for participation. Experiments whose participants attended a top-tier national university and were paid reported a significantly higher proportion of Bayesian responses (e.g., Cosmides & Tooby 1996) than experiments whose participants attended a second-tier regional university and were not paid (e.g., Brase et al. 2006, Experiments 3 and 4). These results suggest that a higher proportion of Bayesian responses is observed in experiments that (a) select participants with a higher level of general intelligence, as indexed by the academic selectivity of the university the participant attends (Stanovich & West 1998a), and (b) increase motivation by providing a monetary incentive. The former observation is consistent with the view that Bayesian inference depends on domain-general cognitive processes to the degree that intelligence is domain general. The latter suggests that Bayesian inference is strategic, and not supported by automatic (e.g., modularized) reasoning processes.
2.3. Question form
One methodological factor that may mediate the effect of problem format is the form of the Bayesian inference question presented to participants (Girotto & Gonzalez 2001). The Bayesian solution expresses the ratio between the size of the subset of cases in which the hypothesis and observation co-occur and the total number of observations. It follows that the respondent should be more likely to arrive at this solution when the question prompts an outside view that draws on the sample of category instances presented in the problem (e.g., "Here is a new sample of patients who have obtained a positive test result in routine screening. How many of these patients do you expect to actually have the disease? __ out of __") than when the question presents information about category properties (e.g., "… Pierre has a positive reaction to the test …") and prompts an inside view, in which the respondent considers the fact about Pierre to compute a probability estimate. As a result, the form of the question should modulate the observed facilitation.
In the preceding studies, however, information format and judgment domain were confounded with question form: Only problems that presented natural frequencies prompted use of the sample of category instances presented in the problem to compute the two terms of the Bayesian solution, whereas single-event probability problems prompted the use of category properties to compute a conditional probability.
To dissociate these factors, Girotto and Gonzalez (2001) proposed that single-event probabilities (e.g., 1%) can be represented as chances (e.g., "one chance out of 100"). Under the chance formulation of probability, the respondent can be asked either for the standard conditional probability or for values that correspond more closely to the ratio expressed by Bayes' theorem. The latter question asks the respondent to evaluate the chances that Pierre has a positive test for a particular infection, out of the total chances that Pierre has a positive test, thereby prompting consideration of the chances that Pierre – who could be anyone with a positive test in the sample – has the infection. In addition to encouraging an outside view by prompting the respondent to represent the sample of category instances presented in the problem, this question prompts the computation of the Bayesian ratio in two clearly defined steps: First calculate the overall number of chances where the conditioning event is observed, then compare this quantity to the number of chances where the conditioning event is observed in the presence of the hypothesis.
To evaluate the role of question form in Bayesian inference, Girotto and Gonzalez (2001, Study 1) conducted an experiment that manipulated question form independently of information format and judgment domain. The authors presented the following Bayesian inference scenario to 80 college undergraduates of the University of Provence, France:
A person who was tested had 4 chances out of 100 of having the infection. 3 of the 4 chances of having the infection were associated with a positive reaction to the test. 12 of the remaining 96 chances of not having the infection were also associated with a positive reaction to the test (Girotto & Gonzalez 2001, p. 253).
Half of the respondents were then asked to compute a conditional probability (i.e., “If Pierre has a positive reaction, there will be __ chance(s) out of __ that the infection is associated with his positive reaction”), whereas the remaining respondents were asked to evaluate the ratio of probabilities expressed in the Bayesian solution (i.e., “Imagine that Pierre is tested now. Out of the total 100 chances, Pierre has __ chances of having a positive reaction, __ of which will be associated with having the infection”).
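For reference, the Bayesian answer to the two-part question (our arithmetic, from the chances stated in the scenario) is:

$$3 + 12 = 15 \text{ chances of a positive reaction out of } 100, \qquad \frac{3}{15} = 20\% \text{ associated with the infection}$$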
Girotto and Gonzalez (2001) found that only 8% of the respondents generated the Bayesian solution when asked to compute a conditional probability, consistent with the earlier literature. But the proportion of Bayesian answers increased to 43% when the question prompted the respondent to evaluate the two terms of the Bayesian solution. The same pattern was observed with the natural frequency format problem. Only 18% of the respondents generated the Bayesian solution when asked to compute a conditional frequency, whereas this proportion increased to 58% when asked to evaluate the two terms separately. This level of performance is comparable to that observed under standard natural frequency formats (e.g., Gigerenzer & Hoffrage 1995), and supports Girotto and Gonzalez's claim that the two-step question approximates the question asked with standard natural frequency formats. In further support of Girotto and Gonzalez's predictions, there were no reliable effects of information format or judgment domain across all the reported comparisons.
These findings suggest that people are not predisposed against using single-event probabilities but instead appear to be highly sensitive to the form of the question: When asked to reason about category instances to compute the two terms of the Bayesian ratio, respondents were able to draw the normative solution under single-event probabilities. Facilitation in Bayesian inference under natural frequencies need not imply that the mind is designed to process these formats; it can instead be attributed to the facilitatory effect of prompting use of the sample of category instances presented in the problem to evaluate the two terms of the Bayesian ratio.
2.4. Reference class
To assess the role of problem structure in Bayesian inference, we review studies that have manipulated structural features of the problem. Girotto and Gonzalez (2001) report two experiments that systematically assess performance under different partitionings of the data: defective frequency partitions and non-partitive frequency problems. Consider the following medical diagnosis problem, which presents natural frequencies under what Girotto and Gonzalez (2001, Study 5) term a defective partition:
4 out of 100 people tested were infected. 3 of the 4 infected people had a positive reaction to the test. 84 of the 96 uninfected people did not have a positive reaction to the test. Imagine that a group of people is now tested. In a group of 100 people, one can expect __ individuals to have a positive reaction, __ of whom will have the infection.
In contrast to the standard partitioning of the data under natural frequencies, here the frequency of uninfected people who did not have a positive reaction to the test is reported, instead of the frequency of uninfected, positive reactions. As a result, to derive the Bayesian solution, the first value must be subtracted from the total population of uninfected individuals to obtain the desired value (96 − 84 = 12), and the result can be used to determine the proportion of infected, positive people out of the total number of people who obtain a positive test (3/15 = 0.2). Although this problem exhibits a partitive structure, Girotto and Gonzalez predicted that the defective partitioning of the data would produce a greater proportion of errors than the standard data partitioning, because the former requires an additional computation. Consistent with this prediction, only 35% of respondents generated the Bayesian solution, whereas 53% did so under the standard data partitioning. Nested set relations were more likely to facilitate Bayesian reasoning when the data were partitioned into the components that are needed to generate the solution.
Girotto and Gonzalez (2001, Study 6) also assessed performance under natural frequency formats that were not partitioned into nested set relations (i.e., unpartitioned frequencies). As in the case of standard natural frequency format problems (e.g., Cosmides & Tooby 1996), these multiple-sample problems employed natural frequencies and prompted the respondent to compute the two terms of the Bayesian solution. Such a problem must be treated in the same way as a single-event probability problem (i.e., using the conditional probability and additivity laws) to determine the two terms of the Bayesian solution. Girotto and Gonzalez therefore predicted that performance under multiple samples would be poor, approximating the performance observed under standard probability problems. As predicted, none of the respondents generated the Bayesian solution under the multiple-sample or standard single-event probability frames. Natural frequency formats facilitate Bayesian inference only when they partition the data into the components needed to draw the Bayesian solution.
Converging evidence is provided by Macchi (2000), who presented Bayesian inference problems in either a partitive or non-partitive form. Macchi found that only 3% of respondents generated the Bayesian solution when asked to evaluate the two terms of the Bayesian ratio with non-partitive frequency problems. Similarly, only 6% of the respondents generated the Bayesian solution when asked to compute a conditional probability under non-partitive probability formats (see also Sloman et al. 2003, Experiment 4). But when the problems were presented in a partitive formulation and respondents were asked to evaluate the two terms of the Bayesian ratio, the proportions increased to 40% under partitive natural frequency formats, 33% under partitive single-event probabilities, and 36% under the modified partitive single-event probability problems. The findings reinforce the nested sets view that information structure is the factor determining predictive accuracy.
To further explore the contribution of information structure and question form in Bayesian inference, Sloman et al. (2003) assessed performance using a conditional chance question. In contrast to the standard conditional probability question that presents information about a particular individual (e.g., "Pierre has a positive reaction to the test"), their conditional probability question asked the respondent to evaluate "the chance that a person found to have a positive test result actually has the disease." This question requests the probability of an unknown category instance and therefore prompts the respondent to consult the data presented in the problem to assess the probability that this person – who could be any randomly chosen person with a positive result in the sample – has the disease. In Experiment 1, Sloman et al. looked for facilitation in Bayesian inference on a partitive single-event probability problem by prompting use of the sample of category instances presented in the problem to compute a conditional probability, as the nested sets hypothesis predicts. Forty-eight percent of the 48 respondents tested generated the Bayesian solution, demonstrating that making partitive structure transparent facilitates Bayesian inference.
In summary, the reviewed findings suggest that when the data are partitioned into the components needed to arrive at the solution and participants are prompted to use the sample of category instances in the problem to compute the two terms of the Bayesian ratio, the respondent is more likely to (1) understand the question, (2) see the underlying nested set structure by partitioning the data into exhaustive subsets, and (3) select the pieces of evidence that are needed for the solution. According to the nested sets theory, accurate probability judgments derive from the ability to perform elementary set operations whose computations are facilitated by external cues (for recent developmental evidence, see Girotto & Gonzalez, in press).
2.5. Diagrammatic representations
Sloman et al. (2003, Experiment 2) explored whether Euler circles, which were employed to construct a nested set structure for standard non-partitive single-event probability problems (e.g., Cosmides & Tooby 1996), would facilitate Bayesian inference (see Fig. 1). These authors found that 48% of the 25 respondents tested generated the Bayesian solution when presented non-partitive single-event probability problems with an Euler diagram that depicted the underlying nested set relations. This finding demonstrates that the set structure of standard non-partitive single-event probability problems can be represented by Euler diagrams to produce facilitation. Supporting data can be found in Yamagishi (2003), who used diagrams to make nested set relations transparent in other inductive reasoning problems. Similar evidence is provided by Bauer and Johnson-Laird (1993) in the context of deductive reasoning.
Figure 1. A diagrammatic representation of Bayes' theorem: Euler circles (Sloman et al. 2003).
2.6. Accuracy of frequency judgments
Theories based on natural frequency representations (i.e., the mind-as-Swiss-army-knife, natural frequency algorithm, natural frequency heuristic, and non-evolutionary natural frequency heuristic theories) propose that "the mind is a frequency monitoring device" and that the cognitive algorithm that computes the Bayesian ratio encodes and processes event frequencies in naturalistic settings (Gigerenzer 1993, p. 300). The literature evaluating the encoding and retrieval of event frequencies is extensive and includes assessments of frequency judgments under well-controlled laboratory settings based on relatively simple and distinct stimuli (e.g., letters, pairs of letters, or words), and under naturalistic settings in which respondents report the frequency of their own behaviors (e.g., the medical diagnosis of patients). Laboratory studies tend to find that frequency judgments are surprisingly accurate (for a recent review, see Zacks & Hasher 2002), whereas naturalistic studies often find systematic errors in frequency judgments (see Bradburn et al. 1987). Recent efforts have been made to integrate these findings under a unified theoretical framework (e.g., Schwartz & Sudman 1994; Schwartz & Wanke 2002; Sedlmeier & Betsch 2002).
Are frequency judgments relatively accurate under the naturalistic settings described by standard Bayesian inference problems? Bayesian inference problems tend to involve hypothetical situations that, if real, would be based on autobiographical memories encoded under naturalistic conditions, such as the standard medical diagnosis problem in which a particular set of patients is hypothetically encountered (cf. Sloman & Over 2003). Hence, the present review focuses on the accuracy of frequency judgments for the autobiographical events alluded to by standard Bayesian inference problems (see sects. 2.1, 2.2, and 2.3) to assess whether Bayesian inference depends on the accurate encoding of autobiographical events.
Gluck and Bower (1988) conducted an experiment that employed a learning paradigm to assess the accuracy of frequency judgments in medical diagnosis. The respondents in the experiment learned to diagnose a rare (25%) or a common (75%) disease on the basis of four potential symptoms exhibited by the patient (e.g., stomach cramps, discolored gums). During the learning phase, the respondents diagnosed 250 hypothetical patients and in each case were provided feedback on the accuracy of their diagnosis. After the learning phase, the respondents estimated the relative frequency of patients who had the diseases given each symptom. Gluck and Bower found that relative frequency estimates of the disease were determined by the diagnosticity of the symptom (the degree to which the respondent perceived that the symptom provided useful information in diagnosing the disease) and not by the base-rate frequencies of the disease. These findings were replicated by Estes et al. (1989, Experiment 1) and Nosofsky et al. (1992, Experiment 1).
Bradburn et al. (1987) evaluated the accuracy of autobiographical memory for event frequencies by employing a range of surveys that assessed quantitative facts, such as "During the last two weeks, on days when you drank liquor, about how many drinks did you have?" These questions require the simple recall of quantitative facts, in which the respondent "counts up how many individuals fall within each category" (Cosmides & Tooby 1996, p. 60). Recalling the frequency of drinks consumed over the last two weeks, for example, is based on counting the total number of individual drinking occasions stored in memory.
Bradburn et al. (1987) found that autobiographical memory for event frequencies exhibits systematic errors characterized by (a) the failure to recall the entire event or the loss of details associated with a particular event (e.g., Linton 1975; Wagenaar 1986), (b) the combining of similar distinct events into a single generalized memory (e.g., Linton 1975; 1982), or (c) the inclusion of events that did not occur within the reference period specified in the question (e.g., Pillemer et al. 1986). As a result, Bradburn et al. propose that the observed frequency judgments do not reflect the accurate encoding of event frequencies, but instead entail a more complex inferential process that typically operates on the basis of incomplete, fragmentary memories that do not preserve base-rate frequencies.
These findings suggest that the observed facilitation in Bayesian inference under natural frequencies cannot be explained by an (evolved) capacity to encode natural frequencies. Apparently, people don't have that capacity.
2.7. Comprehension of formats
Advocates of the nested sets view have argued that the facilitation of Bayesian inference under natural frequencies can be fully explained via elementary set operations that deliver the same result as Bayes' theorem, without appealing to an (evolved) capacity to process natural frequencies (e.g., Johnson-Laird et al. 1999). The question therefore arises whether the ease of processing natural frequencies goes beyond the reduction in computational complexity of Bayes' theorem that they provide (Brase 2002a). To assess this issue, we review evidence that evaluates whether natural frequencies are understood more easily than single-event probabilities.
Brase (2002b) conducted a series of experiments to evaluate the relative clarity and ease of understanding of a range of statistical formats, including natural frequencies (e.g., 1 out of 10) and percentages (e.g., 10%). Brase distinguished natural frequencies that have a natural sampling structure (e.g., 1 out of 10 have the property, 9 out of 10 do not) from "simple frequencies" that refer to single numerical relations (e.g., 1 out of 10 have the property). This distinction, however, is not entirely consistent with the literature, as natural frequency theorists have often used single numerical statements for binary hypotheses to express natural frequencies (e.g., Zhu & Gigerenzer 2006). In any case, for binary hypotheses the natural sampling structure can be directly inferred from simple frequencies. If we observe, for example, that I win the weekly poker game "1 out of 10 nights," we can infer that I lose "9 out of 10 nights" and construct a natural sampling structure that represents the size of the reference class and is arranged into subset relations. Thus, single numerical statements of this type have a natural sampling structure, and we therefore refer to Brase's "simple frequencies" as natural frequencies in the following discussion.
Percentages express single-event probabilities in that they are normalized to an arbitrary reference class (e.g., 100) and can refer to the likelihood of a single event (Brase 2002b; Gigerenzer & Hoffrage 1995). We therefore examine whether natural frequencies are understood more easily and have a greater impact on judgment than percentages.
To test this prediction, Brase (2002b, Experiment 1) assessed the relative clarity of statistical information presented in a natural frequency format versus a percentage format at small, intermediate, and large magnitudes. Respondents received four statements in one statistical format, each statement at a different magnitude, and rated the clarity, impressiveness, and "monetary pull" of the presented statistics according to a 5-point scale. Example questions are shown in Table 4.
Table 4. Example questions presented by Brase (2002b)
Brase (2002b) found that across all statements and magnitudes both natural frequencies and percentages were rated as "Very Clear," with average ratings of 3.98 and 3.89, respectively. These ratings were not reliably different, demonstrating that percentages are perceived to be as clear and as understandable as natural frequencies. Furthermore, Brase found no reliable differences in the impressiveness ratings (from question 2) of natural frequencies and percentages at intermediate and large statistical magnitudes, suggesting that these formats are typically viewed as equally impressive. A significant difference between the formats was observed, however, at low statistical magnitudes: On average, natural frequencies were rated as "Impressive," whereas percentages were viewed as "Fairly Impressive." The observed difference in the impressiveness ratings at low statistical magnitudes did not accord with respondents' monetary pull ratings – their willingness to allocate funds to support research studying the issue at hand – which were approximately equal for the two formats across all statements and magnitudes. Hence, the difference in the impressiveness ratings at low magnitudes does not denote differences in people's willingness to act.
These data are consistent with the conclusion that percentages and natural frequency formats (a) are perceived as equally clear and equally understandable; (b) are typically viewed as equally impressive (i.e., at intermediate and large statistical magnitudes); and (c) have the same degree of impact on behavior. Natural frequency formats do apparently increase the perceptual contrast of small differences. Overall, however, the two formats are perceived similarly, suggesting that the mind is not designed to process natural frequency formats over single-event probabilities.
2.8. Are base-rates and likelihood ratios equally weighted?
Does the facilitation of Bayesian inference under natural frequencies entail that the mind naturally incorporates this information according to Bayes' theorem, or that elementary set operations can be readily computed from problems that are structured in a partitive form? Natural frequencies preserve the sample size of the reference class and are arranged into subset relations that preserve the base-rates. As a result, judgments based on these formats will entail the sample and effect sizes; the respondent need not calculate them. To assess whether the cognitive operations that underlie Bayesian inference are consistent with the application of Bayes' theorem, studies that evaluate how the respondent derives Bayesian solutions are reviewed.
Griffin and Buehler (1999) employed the classic lawyer-engineer paradigm developed by Kahneman and Tversky (1973), involving personality descriptions randomly drawn from a population of either 70 engineers and 30 lawyers or 30 engineers and 70 lawyers. Participants' task in this paradigm is to predict whether the description was drawn from an engineer or a lawyer (e.g., "My probability that this man is one of the engineers in this sample is __%"). Kahneman and Tversky's original findings demonstrated that respondents consistently relied upon category properties (i.e., how representative the personality description is of an engineer or a lawyer) to guide their judgment, without fully incorporating information about the population base-rates (for a review, see Koehler 1996). However, when the base-rates were presented via a counting procedure that induces a frequentist representation of each population and the respondent was asked to generate a natural frequency prediction (e.g., "I would expect that __ out of the 10 descriptions would be engineers"), base-rate usage increased (Gigerenzer et al. 1988).
To assess whether the observed increase in base-rate usage reflects the operation of a Bayesian algorithm that is designed to process natural frequencies, Griffin and Buehler (1999) evaluated whether participants derived the solution by utilizing event frequencies according to Bayes' theorem. This was accomplished by first collecting estimates of each of the components of Bayes' theorem in odds form: Respondents estimated (a) the probability that the personality description was taken from the population of engineers or lawyers; (b) the degree to which the personality description was representative of these populations; and (c) the perceived population base-rates. Each of these estimates was then divided by its complement to yield the posterior odds, likelihood ratio, and prior odds, respectively. Theories based on the Bayesian ratio predict that under frequentist representations, the likelihood ratios and prior odds will be weighted equally (Griffin & Buehler 1999).
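The equal-weighting prediction follows from Bayes' theorem in log-odds form (a standard identity, stated here for clarity): the posterior odds are the product of the prior odds and the likelihood ratio, so on a log scale each component must receive a unit, and hence equal, weight:

$$\log \frac{p(H \mid D)}{p(\lnot H \mid D)} = \log \frac{p(H)}{p(\lnot H)} + \log \frac{p(D \mid H)}{p(D \mid \lnot H)}$$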
Griffin and Buehler evaluated this prediction by conducting a regression analysis using the respondents' estimated likelihood ratios and prior odds to predict their posterior probability judgments (cf. Keren & Thijs 1996). Consistent with the observed increase in base-rate usage under frequentist representations (Gigerenzer et al. 1988), Griffin and Buehler (1999, Experiment 3b) found that the prior odds (i.e., the base-rates) were weighted more heavily than the likelihood ratios, with corresponding regression weights (β values) of 0.62 and 0.39. The failure to weight them equally violates Bayes' theorem. Although frequentist representations may enhance base-rate usage, they apparently do not induce the operation of a mental analogue of Bayes' theorem.
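To illustrate the logic of the analysis, here is a minimal sketch with synthetic data (ours; it does not reproduce Griffin and Buehler's actual data or procedure). A regression of log posterior odds on log prior odds and log likelihood ratio recovers the weights a respondent applies; Bayes' theorem predicts both coefficients equal 1:

```python
# Minimal sketch with synthetic data: recover the weights given to prior
# odds and likelihood ratio from posterior probability judgments.
import numpy as np

rng = np.random.default_rng(0)
n = 200
log_prior_odds = rng.normal(size=n)        # hypothetical elicited estimates
log_likelihood_ratio = rng.normal(size=n)
noise = rng.normal(scale=0.3, size=n)

# A simulated respondent who over-weights base-rates, as reported (0.62 vs. 0.39):
log_posterior_odds = (0.62 * log_prior_odds
                      + 0.39 * log_likelihood_ratio
                      + noise)

X = np.column_stack([log_prior_odds, log_likelihood_ratio])
beta, *_ = np.linalg.lstsq(X, log_posterior_odds, rcond=None)
print(beta)  # approximately [0.62, 0.39]; Bayes' theorem predicts [1.0, 1.0]
```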
Further support for this conclusion is provided by Evans et al. (Reference Evans, Handley, Over and Perham2002), who conducted a series of experiments demonstrating that probability judgments do not reflect equal weighting of the prior odds and likelihood ratio. Evans et al. (Reference Evans, Handley, Over and Perham2002, Experiment 5) employed a paradigm that extended the classic lawyer-engineer experiments by assessing Bayesian inference under conditions where the base-rates are supplied by commonly held beliefs and only the likelihood ratios are explicitly provided. These authors found that when prior beliefs about the base-rate probabilities were rated immediately before the presentation of the problem, the prior odds (i.e., the base-rates) were weighted more heavily than the likelihood ratios, with corresponding regression weights (β values) of 0.43 and 0.19.
Additional evidence supporting this conclusion is provided by Kleiter et al. (Reference Kleiter, Krebs, Doherty, Gavaran, Chadwick and Brake1997), who found that participants assessing event frequencies in a medical diagnosis setting employed statistical evidence that is irrelevant to the calculation of Bayes' theorem. Kleiter et al. (Reference Kleiter, Krebs, Doherty, Gavaran, Chadwick and Brake1997, Experiment 1) presented respondents with a list of event frequencies, including those necessary for the calculation of Bayes' theorem (e.g., Pr(D | H)) and other statistics that are irrelevant to it (e.g., Pr(~D)). Participants were then asked to identify the event frequencies needed to diagnose the probability of the disease given the symptom (i.e., the posterior probability). Of the four college faculty and 26 graduate students tested, only three made the optimal selection by identifying only the event frequencies required to calculate Bayes' theorem.
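To make the selection criterion concrete: for two mutually exclusive and exhaustive hypotheses, Bayes' theorem requires only three quantities – the base-rate Pr(H) (with Pr(~H) = 1 − Pr(H)), the hit-rate Pr(D | H), and the false-alarm rate Pr(D | ~H):

Pr(H | D) = Pr(D | H)Pr(H) / [Pr(D | H)Pr(H) + Pr(D | ~H)Pr(~H)]

A statistic such as Pr(~D) appears nowhere in this expression, so selecting it indicates reliance on something other than the computational demands of the theorem.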
These data suggest that the mind does not utilize a Bayesian algorithm that “maps frequentist representations of prior probabilities and likelihoods onto a frequentist representation of a posterior probability in a way that satisfies the constraints of Bayes' theorem” (Cosmides & Tooby Reference Cosmides and Tooby1996, p. 60). Importantly, the findings that the prior odds and likelihood ratio are not equally weighted according to Bayes' theorem (Evans et al. Reference Evans, Handley, Over and Perham2002; Griffin & Buehler Reference Griffin and Buehler1999) imply that Bayesian inference does not rely on Bayesian computations per se.
Thus, the findings are inconsistent with the mind-as-Swiss-army-knife, natural frequency algorithm, natural frequency heuristic, and non-evolutionary natural frequency heuristic theories, all of which propose that coherent probability judgment reflects the use of the Bayesian ratio. The finding that base-rate usage increases under frequentist representations (Evans et al. Reference Evans, Handley, Over and Perham2002; Griffin & Buehler Reference Griffin and Buehler1999) instead supports the proposal that natural frequency formats facilitate Bayesian inference because they induce a representation of category instances that preserves the sample and effect sizes. Such a representation clarifies the underlying set structure of the problem and makes the relevance of the base-rates more obvious, without supplying an equation that generates Bayesian quantities.
2.9. Convergence with disparate data
A unique characteristic of the dual-process position is that it predicts that nested sets should facilitate reasoning whenever people tend to rely on associative rather than extensional, rule-based processes; facilitation should therefore be observed beyond the context of Bayesian probability updating. The natural frequency theories, by contrast, expect facilitation only in the domain of probability estimation.
In support of the nested sets position, facilitation through nested set representations has been observed in a number of studies of deductive inference. Grossen and Carnine (Reference Grossen and Carnine1990) and Monaghan and Stenning (Reference Monaghan, Stenning, Gernsbacher and Derry1998) reported significant improvement in syllogistic reasoning when participants were taught using Euler circles. The effect was restricted to participants who were “learning impaired” (Grossen & Carnine Reference Grossen and Carnine1990) or had low GRE scores (Monaghan & Stenning Reference Monaghan, Stenning, Gernsbacher and Derry1998). Presumably, those who did not show improvement did not require the Euler circles because they were already representing the nested set relations.
Newstead (Reference Newstead1989, Experiment 2) evaluated how participants interpreted syllogisms when represented by Euler circles versus quantified statements. Newstead found that although Gricean errors of interpretation occurred when syllogisms were represented by Euler circles and quantified statements, the proportion of conversion errors, such as converting “Some A are not B” to “Some B are not A,” was significantly reduced in the Euler circle task. For example, less than 5% of the participants generated a conversion error for “Some … not” on the Euler circle task, whereas this error occurred on 90% of the responses for quantified statements.
Griggs and Newstead (Reference Griggs and Newstead1982) tested participants on the THOG problem, a difficult deductive reasoning problem involving disjunction. They obtained a substantial amount of facilitation by making the problem structure explicit, using trees. According to the authors, the structure is normally implicit due to negation and the tree structure facilitates performance by cuing formation of a mental model similar to that of nested sets.
Facilitation has also been obtained by making extensional relations more salient in the domain of categorical inductive reasoning. Sloman (Reference Sloman1998) found that people who were told that all members of a superordinate have some property (e.g., all flowers are susceptible to thrips) did not conclude that all members of one of its subordinates inherited the property (e.g., they did not assert that this guaranteed that all roses are susceptible to thrips). This was true even for people who believed that roses are flowers. But if the assertion that roses are flowers was included in the argument, then people did abide by the inheritance rule, assigning a probability of one to the statement about roses. Sloman argued that this occurred because induction is mediated by similarity and not by class inclusion, unless the categorical – or set – relation is made transparent within the statements composing the argument (for an alternative interpretation, see Calvillo & Revlin Reference Calvillo and Revlin2005).
Facilitation in other types of probability judgment can also be obtained by manipulating the salience and structure of set relations. Sloman et al. (Reference Sloman, Over, Slovak and Stibel2003) found that almost no one exhibited the conjunction fallacy when the options were presented as Euler circles, a representation that makes set relations explicit. Fox and Levav (Reference Fox and Levav2004) and Johnson-Laird et al. (Reference Johnson-Laird, Legrenzi, Girotto, Legrenzi and Caverni1999) also improved judgments on probability problems by manipulating the set structure of the problem.
2.10. Empirical summary and conclusions
In summary, the empirical review supports five main conclusions. First, the facilitatory effect of natural frequencies on Bayesian inference varied considerably across the reviewed studies (see Table 3), potentially resulting from differences in participants' general intelligence and motivation (Brase et al. Reference Brase, Fiddick and Harries2006). These findings support the nested sets hypothesis to the degree that intelligence and motivation reflect the operation of domain general and strategic – rather than automatic (i.e., modular) – cognitive processes.
Second, questions that prompt use of category instances and divide the solution into the sets needed to compute the Bayesian ratio facilitate probability judgment. This suggests that facilitation depends on cues to the set structure of the problem rather than on an (evolved) capacity to process natural frequencies. In further support of this conclusion, partitioning the data into nested sets facilitates Bayesian inference regardless of whether natural frequencies or single-event probabilities are employed (see Table 5), as the sketch below illustrates.
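To make the computational simplification concrete, here is a minimal sketch with hypothetical screening counts (all numbers invented for illustration): once the sample is partitioned into nested sets, the Bayesian solution reduces to a ratio of two subset sizes.

```python
# Hypothetical counts, partitioned in natural-sampling (nested set) form:
# of 500 patients, 20 have the condition; 16 of those 20 test positive;
# 48 of the 480 without the condition also test positive.
true_positives = 16    # subset of the 20 with the condition
false_positives = 48   # subset of the 480 without the condition

# With the data already partitioned, the posterior is a ratio of two
# subset sizes -- no separate base-rate term has to be computed.
positives = true_positives + false_positives
posterior = true_positives / positives
print(f"Pr(condition | positive) = {true_positives}/{positives} = {posterior:.2f}")
```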
Table 5. Percent correct for Bayesian inference problems reported in the literature (sample sizes in parentheses)
[Table 5 appears only as an image in the source; its cell values are not reproduced here.]
Note. Studies that present questions requiring the respondent to compute a conditional-event probability are indicated by an asterisk (*). The remaining studies present questions that prompt the respondent to compute the two terms of the Bayesian solution.
Third, frequency judgments are guided by inferential strategies that reflect incomplete, fragmentary memories that do not preserve the base-rates (e.g., Bradburn et al. Reference Bradburn, Rips and Shevell1987; Gluck & Bower Reference Gluck and Bower1988). This suggests that Bayesian inference does not derive from the accurate encoding and retrieval of natural frequencies. In addition, natural frequencies and single-event probabilities are rated similarly in their perceived clarity, understandability, and impact on the respondent's behavior (Brase Reference Brase2002b), further suggesting that the mind does not embody inductive reasoning mechanisms (that are designed) to process natural frequencies.
Fourth, people (a) do not accurately weight and combine event frequencies, and (b) utilize event frequencies that are irrelevant to the calculation of Bayes' theorem (e.g., Griffin & Buehler Reference Griffin and Buehler1999; Kleiter et al. Reference Kleiter, Krebs, Doherty, Gavaran, Chadwick and Brake1997). This suggests that the cognitive operations underlying Bayesian inference do not conform to Bayes' theorem. Furthermore, base-rate usage increases under frequentist representations (e.g., Griffin & Buehler Reference Griffin and Buehler1999), suggesting that facilitation results from the capacity of natural frequencies to represent the sample and effect sizes, which highlights the set structure of the problem and makes transparent what is relevant for problem solving.
Finally, nested set representations facilitate reasoning in a range of classic deductive and inductive reasoning tasks. This supports the nested set hypothesis that the mind embodies a domain general capacity to perform elementary set operations and that these operations can be induced by cues to the set structure of the problem to facilitate reasoning in any context where people tend to rely on associative rather than extensional, rule-based processes.
3. Conceptual issues
This section provides a conceptual analysis that addresses (1) the plausibility of the natural frequency assumptions, and (2) whether natural frequency representations support properties that are central to human inductive reasoning competence, including reasoning about statistical independence, estimating the probability of unique events, and reasoning on the basis of similarity, analogy, association, and causality.
3.1. Plausibility of natural frequency assumptions
The natural sampling framework was established by the seminal work of Kleiter (Reference Kleiter, Fischer and Laming1994), who assessed “the correspondence between the constraints of the statistical model of natural sampling on the one hand, and the constraints under which human information is acquired on the other” (p. 376). Kleiter proved that under natural sampling and other conditions (e.g., independent identical sampling), the frequencies corresponding to the base-rates are redundant and can be ignored. Thus, conditions of natural sampling can simplify the calculation of the relevant probabilities and, as a consequence, facilitate Bayesian inference (see Note 2 of the target article). Kleiter's computational argument does not appeal to evolution and was advanced with careful consideration of the assumptions upon which natural sampling is based. Kleiter noted, for example, that the natural sampling framework (a) is limited to hypotheses that are mutually exclusive and exhaustive, and (b) depends on collecting a sufficiently large sample of event frequencies to reliably estimate population parameters.
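Kleiter's redundancy result can be illustrated compactly (our notation, not his). Suppose a sample of N cases is partitioned by natural sampling into the four joint frequencies n(D and H), n(D and ~H), n(~D and H), and n(~D and ~H), where H and ~H are mutually exclusive and exhaustive. Then

Pr(H | D) = n(D and H) / [n(D and H) + n(D and ~H)]

The posterior probability can be read off the two positive-observation counts alone; the base-rate frequency n(H) = n(D and H) + n(~D and H) is recoverable from the partition and need not enter the computation.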
Although people may sometimes treat hypotheses as mutually exclusive (e.g., “this person is a Democrat, so they must be anti-business”), this constraint is not always satisfied: many hypotheses are nested (e.g., “she has breast cancer” vs. “she has a particular type of breast cancer”) or overlapping (e.g., “this patient is anxious or depressed”). People's causal models typically provide a wealth of knowledge about classes and properties, allowing consideration of many kinds of hypotheses that do not necessarily come in mutually exclusive, exhaustive sets. As a consequence, additional principles are needed to broaden the scope of the natural sampling framework to address probability estimates drawn from hypotheses that are not mutually exclusive and exhaustive. In this sense, the nested sets theory is more general: It can represent nested and overlapping hypotheses by taking the intersection (e.g., “she has breast cancer and it is type X”) and union (e.g., “the patient is anxious or depressed”) of sets, respectively.
As Kleiter further notes, inferences about hypotheses from encoded event frequencies are warranted to the extent that the sample is sufficiently large and provides a reliable estimate of the population parameters. The efficacy of the natural sampling framework therefore depends on establishing (1) the approximate number of event frequencies that are needed for a reliable estimate, (2) whether this number is relatively stable or varies across contexts, and (3) whether or not people can encode and retain the required number of events.
3.2. Representing qualitative relations
In contrast to single-event probabilities, natural frequencies preserve information about the size of the reference class and, as a consequence, do not directly indicate whether an observation and hypothesis are statistically independent. For example, probability judgments drawn from natural frequencies do not reveal that a symptom present in (a) 640 out of 800 patients with a particular disease and (b) 160 out of 200 patients without the disease is not diagnostic, because 80% have the symptom in both cases (Over Reference Over2000a; Reference Over2000b; Over & Green Reference Over and Green2001; Sloman & Over Reference Sloman and Over2003). Thus, probability estimates drawn from natural frequencies do not capture important qualitative properties.
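Expressed as a likelihood ratio (our arithmetic, using the frequencies in the example):

Pr(symptom | disease) / Pr(symptom | no disease) = (640/800) / (160/200) = 0.80 / 0.80 = 1

A likelihood ratio of 1 leaves the posterior odds equal to the prior odds, so the symptom is non-diagnostic; yet the raw counts 640 and 160 differ fourfold, and nothing in the natural frequency format itself flags their equivalence as conditional proportions.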
Furthermore, in contrast to the cited benefits of non-normalized representations (e.g., Gigerenzer & Hoffrage Reference Gigerenzer and Hoffrage1995), normalization may serve to simplify a problem. For example, is someone offering us the same proportion if he tries to pay us back with 33 out of the 47 nuts he has gathered (i.e., 70%), after we have earlier given him 17 out of the 22 nuts we had gathered (i.e., 77%)? This question is trivial after normalization, as it is then transparent that 70 out of 100 is nested within 77 out of 100 (Over Reference Over and Roberts2007).
3.3. Reasoning about unique events and associative processes
One objection to the claim that the encoding of natural frequencies supports Bayesian inference is that intuitive probability judgment often concerns (a) beliefs regarding single events, or (b) the assessment of hypotheses about novel or partially novel contexts, for which prior event frequencies are unavailable. For example, the estimated likelihoods of specific outcomes are often based on novel and unique one-time events, such as the likelihood that a particular constellation of political interests will lead to a coalition. Hence, Kahneman and Tversky (Reference Kahneman and Tversky1996, p. 589) argue that the subjective degree of belief in hypotheses derived from single events or novel contexts “cannot be generally treated as a random sample from some reference population, and their judged probability cannot be reduced to a frequency count.”
Furthermore, theories based on natural frequency representations do not allow for the widely observed role of similarity, analogy, association, and causality in human judgment (for recent reviews of the contribution of these factors, see Gilovich et al. Reference Gilovich, Griffin and Kahneman2002 and Sloman Reference Sloman2005). The nested sets hypothesis presupposes these determinants of judgment by appealing to a dual-process model of judgment (Evans & Over Reference Evans and Over1996; Sloman Reference Sloman1996a; Stanovich & West Reference Stanovich and West2000), a move that natural frequency theorists are apparently unwilling to make (Gigerenzer & Regier Reference Gigerenzer and Regier1996). The dual-process model attributes responses based on associative principles, such as similarity, or on retrieval from memory, such as analogy, to a primitive associative judgment system. It attributes responses based on more deliberative processing involving rule-based inference, such as the elementary set operations that respect the logic of set inclusion and facilitate Bayesian inference, to a second, deliberative system. This second system is not, however, limited to analyzing set relations: under the right conditions, it can also carry out the kinds of structural analyses required by analogical or causal reasoning.
Within this framework, natural frequency approaches can be viewed as making claims about rule-based processes (i.e., the application of a psychologically plausible rule for calculating Bayesian probabilities), without addressing the role of associative processes in Bayesian inference. In light of the substantial literatures that demonstrate the role of associative processes in human judgment, Kahneman and Tversky (Reference Kahneman and Frederick1996, p. 589) conclude, “there is far more to inductive reasoning and judgment under uncertainty than the retrieval of learned frequencies.”
4. Summary and conclusions
The empirical and conceptual issues addressed by this review consistently challenge theories of Bayesian inference that depend on natural frequency representations (see Table 2). Together they demonstrate that coherent probability estimates are not derived from an equation for calculating Bayesian posterior probabilities that requires the use of such representations.
The evidence instead supports the nested sets hypothesis: judgmental errors and biases are attenuated when Bayesian inference problems are represented in a way that reveals their underlying set structure, demonstrating that the cognitive capacity to perform elementary set operations constitutes a powerful means of reducing associative influences and of facilitating probability estimates that conform to Bayes' theorem. An appropriate representation can induce people to substitute reasoning by rules for reasoning by association. In particular, the review demonstrates that judgmental errors and biases were attenuated when (a) the question induced an outside view by prompting the respondent to utilize the sample of category instances presented in the problem, and (b) the sample of category instances was represented in a nested set structure that partitioned the data into the components needed to compute the Bayesian solution.
Although we disagree with the various theoretical interpretations that could be attributed to natural frequency theorists regarding the architecture of mind, we do believe that they have focused on and enlightened us about an important phenomenon. Frequency formulations are a highly efficient way to obtain drastically improved reasoning performance in some cases. Not only is this an important insight to improve and teach reasoning, but it also focuses theorists on a deep and fundamental problem: What are the conditions that compel people to overcome their natural associative tendencies in order to reason extensionally?
ACKNOWLEDGMENTS
This work was supported by National Science Foundation Grants DGE-0536941 and DGE-0231900 to Aron K. Barbey. We are grateful to Gary Brase, Jonathan Evans, Vittorio Girotto, Philip Johnson-Laird, Gernot Kleiter, and David Over for their very helpful comments on prior drafts of this paper. Barbey would also like to thank Lawrence W. Barsalou, Sergio Chaigneau, Brian R. Cornwell, Pablo A. Escobedo, Shlomit R. Finkelstein, Carla Harenski, Corey Kallenberg, Patricia Marsteller, Robert N. McCauley, Richard Patterson, Diane Pecher, Philippe Rochat, Ava Santos, W. Kyle Simmons, Irwin Waldman, Christine D. Wilson, and Phillip Wolff for their encouragement and support while writing this paper.