
The role of representation in Bayesian reasoning: Correcting common misconceptions

Published online by Cambridge University Press:  29 October 2007

Gerd Gigerenzer
Affiliation:
Max Planck Institute for Human Development, Lentzeallee 94, 14195 Berlin, Germany. gigerenzer@mpib-berlin.mpg.de
Ulrich Hoffrage
Affiliation:
Ecole des Hautes Etudes Commerciales (HEC), University of Lausanne, Batiment Internef, 1015 Lausanne, Switzerland. Ulrich.Hoffrage@unil.ch

Abstract

The terms nested sets, partitive frequencies, inside-outside view, and dual processes add little but confusion to our original analysis (Gigerenzer & Hoffrage 1995; 1999). The idea of nested set was introduced because of an oversight; it simply rephrases two of our equations. Representation in terms of chances, in contrast, is a novel contribution yet consistent with our computational analysis – it uses exactly the same numbers as natural frequencies. We show that non-Bayesian reasoning in children, laypeople, and physicians follows multiple rules rather than a general-purpose associative process in a vaguely specified “System 1.” It is unclear what the theory in “dual process theory” is: Unless the two processes are defined, this distinction can account post hoc for almost everything. In contrast, an ecological view of cognition helps to explain how insight is elicited from the outside (the external representation of information) and, more generally, how cognitive strategies match with environmental structures.

Type
Open Peer Commentary
Copyright
Copyright © Cambridge University Press 2007

For many years researchers believed that people are “not Bayesian at all” (Kahneman & Tversky 1972, p. 450) and that “the genuineness, the robustness, and the generality of the base-rate fallacy are matters of established fact” (Bar-Hillel 1980, p. 215). In 1995, however, we showed that Bayesian reasoning depends on and can be improved by external representations (Gigerenzer & Hoffrage 1995). This ecological approach led to practical applications in medicine, law, and education; natural frequency representations are now part of evidence-based medicine, high-school mathematics textbooks, and cancer-screening information brochures, helping people to understand risks (Gigerenzer 2002; Hoffrage et al. 2000).

Our 1995 article was about the general question of how various external representations facilitate Bayesian computations, not about natural frequencies versus single-event probabilities, as Barbey & Sloman (B&S) suggest. It contained four main predictions (Gigerenzer & Hoffrage 1995, pp. 691–92):

  • Prediction 1: Natural frequencies (standard frequency formats) elicit a higher proportion of Bayesian algorithms than standard probability formats do.

  • Prediction 2: Short probability formats elicit a higher proportion of Bayesian algorithms than standard probability formats do.

  • Prediction 3: Natural frequencies, whether in the standard or short format, elicit the same proportion of Bayesian algorithms.

  • Prediction 4: Relative frequencies elicit the same (small) proportion of Bayesian algorithms as standard probability formats do.

These predictions follow from Equations 1 to 3 in Gigerenzer and Hoffrage (1995). If information is presented in the standard probability format or in normalized (relative) frequencies, then the following computations are necessary (H = hypothesis, D = data):

$$p(H \mid D) = \frac{p(H)\,p(D \mid H)}{p(H)\,p(D \mid H) + p(-H)\,p(D \mid -H)} \tag{1}$$

If information is instead represented in natural frequencies (standard or short format), then Bayesian computations reduce to:

$$p(H \mid D) = \frac{a}{a + b} \tag{2}$$

Here, a and b are natural frequencies. If probabilities are presented in short probability format, then the computations reduce to:

$$p(H \mid D) = \frac{p(D \,\&\, H)}{p(D)} \tag{3}$$
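As a minimal sketch of how Equations 1 to 3 relate, the following Python snippet evaluates all three on one hypothetical data set: 1,000 cases, 40 of them with H, 30 of those showing D, and 120 of the remaining 960 showing D. The numbers and the function names are illustrative assumptions and are not taken from the 1995 experiments. The sketch only shows that the three formulas return the same posterior while demanding different amounts of computation.

```python
from fractions import Fraction

# Hypothetical natural-frequency counts (assumed for illustration only).
TOTAL = 1000   # all cases
N_H = 40       # cases in which H holds
A = 30         # a = n(D&H): cases with both D and H
B = 120        # b = n(D&-H): cases with D but without H


def equation_1(p_h, p_d_given_h, p_d_given_not_h):
    """Equation 1: standard probability format (or normalized frequencies)."""
    numerator = p_h * p_d_given_h
    return numerator / (numerator + (1 - p_h) * p_d_given_not_h)


def equation_2(a, b):
    """Equation 2: natural frequencies, a / (a + b)."""
    return Fraction(a, a + b)


def equation_3(p_d_and_h, p_d):
    """Equation 3: short probability format, p(D&H) / p(D)."""
    return p_d_and_h / p_d


# Inputs for Equation 1 are derived from the same counts.
p_h = Fraction(N_H, TOTAL)
p_d_given_h = Fraction(A, N_H)
p_d_given_not_h = Fraction(B, TOTAL - N_H)

posterior_1 = equation_1(p_h, p_d_given_h, p_d_given_not_h)
posterior_2 = equation_2(A, B)
posterior_3 = equation_3(Fraction(A, TOTAL), Fraction(A + B, TOTAL))

print(posterior_1, posterior_2, posterior_3)  # 1/5 1/5 1/5
```

Exact fractions are used so that the equality of the three results is not blurred by floating-point rounding. Equation 1 needs three inputs, two multiplications, and a complement; Equation 2 needs two counts and one addition.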

B&S mistakenly present (i) experiments reporting facilitation with probability representations (as in our Prediction 2) and (ii) experiments finding no facilitation with relative frequencies (exactly our Prediction 4) as if these were contradicting or going beyond our position, without making any mention of our Predictions 2, 3, and 4. The upshot is that the “nested set structure” explicit in our Equations 2 and 3 – the observation that the numerator is a subset of the denominator – is then presented as a new, alternative explanation. The predictions in B&S's Table 2 are based on the erroneous idea that our computational analysis was restricted to natural frequencies, as is the claim in their Table 1 that our computational analysis was only about a “cognitive process uniquely sensitive to natural frequency formats.” In the remainder of this comment, we will clarify the key ideas for the reader.

Table 1. Bayesian strategies and cognitive shortcuts for approximating Bayes' rule. Based on the experimental evidence in Gigerenzer and Hoffrage (1995, pp. 689–691). n(D&H) is the natural frequency of D&H cases. We suggest that Barbey & Sloman consider these rules as mechanisms for their System 2, to be interpreted as an adaptive toolbox rather than a single, general-purpose calculus

Table 2. Six cognitive rules underlying non-Bayesian judgments. Values are percentages of people classified as using a rule among all non-Bayesian judgments. The experiments with children (grades 4, 5, and 6) were conducted by Zhu & Gigerenzer (2006), with laypeople (students with median age 21–22) by Gigerenzer & Hoffrage (1995), with medical students (median age 25) by Hoffrage et al. (2000), and with physicians by Hoffrage & Gigerenzer (1998). Cognitive rules are reported here for natural frequencies and conditional probabilities (standard format) only; for other representations and how rules depend on representations, see the original studies. We suggest that Barbey & Sloman consider these rules as mechanisms for their System 1, to be interpreted as an adaptive toolbox rather than a single, general-purpose associative process

What are natural frequencies? Our Figure 1 shows the differences between natural and normalized frequencies. Natural frequencies leave the naturally occurring base rates intact, whereas normalized frequencies standardize these. Note, first, that all natural frequencies have a “nested set structure” in the sense that they simplify Bayesian computations, as defined in Equation 2. Hence, when B&S talk of “natural frequency formats that were not partitioned into nested set relations” (sect. 2.4, para. 2), these are not natural frequencies but instead normalized frequencies. This conceptual confusion makes the notion of nested sets appear as a different and broader explanation when it in fact simply paraphrases Equation 2. Second, natural frequencies refer to joint events, such as H&D events, as shown by the four numbers at the bottom of Figure 1–1. It is the structure of the entire tree that distinguishes natural from normalized frequencies. In contrast, an isolated frequency statement, represented by one single branch in the tree (such as 10 out of 1,000), could be part of a tree with natural frequencies, or normalized frequencies, or – if there is no second variable – no tree at all. Therefore, it is misleading to call the isolated statement “one of every 100 Americans will have been exposed to Flu strain X” (Table 5 of the target article) a natural frequency, as B&S do. In the same table caption, the relative frequency “33% of all Americans” is wrongly called a “single-event probability.” This incorrect use of terms causes B&S to draw erroneous conclusions, such as that “natural frequencies and single-event probabilities are rated similarly in their perceived clarity, understandability … [etc.]” (sect. 2.10).

Figure 1. Natural frequencies, chances, normalized frequencies, and conditional probabilities. Note that B&S's “chances” are exactly the same numbers as natural frequencies and lead to identical computational demands (see Eq. 2). Contrary to B&S's interpretation, chances are not mathematical probabilities, since these cannot be normalized over the interval [0,1] – otherwise, chances would no longer facilitate Bayesian computations. The fact that “chances” refer to a single event does not transform them into mathematical probabilities: not all statements about singular events are probabilities. Normalized frequencies are derived from natural frequencies by normalizing the base rate frequencies to some common number (here: 1,000), and conditional probabilities normalize to the interval [0,1]. Note that our distinction is neither that between frequencies versus probabilities nor that between natural frequencies versus single-event probabilities, as B&S suggest; we distinguish between natural frequencies which facilitate Bayesian computations and normalized frequencies and conditional probabilities which do not.
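The difference between natural and normalized frequencies can be sketched in a few lines of Python. The tree below uses hypothetical counts (not the numbers in Figure 1), and the helper names `normalize` and `a_over_a_plus_b` are assumptions made for this illustration. The sketch shows that a/(a + b) yields the Bayesian posterior only when the base rates are left intact; once each branch is rescaled to a common number, the base rate has to be reintroduced, as in Equation 1.

```python
from fractions import Fraction

# Hypothetical natural-frequency tree (assumed numbers, not those of Figure 1).
natural = {"total": 1000, "H": 40, "-H": 960, "D&H": 30, "D&-H": 120}


def normalize(tree, common=1000):
    """Rescale each base-rate branch to `common` cases, discarding the base rates."""
    return {
        "D_per_H": Fraction(tree["D&H"], tree["H"]) * common,
        "D_per_not_H": Fraction(tree["D&-H"], tree["-H"]) * common,
    }


def a_over_a_plus_b(a, b):
    """Equation 2 applied to whatever two numbers it is given."""
    return Fraction(a) / (Fraction(a) + Fraction(b))


normalized = normalize(natural)  # 750 out of 1,000 and 125 out of 1,000

# With natural frequencies, a / (a + b) is the posterior p(H|D).
posterior = a_over_a_plus_b(natural["D&H"], natural["D&-H"])

# With normalized frequencies, the same shortcut gives a wrong answer.
naive = a_over_a_plus_b(normalized["D_per_H"], normalized["D_per_not_H"])

# Recovering the posterior requires putting the base rate back in (Equation 1).
p_h = Fraction(natural["H"], natural["total"])
hit_rate = normalized["D_per_H"] / 1000
false_alarm_rate = normalized["D_per_not_H"] / 1000
recovered = (p_h * hit_rate) / (p_h * hit_rate + (1 - p_h) * false_alarm_rate)

print(posterior, naive, recovered)  # 1/5 6/7 1/5
```

The same reasoning applies to “chances”: as long as they keep the natural-frequency numbers, Equation 2 works; once they are rescaled, it no longer does.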

Next, the term “single-event probability” is irrelevant to our computational analysis (see Equations 1–3). A single-event probability can refer to at least three different concepts: a conditional probability p(D|H), which makes Bayesian computations difficult (Prediction 1 and Equation 1); a joint probability p(D&H), which makes Bayesian computations easier (Prediction 2 and Equation 3); and a simple single-event probability, such as a “30% chance of rain,” which has nothing to do with Bayesian inference but invites misunderstandings, because, by definition, no reference class is specified (Gigerenzer et al. 2005).

B&S's distinction between a “natural frequency algorithm,” “natural frequency heuristic,” and a “non-evolutionary natural frequency heuristic” is emphatically not ours. We cannot see how these would lead to different predictions, since in each case the algorithm computes Equation 2. We recommend not using the term heuristic for a version of Bayes's rule, since a heuristic, like a shortcut, ignores information. However, the term heuristics applies to shortcuts that approximate Bayes's rule under specific conditions such as rare events, where they lead to fast and frugal Bayesian reasoning (Table 1). Martignon et al. (2003) analyzed the connection between natural frequency trees and fast and frugal trees.

B&S repeatedly refer to our evolutionary argument that natural sampling characterizes the way people learned individually in human history. But we did not – nor can one – use this general argument to derive Predictions 1 to 4 or the seven results reported in our 1995 article; these derivations were based solely on a computational analysis. The evolutionary perspective, however, provides a general framework for finding the right questions. Instead of asking what cognitive deficits explain reasoning that deviates from Bayes's rule (such as an error-prone System 1), the question should be how and why reasoning depends on the external representation of information. An ecological framework postulates that thought does not simply emerge inside the mind. Every theory of reasoning needs to specify both cognitive strategies and the environmental structures under which these strategies work well (just as with the shortcuts in Table 1).

The “nested sets” explanation originated from an oversight. The authors credited by B&S as the originators of the “nested set theory” missed the distinction between natural and normalized frequencies, and implied that we had predicted that any kind of frequencies would facilitate reasoning. For instance, Johnson-Laird et al. (1999, p. 81) stated: “In fact, data in the form of frequencies by no means guarantee good Bayesian reasoning,” and referred to a study reporting that normalized frequencies showed no facilitation. Since mental models theory cannot account for the facilitating effect of natural frequencies or “chances” (we discuss this further on), Johnson-Laird et al. introduced a “subset principle” identical to our 1995 Equation 2, without mentioning its source, and presented it as an alternative explanation to ours.

Macchi and Mosconi (1998) seem to have been the first who confused natural frequencies with any kind of frequencies and concluded that the facilitating effect is not due to “frequentist phrasing” (which they mistook as our explanation) but to computational simplification (our explanation, which they proposed as their alternative one). Like Johnson-Laird et al., Macchi (2000) independently rediscovered the proper explanation, and distinguished between “partitive” and “non-partitive” representations, where “partitive” – like the “subset principle” – is a new label for Equations 2 and 3. Lewis and Keren (1999) promoted the same confusion. In Gigerenzer and Hoffrage (1999), we pointed out that we had actually tested Prediction 4 about relative frequencies with 24 Bayesian problems in Experiment 2 of Gigerenzer and Hoffrage (1995). Nevertheless, Evans et al. (2000) embraced the same misconception, concluding that “we are not convinced that it is frequency information per se which is responsible for the facilitation” (p. 200). All of these authors overlooked that our predictions were not about frequencies per se.

To summarize, the “nested set theory” originated from an oversight that reproduced itself like a meme through various articles. It is identical to our Equations 2 and 3, rephrasing the computational explanation we had proposed.

What is new about the “chances representation”? In our 1995 article, we tested two natural frequency representations, three relative frequency representations, and three probability representations. One of the probability representations had the structure of Equation 1, another the nested structure defined by Equation 3, and a third one demanded computations of in-between complexity (Equation 4 in our article). Therefore, B&S's contention that “nested sets” would be more general than our computational account – because it covers not only frequencies but probabilities as well – ignores that we actually applied the computational account to various probability representations. Specifically, B&S present a “chances representation,” which mimics the computational structure of natural frequencies precisely (see our Fig. 1), but is verbally phrased in terms of a single event. This representation is a new addition to the eight representations we already tested, and it leads to the same computational demands as in Equation 2. Hence, from our computational analysis, the prediction is that “chances” facilitate as well as natural frequencies because they involve exactly the same computations (although the occasionally odd-sounding wording may have a negative impact).

B&S call chances “single-event probabilities.” However, like natural frequencies, these are not probabilities. Mathematical probabilities have a range between 0 and 1. If chances were expressed in this range, their facilitating effect would be gone (like the conditional probabilities in Fig. 1). In the example B&S give, one cannot express the chances “12 out of 96” as “1 out of 8” or .125, because chances are exactly like natural frequencies in that they do not allow normalization. To summarize, “chances” are the same numbers as natural frequencies and lead to the same computational demands specified in Equation 2. The “nested sets” notion does not seem to add anything further.

What processes underlie non-Bayesian judgments? B&S's answer is: the associative “System 1.” Yet we have taken a closer look at non-Bayesian judgments and found that a substantial proportion of them follow several rules rather than one associative process. Specifically, 65% of all non-Bayesian judgments across children, laypeople, and experts resulted from applying a rule, and our Table 2 shows the six most frequent ones. These rules allow for a better understanding of non-Bayesian reasoning than does the notion of base-rate neglect due to “System 1.” In fact, one of these rules, base-rate only (conservatism), does not even entail base-rate neglect, but an over-reliance on the base rate. Moreover, a strategy such as the Fisherian one (or representativeness, which amounts to calculating p-values) ignores more than the base rate, namely, also p(D|-H). Ironically, when researchers use Fisher's null hypothesis tests to determine whether people follow Bayes's rule, they themselves use a non-Bayesian framework and commit base-rate neglect. Does this mean that researchers' “System 1” is in charge of hypothesis testing? In summary, there is experimental evidence that a substantial proportion of non-Bayesian judgments result from six rules; there is no reason to ignore these results and invoke some unknown general-purpose associative process instead.
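To make the contrast concrete, the sketch below implements Bayes's rule next to the two non-Bayesian rules that the paragraph above describes in enough detail to code: base-rate only, which reports p(H), and the Fisherian rule, which reports p(D|H) and thereby ignores both the base rate and p(D|-H). The remaining rules of Table 2 are not reproduced here. The function names and the input values are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def bayes(p_h, p_d_given_h, p_d_given_not_h):
    """Bayesian posterior p(H|D), following Equation 1."""
    numerator = p_h * p_d_given_h
    return numerator / (numerator + (1 - p_h) * p_d_given_not_h)


def base_rate_only(p_h, p_d_given_h, p_d_given_not_h):
    """Base-rate only (conservatism): the judgment equals the base rate p(H)."""
    return p_h


def fisherian(p_h, p_d_given_h, p_d_given_not_h):
    """Fisherian rule: the judgment equals p(D|H); p(H) and p(D|-H) are ignored."""
    return p_d_given_h


# Hypothetical problem (assumed values): p(H) = 4%, p(D|H) = 75%, p(D|-H) = 12.5%.
problem = (Fraction(4, 100), Fraction(75, 100), Fraction(125, 1000))

judgments = {rule.__name__: rule(*problem) for rule in (bayes, base_rate_only, fisherian)}

for name, value in judgments.items():
    print(f"{name}: {value}")
# bayes: 1/5
# base_rate_only: 1/25
# fisherian: 3/4
```

Each rule is a distinct, explicitly specified computation with its own characteristic answer, which is what allows individual judgments to be classified by rule rather than attributed to one undifferentiated process.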

What does the dual-processes notion explain? Table 2 indicates that a handful of rules model non-Bayesian judgments. In general, people rely on multiple cognitive rules or heuristics, consciously or unconsciously, tending to switch between these in an adaptive way. Models of these heuristics and the environments in which they work have been published (e.g., Gigerenzer 2004; Payne et al. 1993; Rieskamp & Otto 2006). What does a distinction between a “System 1” and “System 2” add?

Sloman (1996a) proposed two systems of reasoning. Gigerenzer and Regier (1996) responded that there is a certain amount of slack in this distinction, that it collapses too many different dichotomies, and that it needs to be sharpened by overt reference to explicit models of associative and rule-based processing. Sloman (1996b) willingly admitted that he left room for further precision and clarity in his dual-processes notion. Yet more than ten years later, the notion is still vague. What is the mechanism of “System 1”: the delta rule, fuzzy set theory, fast and frugal heuristics, constrained neural networks, or something else? Since B&S assume a general-purpose process, there should be only one. And what is the nature of the rule-based system: first-order logic, Bayes's rule, signal-detection theory, or expected utility maximization? It cannot be all of these, since they are not the same. What do we gain from a dual-processes theory that does not develop a theory about the two processes?

Talking of two systems has become popular in some quarters. The “inside-outside view” is another case in point. According to Kahneman and Lovallo (1993, p. 25), an inside view focuses on “the case at hand,” whereas an outside view focuses “on the statistics of a class of cases.” Yet this distinction is too crude. For instance, it fails to predict the differential effect of natural versus normalized frequencies (Prediction 4), given that both invoke an “outside view,” as well as the differential effects of various single-event representations, such as in Prediction 2, which all invoke an “inside view.” B. F. Skinner asked us to refrain from building theories of cognition and to treat the mind as a black box. B&S's dual-systems notion is dangerously similar to two black boxes. What about replacing the two black boxes by an adaptive toolbox that contains multiple heuristics and logical tools?

Towards ecological rationality. In their title, B&S include the term ecological rationality. We have introduced this term to refer to the study of how cognitive processes map onto environmental structures. The Bayesian algorithms and shortcuts are part of this larger enterprise. It extends to heuristics that solve problems ranging from categorization to choice to inference, and from catching fly balls to making coronary care unit allocations or moral judgments (Gigerenzer 2007; Gigerenzer et al. 1999). The study of ecological rationality requires computational models of cognitive processes, in order to predict where they fail and succeed. It may actually help define the notion of dual processes more precisely.

References

Bar-Hillel, M. (1980) The base-rate fallacy in probability judgments. Acta Psychologica 44:211–33.
Evans, J. St. B. T., Handley, S. J., Perham, N., Over, D. E. & Thompson, V. A. (2000) Frequency versus probability formats in statistical word problems. Cognition 77:197–213.
Gigerenzer, G. (2002) Calculated risks: How to know when numbers deceive you. Simon & Schuster. (UK version: Reckoning with risk: Learning to live with uncertainty. Penguin).
Gigerenzer, G. (2004) Fast and frugal heuristics: The tools of bounded rationality. In: Blackwell handbook of judgment and decision making, ed. Koehler, D. & Harvey, N., pp. 62–88. Blackwell.
Gigerenzer, G. (2007) Gut feelings: The intelligence of the unconscious. Viking Press.
Gigerenzer, G., Hertwig, R., van den Broek, E., Fasolo, B. & Katsikopoulos, K. (2005) “A 30% chance of rain tomorrow”: How does the public understand probabilistic weather forecasts? Risk Analysis 25:623–29.
Gigerenzer, G. & Hoffrage, U. (1995) How to improve Bayesian reasoning without instruction: Frequency formats. Psychological Review 102:684–704.
Gigerenzer, G. & Hoffrage, U. (1999) Overcoming difficulties in Bayesian reasoning: A reply to Lewis and Keren (1999) and Mellers and McGraw (1999). Psychological Review 106:425–30.
Gigerenzer, G. & Regier, T. P. (1996) How do we tell an association from a rule? Psychological Bulletin 119:23–26.
Hoffrage, U. & Gigerenzer, G. (1998) Using natural frequencies to improve diagnostic inferences. Academic Medicine 73:538–40.
Hoffrage, U., Lindsey, S., Hertwig, R. & Gigerenzer, G. (2000) Communicating statistical information. Science 290:2261–62.
Johnson-Laird, P. N., Legrenzi, P., Girotto, V., Legrenzi, M. S. & Caverni, J.-P. (1999) Naïve probability: A mental model theory of extensional reasoning. Psychological Review 106:62–88.
Kahneman, D. & Lovallo, D. (1993) Timid theories and bold forecasts: A cognitive perspective on risk taking. Management Science 39:17–31.
Kahneman, D. & Tversky, A. (1972) Subjective probability: A judgment of representativeness. Cognitive Psychology 3:430–54.
Lewis, C. & Keren, G. (1999) On the difficulties underlying Bayesian reasoning: Comment on Gigerenzer and Hoffrage. Psychological Review 106:411–16.
Macchi, L. (2000) Partitive formulation of information in probabilistic problems: Beyond heuristics and frequency format explanations. Organizational Behavior and Human Decision Processes 82:217–36.
Macchi, L. & Mosconi, G. (1998) Computational features vs frequentist phrasing in the base-rate fallacy. Swiss Journal of Psychology 57:79–85.
Martignon, L., Vitouch, O., Takezawa, M. & Forster, M. R. (2003) Naive and yet enlightened: From natural frequencies to fast and frugal decision trees. In: Thinking: Psychological perspectives on reasoning, judgment and decision making, ed. Hardman, D. & Macchi, L., pp. 189–211. Wiley.
Payne, J. W., Bettman, J. R. & Johnson, E. J. (1993) The adaptive decision maker. Cambridge University Press.
Rieskamp, J. & Otto, P. E. (2006) SSL: A theory of how people learn to select strategies. Journal of Experimental Psychology: General 135:207–36.
Sloman, S. A. (1996a) The empirical case for two systems of reasoning. Psychological Bulletin 119:3–22.
Sloman, S. A. (1996b) The probative value of simultaneous contradictory belief: Reply to Gigerenzer and Regier (1996). Psychological Bulletin 119:27–30.
Zhu, L. & Gigerenzer, G. (2006) Are children intuitive Bayesians? Cognition 77:197–213.