Jones & Love (J&L) raise the specter of Bayesian Fundamentalism sweeping through cognitive science, isolating it from algorithmic models and neuroscience, ushering in a Dark Ages dominated by an unholy marriage of radical behaviorism with evolutionary “just so” stories. While we agree that a critical assessment of the Bayesian framework for cognition could be salutary, the target article suffers from a serious imbalance: long on speculation grounded in murky metaphors, short on discussion of actual applications of the Bayesian framework to modeling of cognitive processes. Our commentary aims to redress that imbalance.
The target article virtually ignores the topic of causal inference (citing only Griffiths & Tenenbaum 2009). This omission is odd, as causal inference is both a core cognitive process and one of the most prominent research areas in which modern Bayesian models have been applied. To quote a recent article by Holyoak and Cheng in Annual Review of Psychology, “The most important methodological advance in the past decade in psychological work on causal learning has been the introduction of Bayesian inference to causal inference. This began with the work of Griffiths & Tenenbaum (2005; 2009; Tenenbaum & Griffiths 2001; see also Waldmann & Martignon 1998)” (Holyoak & Cheng 2011, pp. 142–43). Here we recap how and why the Bayesian framework has had its impact.
Earlier, Pearl's (1988) concept of “causal Bayes nets” had inspired the hypothesis that people learn causal models (Waldmann & Holyoak 1992), and it had been argued that causal induction is fundamentally rational (the power PC [probabilistic contrast] theory of Cheng 1997). However, for about a quarter century, the view that people infer cause-effect relations from non-causal contingency data in a fundamentally rational fashion was pitted against a host of alternatives based either on heuristics and biases (e.g., Schustack & Sternberg 1981) or on associative learning models, most notably Rescorla and Wagner's (1972) learning rule (e.g., Shanks & Dickinson 1987). A decisive resolution of this debate proved to be elusive in part because none of the competing models provided a principled account of how uncertainty influences human causal judgments (Cheng & Holyoak 1995).
J&L assert that, “Taken as a psychological theory, the Bayesian framework does not have much to say” (sect. 2.2, para. 3). In fact, the Bayesian framework says that the assessment of causal strength should not be based simply on a point estimate, as had previously been assumed, but on a probability distribution that explicitly quantifies the uncertainty associated with the estimate. It also says that causal judgments should depend jointly on prior knowledge and the likelihoods of the observed data. Griffiths and Tenenbaum (2005) made the critical contribution of showing that different likelihood functions are derived from the different assumptions about cause-effect representations postulated by the power PC theory versus associative learning theory. Both theories can be formulated within a common Bayesian framework, with each being granted exactly the same basis for representing uncertainty about causal strength. Hence, a comparison of these two Bayesian models can help identify the fundamental representations underlying human causal inference.
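The contrast between the two likelihood functions can be sketched numerically. Under power PC assumptions, a generative cause and the background combine as a noisy-OR; under associative (ΔP-style) assumptions, their strengths sum linearly. The sketch below computes posterior distributions over causal strength on a grid, under a flat prior; the contingency counts are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical contingency data: effect counts with the cause present/absent
N_c, k_c = 20, 16   # cause present: effect occurs on 16 of 20 trials
N_b, k_b = 20, 6    # cause absent:  effect occurs on 6 of 20 trials

# Grid over background strength w0 and candidate-cause strength w1
w0, w1 = np.meshgrid(np.linspace(0.01, 0.99, 99), np.linspace(0.01, 0.99, 99))

# Probability of the effect when the cause is present, under each theory
likelihoods = {
    "noisy-OR (power PC)": w0 + w1 - w0 * w1,
    "linear (associative)": np.clip(w0 + w1, 1e-6, 1 - 1e-6),
}

means = {}
for name, p_c in likelihoods.items():
    # Binomial log-likelihood of both contingencies
    ll = (k_b * np.log(w0) + (N_b - k_b) * np.log(1 - w0)
          + k_c * np.log(p_c) + (N_c - k_c) * np.log(1 - p_c))
    post = np.exp(ll - ll.max())        # flat (uninformative) prior
    post /= post.sum()
    means[name] = (post * w1).sum()     # posterior mean causal strength
    print(f"{name}: E[w1] = {means[name]:.2f}")
```

With these counts, ΔP = 0.5 while causal power is 0.5/0.7 ≈ 0.71, so the same data yield different posterior distributions over w1 under the two representational assumptions; comparing such predictions against human strength judgments is exactly the model-comparison move described above.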
A persistent complaint that J&L direct at Bayesian modeling is that, “Comparing multiple Bayesian models of the same task is rare” (target article, Abstract); “[i]t is extremely rare to find a comparison among alternative Bayesian models of the same task to determine which is most consistent with empirical data” (sect. 1, para. 6). One of J&L's concluding admonishments is that, “there are generally many Bayesian models of any task. . . . Comparison among alternative models would potentially reveal a great deal” (sect. 7, para. 2). But as the work of Griffiths and Tenenbaum (2005) exemplifies, a basis for comparison of multiple models is exactly what the Bayesian framework provided to the field of causal learning.
Lu et al. (2008b) carried the project a step further, implementing and testing a 2×2 design of Bayesian models of learning causal strength: the two likelihood functions crossed with two priors (uninformative vs. a preference for sparse and strong causes). When compared to human data, model comparisons established that human causal learning is better explained by the assumptions underlying the power PC theory, rather than by those underlying associative models. The sparse-and-strong prior accounted for subtle interactions involving generative and preventive causes that could not be explained by uninformative priors.
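The prior dimension of the 2×2 design can be illustrated in the same grid style. The exponential form and the weight of 5.0 below are simplified stand-ins for a sparse-and-strong preference (weak background cause, strong candidate cause), not necessarily the exact formulation of Lu et al.; the sparse data are again hypothetical, chosen because priors matter most when evidence is scant.

```python
import numpy as np

# Hypothetical sparse data, where the choice of prior has the most leverage
N_c, k_c = 4, 3     # cause present: effect occurs on 3 of 4 trials
N_b, k_b = 4, 1     # cause absent:  effect occurs on 1 of 4 trials

w0, w1 = np.meshgrid(np.linspace(0.01, 0.99, 99), np.linspace(0.01, 0.99, 99))
p_c = w0 + w1 - w0 * w1                      # noisy-OR (power PC) likelihood

ll = (k_b * np.log(w0) + (N_b - k_b) * np.log(1 - w0)
      + k_c * np.log(p_c) + (N_c - k_c) * np.log(1 - p_c))

priors = {
    "uninformative": np.ones_like(w0),
    # Illustrative sparse-and-strong prior: favors a weak background cause
    # (small w0) and a strong candidate cause (w1 near 1); the weight 5.0
    # sets how sharp the preference is (an assumed value, not from Lu et al.)
    "sparse-strong": np.exp(-5.0 * w0 - 5.0 * (1 - w1)),
}

means = {}
for name, prior in priors.items():
    post = prior * np.exp(ll - ll.max())
    post /= post.sum()
    means[name] = (post * w1).sum()
    print(f"{name}: E[w1] = {means[name]:.2f}")
```

With so few trials, the sparse-and-strong prior pulls the posterior mean strength well above the uninformative-prior estimate; as the trial counts grow, the likelihood dominates and the two priors converge.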
J&L acknowledge that, “An important argument in favor of rational over mechanistic modeling is that the proliferation of mechanistic modeling approaches over the past several decades has led to a state of disorganization” (sect. 4.1, para. 2). Perhaps no field better exemplified this state of affairs than causal learning, which had produced roughly 40 algorithmic models by a recent count (Hattori & Oaksford 2007). Almost all of these are non-normative, defined (following Perales & Shanks 2007) as not derived from a well-specified computational analysis of the goals of causal learning. Lu et al. (2008b) compared their Bayesian models to those which Perales and Shanks had tested in a large meta-analysis. The Bayesian extensions of the power PC theory (with zero or one parameter) accounted for up to 92% of the variance, performing at least as well as the most successful non-normative model (with four free parameters), and much better than the Rescorla-Wagner model (see also Griffiths & Tenenbaum 2009).
New Bayesian models of causal learning have thus built upon and significantly extended previous proposals (e.g., the power PC theory), and have in turn been extended to completely new areas. For example, the Bayesian power PC theory has been applied to analogical inferences based on a single example (Holyoak et al. 2010). Rather than blindly applying some single privileged Bayesian theory, alternative models have been systematically formulated and compared to human data. Rather than preempting algorithmic models, the advances in Bayesian modeling have inspired new algorithmic models of sequential causal learning, addressing phenomena related to learning curves and trial order (Daw et al. 2007; Kruschke 2006; Lu et al. 2008a). Efforts are under way to link computation-level theory with algorithmic and neuroscientific models. In short, rather than monolithic Bayesian Fundamentalism, normal science holds sway. Perhaps J&L will happily (if belatedly) acknowledge the past decade of work on causal learning as a shining example of “Bayesian Enlightenment.”