Anderson's target article adds to a growing literature (e.g., Mesulam 1990; Prinz 2006; Uttal 2001) that criticizes the recurring tendency to partition the brain into localized modules (e.g., Carruthers 2006; Tooby & Cosmides 1992). Ironically, Anderson's critique of modularity is steeped in modularist terms such as redeployment. We are sympathetic with the general thrust of Anderson's theory and find it very compatible with the Leabra tripartite architecture (O'Reilly 1998; O'Reilly & Munakata 2000). It seems that much of the controversy can be traced back to terminological confusion and false dichotomies. Our goal in this commentary is to dispel some of the confusion and clarify Leabra's position on modularity.
The target article is vague about the key term function. In his earlier work, Anderson follows Fodor (2000) in “the pragmatic definition of a (cognitive) function as whatever appears in one of the boxes in a psychologist's diagram of cognitive processing” (Anderson 2007c, p. 144). Although convenient for a meta-review of 1,469 fMRI experiments (Anderson 2007a; 2007c), this definition contributes little to terminological clarity. In particular, when we (Atallah et al. 2004, p. 253) wrote that “different brain areas clearly have some degree of specialized function,” we did not mean cognitive functions such as face recognition. What we meant is closest to what Anderson calls “cortical biases” or, following Bergeron (2007), “working.”
Specifically, the posterior cortex in Leabra specializes in slow interleaved learning that tends to develop overlapping distributed representations, which in turn promote similarity-based generalization. This computational capability can be used in a myriad of cognitive functions (O'Reilly & Munakata 2000). The hippocampus and the surrounding structures in the medial temporal lobe (MTL) specialize in rapid learning of sparse conjunctive representations that minimize interference (e.g., McClelland et al. 1995). The prefrontal cortex (PFC) specializes in sustained neural firing (e.g., Miller & Cohen 2001; O'Reilly 2006) and relies on dynamic gating from the basal ganglia (BG) to satisfy the conflicting demands of rapid updating of (relevant) information, on one hand, and robust maintenance in the face of new (and distracting) information, on the other (e.g., Atallah et al. 2004; O'Reilly & Frank 2006). Importantly, most¹ of this specialization arises from parametric variation of the same underlying substrate. The components of the Leabra architecture differ in their learning rates, the amount of lateral inhibition, and so on, but not in the nature of their processing units. Also, they are in constant, intensive interaction. Each high-level task engages all three components (O'Reilly et al. 1999; O'Reilly & Munakata 2000).
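The idea of parametric variation over a shared substrate can be illustrated with a toy sketch. This is not the Leabra implementation: the names `RegionParams`, `activate`, and `update_weight` are hypothetical, the numeric values are invented solely to reproduce the qualitative contrast described above (slow learning with overlapping activity versus rapid learning with sparse activity), and the PFC/BG component is omitted because sustained firing and gating are not captured by these two parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionParams:
    """Settings for one component; every component uses the same unit type."""
    learning_rate: float    # size of each weight change (slow vs. rapid learning)
    active_fraction: float  # share of units left active after lateral inhibition


# Hypothetical values, chosen only to reproduce the qualitative contrasts in the text.
REGIONS = {
    "posterior_cortex": RegionParams(learning_rate=0.01, active_fraction=0.25),
    "hippocampus_mtl": RegionParams(learning_rate=0.5, active_fraction=0.05),
}


def activate(net_inputs, params):
    """Shared unit rule: keep the k most strongly driven units, silence the rest."""
    k = max(1, round(params.active_fraction * len(net_inputs)))
    order = sorted(range(len(net_inputs)), key=lambda i: net_inputs[i], reverse=True)
    winners = set(order[:k])
    return [1.0 if i in winners else 0.0 for i in range(len(net_inputs))]


def update_weight(weight, target, params):
    """Shared learning rule: move a weight toward a target at the region's rate."""
    return weight + params.learning_rate * (target - weight)
```

Both regions call the same `activate` and `update_weight` functions; only the parameter values differ. With 20 inputs, the posterior-cortex setting leaves five units active and the hippocampal setting leaves one, which is the sense in which the same substrate yields overlapping or sparse codes.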
We now turn to the question of modularity. Here the terminology is relatively clear (e.g., Carruthers 2006; Fodor 1983; 2000; Prinz 2006; Samuels 2006). Fodor's (1983) foundational book identified nine criteria for modularity. We have space to discuss only domain specificity and encapsulation. These two are widely regarded as most central (Fodor 2000; Samuels 2006).
A system is domain-specific (as opposed to domain-general) when it only receives inputs concerning a certain subject matter. All three Leabra components are domain-general in this sense. Both MTL and PFC/BG receive convergent inputs from multiple and variegated brain areas. The posterior cortex is an interactive multitude of cortical areas whose specificity is a matter of degree and varies considerably.
The central claim of Anderson's massive redeployment hypothesis (MRH) is that most brain areas are much closer to the general than the specific end of the spectrum. This claim is hardly original, but it is worth repeating because the subtractive fMRI methodology tends to obscure it (Uttal 2001). fMRI is a wonderful tool, but it should be interpreted with care (Poldrack 2006). Any stimulus provokes a large response throughout the brain, and a typical fMRI study reports tiny differences² between conditions – typically less than 1% (Huettel et al. 2008). The importance of Anderson's (2007a; 2007c) meta-analyses is that, even if we grant the (generous) assumption that fMRI can reliably index specificity, one still finds widespread evidence for generality.
MRH also predicts a correlation between the degree of generality and phylogenetic age. We are skeptical of the use of the posterior-anterior axis as a proxy for age because it is confounded with many other factors. Also, the emphasis on age encourages terms such as reuse, redeployment, and recycling, which misleadingly suggest that each area was deployed for one primordial and specific function in the evolutionary past and was later redeployed for additional functions. Such inferences must be based on comparative data from multiple species. As the target article is confined to human fMRI, the situation is quite different. Given a fixed evolutionary endowment and relatively stable environment, each human child develops and/or learns many cognitive functions simultaneously. This seems to leave no room for redeployment but only for deployment for multiple uses.
Anderson's critique of modularity neglects one of its central features – information encapsulation. We wonder what predictions MRH makes about this important issue. A system is encapsulated when it exchanges³ relatively little information with other systems. Again, this is a matter of degree, as our Figure 1 illustrates. The degree of encapsulation depends on factors such as the number of exposed (input/output) units relative to the total number of units in the cluster, and the density and strength of distal connections relative to local ones. Even when all units are exposed (as cluster D illustrates), the connections to and from each individual unit are still predominantly local because the units share the burden of distal communication. Long-range connections are a limited resource (Cherniak et al. 2004) but are critical for integrating the components into a coherent whole. The Leabra components are in constant, high-bandwidth interaction, and parallel constraint satisfaction among them is a fundamental implicit processing mechanism. Hence, we eschew the terms module and encapsulation in our theorizing. This is a source of creative tension in our (Jilk et al. 2008) collaboration to integrate Leabra with the ACT-R architecture, whose proponents make the opposite emphasis (J. R. Anderson 2007; J. R. Anderson et al. 2004). Much of this tension is defused by the realization that the modularist terminology forces a binary distinction on what is fundamentally a continuum.
Figure 1. Information encapsulation is a matter of degree. Four neuronal clusters are shown, of which A is the most and D the least encapsulated. Black circles depict exposed (input/output) units that make distal connections to other cluster(s); grey circles depict hidden units that make local connections only.
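To make the graded notion concrete, the two factors named above can be computed for a set of toy clusters. This is a minimal sketch under stated assumptions: the `Cluster` class, its field names, and all unit counts and connection strengths are hypothetical and are not taken from Figure 1; they were chosen only so that A comes out most and D least encapsulated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    n_units: int          # total units in the cluster
    n_exposed: int        # units with distal (input/output) connections
    local_weight: float   # summed strength of within-cluster connections
    distal_weight: float  # summed strength of connections to other clusters

    def exposed_fraction(self):
        """Exposed units relative to all units (0 = fully hidden, 1 = all exposed)."""
        return self.n_exposed / self.n_units

    def distal_share(self):
        """Distal connection strength relative to all connection strength."""
        return self.distal_weight / (self.local_weight + self.distal_weight)


# Hypothetical numbers; only the ordering from A to D is meant to be informative.
CLUSTERS = {
    "A": Cluster(n_units=12, n_exposed=1, local_weight=30.0, distal_weight=1.0),
    "B": Cluster(n_units=12, n_exposed=3, local_weight=30.0, distal_weight=3.0),
    "C": Cluster(n_units=12, n_exposed=6, local_weight=30.0, distal_weight=6.0),
    "D": Cluster(n_units=12, n_exposed=12, local_weight=30.0, distal_weight=12.0),
}

# Rank clusters from most to least encapsulated by their exposed fraction.
ranking = sorted(CLUSTERS, key=lambda name: CLUSTERS[name].exposed_fraction())
```

The sketch deliberately reports two separate quantities rather than a single index, since the text names several contributing factors without combining them. In this toy setting every unit of cluster D is exposed, yet its distal share stays below one half, so its connectivity remains predominantly local.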