Much in Clark's review is of fundamental importance. Probabilistic inference is crucial to life in general and neural systems in particular, but does it have a single coherent logic? Jaynes (2003) argued that it does, but for that logic to be relevant to brain theory, it must be shown how systems built from local neural processors can perform essential functions that are assumed to be the responsibility of the scientist in Jaynes' theory (Fiorillo 2012; Phillips 2012).
Most crucial of those functions are the selection of the information relevant to the role of each local cell or microcircuit and the coordination of their multiple concurrent activities. The information available to neural systems is so rich that it cannot be used for inference if taken as a single, multi-dimensional whole, because the number of locations in multi-dimensional space increases exponentially with dimensionality. Most events that actually occur in high-dimensional spaces are therefore novel and distant from previous events, precluding learning based on sample probabilities. This constraint, well known to the machine-learning community as the curse of dimensionality, has major consequences for psychology and neuroscience. It implies that, for learning and inference to be possible, large databases must be divided into small subsets, as amply confirmed by the clear selectivity observed within and between brain regions at all hierarchical levels. Creation of the subsets involves both prespecified mechanisms, as in receptive-field selectivity, and dynamic grouping as proposed by Gestalt psychology (Phillips et al. 2010). The criteria for selection must be use-dependent because information crucial to one use would be fatal to another, as in the contrast between dorsal and ventral visual pathways. Contextual modulation is also crucial because interpretations with low probability overall may have high probability in certain contexts. Therefore, the activity of local processors must be guided by the broader context, and their multiple concurrent decisions must be coordinated if they are to create coherent percepts, thoughts, and actions.
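The arithmetic behind this constraint is easy to exhibit. The following sketch (illustrative only; the function name and parameters are my own, not drawn from any of the works cited) discretises a d-dimensional space into k bins per dimension and counts how many cells a fixed sample ever visits; as d grows, almost every cell, and hence almost every future event, lies outside past experience.

```python
# A minimal sketch of the curse of dimensionality: with k bins per
# dimension there are k**d cells, so a fixed sample leaves nearly all
# cells empty once d is moderately large.
import random

def fraction_of_empty_cells(d, k=4, n_samples=10_000, seed=0):
    """Draw n_samples uniform points in [0,1]**d, discretise each
    dimension into k bins, and report the fraction of the k**d cells
    that no sample ever visits."""
    rng = random.Random(seed)
    occupied = set()
    for _ in range(n_samples):
        cell = tuple(int(rng.random() * k) for _ in range(d))
        occupied.add(cell)
    return 1.0 - len(occupied) / (k ** d)

for d in (2, 4, 8, 16):
    print(f"d={d:2d}: {fraction_of_empty_cells(d):.4%} of cells never sampled")
```

With 10,000 samples, virtually every cell is visited at d = 2, but around 86% are empty at d = 8 and essentially all at d = 16, which is why learning from sample probabilities over the undivided whole is hopeless.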
Most models of predictive coding (PC) and Bayesian inference (BI) assume that the information to be coded and used for inference is a given. In those models, it is given: by the modelers. Modelers may assume that in the real world this information is given by the external input, but that provides more information than could be used for inference if taken as a whole. Self-organized selection of the information relevant to particular uses is therefore crucial. Efficient coding strategies, such as PC, are concerned with ways of transmitting information through a hierarchy, not with deciding what information to transmit. They assume lossless transmission of all input information to be the goal, and so provide no way of extracting different information for different uses. Models using BI show how to combine information from different sources when computing a single posterior decision, but they do not show how local neural processors can select the relevant information, nor do they show how multiple streams of processing can coordinate their activities. Thus, local selectivity, dynamic grouping, contextual disambiguation, and coordinating interactions are all necessary within cognitive systems, but are not adequately explained by the essential principles of either PC or BI.
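To be concrete about what BI models do provide, here is a minimal sketch of precision-weighted combination of two Gaussian cues into a single posterior (the function name and the numbers are illustrative, not taken from any particular model). Note that it presupposes exactly what is argued above to be missing: someone has already decided that these two cues, and no others, are the relevant sources.

```python
# A minimal sketch of Bayesian cue combination: two conditionally
# independent Gaussian estimates of the same quantity are fused by
# precision weighting into a single posterior.
def fuse_gaussian_cues(mu_a, var_a, mu_b, var_b):
    """Precisions (1/variance) add; the posterior mean is the
    precision-weighted average of the two cue means."""
    prec_a, prec_b = 1.0 / var_a, 1.0 / var_b
    prec_post = prec_a + prec_b
    mu_post = (prec_a * mu_a + prec_b * mu_b) / prec_post
    return mu_post, 1.0 / prec_post

# e.g. a reliable cue (mean 10.0, variance 1.0) and a noisy cue
# (mean 12.0, variance 4.0): the fused estimate sits nearer the
# reliable cue, with reduced variance.
print(fuse_gaussian_cues(10.0, 1.0, 12.0, 4.0))  # -> (10.4, 0.8)
```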
Clark's review, however, does contain the essence of an idea that could help resolve the mysteries of selectivity and coordination: context-sensitive gain control, for which there are several widely distributed neural mechanisms. A crucial strength of the free-energy theory is that it uses gain-controlling interactions to implement attention (Feldman & Friston 2010), but such mechanisms can do far more than that. For example, they can select and coordinate activities by amplifying or suppressing them as a function of their predictive relationships and current relevance. This is emphasized by the theory of Coherent Infomax (Kay et al. 1998; Kay & Phillips 2010; Phillips et al. 1995), which synthesizes evidence from neuroanatomy, neurophysiology, macroscopic neuroimaging, and psychophysics (Phillips & Singer 1997; von der Malsburg et al. 2010). That theory is further strengthened by evidence from psychopathology, as reviewed by Phillips and Silverstein (2003) and extended by many subsequent studies. Körding and König (2000) argue for a closely related theory.
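The defining property of such modulation can be shown in a few lines. The sketch below is illustrative only: the tanh gain and the parameter k are my assumptions, not the activation function actually used in the Coherent Infomax work cited above. What it captures is the essential asymmetry: context amplifies or suppresses a driving signal according to whether the two agree, but cannot create a response on its own.

```python
# A minimal sketch of context-sensitive gain control: context scales
# the gain of a driving input but cannot drive output by itself.
import math

def modulated_response(drive, context, k=2.0):
    """Response whose presence and sign are set by the driving input,
    while context multiplicatively modulates its gain. With drive == 0
    the response is 0 whatever the context."""
    gain = 1.0 + math.tanh(k * drive * context)  # gain lies in (0, 2)
    return drive * gain

print(modulated_response(1.0, 1.0))   # ~1.96: congruent context amplifies
print(modulated_response(1.0, -1.0))  # ~0.04: incongruent context suppresses
print(modulated_response(0.0, 5.0))   #  0.0: no drive -> no response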
Free-energy theory (Friston 2010) and Coherent Infomax assume that good predictions are vital, and formalize that assumption as an information-theoretic objective. Though these theories have superficial differences, with Coherent Infomax being formulated at the neuronal rather than the system level, it may be possible to unify their objectives as that of maximizing prediction success, which, under plausible assumptions, is equivalent to minimizing prediction error (Phillips & Friston, in preparation). Formulating the objective as maximizing the amount of information correctly predicted directly solves the “dark-room” problem discussed by Clark. That objective, however, does not necessarily imply that prediction errors are the fundamental currency of feedforward communication. Inferences could be computed by reducing prediction errors locally, while communicating the inferences themselves more widely (Spratling 2008a). That version of PC is supported by much neurobiological evidence, though it remains possible that neural systems use both versions.
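For readers who want the information-theoretic framing spelled out: the decomposition below is a standard identity, but the closing gloss of the Coherent Infomax weighting is my informal paraphrase; the precise weighted objective is given in Kay and Phillips (2010).

```latex
% A sketch in standard information-theoretic notation; the exact
% Coherent Infomax objective is in Kay & Phillips (2010) and is only
% paraphrased informally here.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
Let $R$ be a local processor's output, $X$ its driving (receptive-field)
input, and $C$ its contextual input. The information that $R$ conveys
about the pair $(X, C)$ obeys the standard decomposition
\begin{equation}
  I(R; X, C) = I(R; X \mid C) + I(R; C \mid X) + I(R; X; C),
\end{equation}
where $I(R; X; C)$ is the three-way co-information shared by output,
input, and context. Informally, the Coherent Infomax objective favours
outputs rich in $I(R; X; C)$, retains $I(R; X \mid C)$, and keeps
$I(R; C \mid X)$ small, so that context modulates what is transmitted
without itself being transmitted.
\end{document}
```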
Another important issue concerns the obvious diversity of brains and cognition. How could any unifying theory cast light on that? Though possible in principle, detailed answers to this question are largely a hope for the future. Coherent Infomax hypothesizes a local building block from which endlessly many architectures could be built, but using that to explain the observed diversity is a task that has hardly begun. Similarly, though major transitions in the evolution of inferential capabilities seem plausible, the study of what they may be remains a task for the future (Phillips 2012). By deriving algorithms for learning, Coherent Infomax shows in principle how endless diversity can arise from diverse lives, and it has been shown that the effectiveness of contextual coordination varies greatly across people of different ages (Doherty et al. 2010), sex (Phillips et al. 2004), and culture (Doherty et al. 2008). Using this possible source of variability to explain diversity across and within species, however, is a project that still has far to go.
Overall, I expect theories such as those examined by Clark to have far-reaching consequences for philosophy and for human thought in general, so I fully endorse the journey on which he has embarked.