Predictive coding (PC) is typically implemented using a hierarchy of neural populations, alternating between populations of error-detecting neurons and populations of prediction neurons. In the standard implementation of PC (Friston Reference Friston2005; Rao & Ballard Reference Rao and Ballard1999), each population of prediction neurons sends excitatory connections forward to the subsequent population of error-detecting neurons, and also sends inhibitory connections backwards to the preceding population of error-detecting neurons. Similarly, each population of error-detecting neurons also sends information in both directions; via excitatory connection to the following population of prediction neurons, and via inhibitory connections to the preceding population of prediction neurons. (See, for example, Figure 2 in Friston [2005], or Figure 2b in Spratling [2008b]). It is therefore inaccurate for Clark to state (see sects. 1.1 and 2.1) that in PC the feedforward flow of information solely conveys prediction error, while feedback only conveys predictions. Presumably what Clark really means to say is that the standard implementation of PC proposes that inter-regional feedforward connections carry error, whereas inter-regional feedback connections carry predictions (while information flow in the reverse directions takes place within each cortical area). However, this is simply one hypothesis about how PC should be implemented in cortical circuitry. It is also possible to group neural populations differently so that inter-regional feedforward connections carry predictions, not errors (Spratling Reference Spratling2008b).
As alternative implementations of the same computational theory, these two ways of grouping neural populations are compatible with the same psychophysical, brain imaging, and neurophysiological data reviewed in section 3.1 of the target article. However, they do suggest that different cortical circuitry may underlie these outward behaviours. This means that claims (repeated by Clark in sect. 2.1) that prediction neurons correspond to pyramidal cells in the deep layers of the cortex, while error-detecting neurons correspond to pyramidal cells in superficial cortical layers, are not predictions of PC in general, but predictions of one specific implementation of PC. These claims, therefore, do not constitute falsifiable predictions of PC (if they did then the idea that PC operates in the retina – as discussed in sect. 1.3 – could be rejected, due to the lack of cortical pyramidal cells in retinal circuitry!). Indeed, it is highly doubtful that these claims even constitute falsifiable predictions of the standard implementation of PC. The standard implementation is defined at a level of abstraction above that of cortical biophysics: it contains many biologically implausible features, like neurons that can generate both positive and negative firing rates. The mapping between elements of the standard implementation of PC and elements of cortical circuitry may, therefore, be far less direct than is suggested by the claim about deep and superficial layer pyramidal cells. For example, the role of prediction neurons and/or error-detecting neurons in the model might be performed by more complex cortical circuitry made up of diverse populations of neurons, none of which behave like the model neurons but whose combined action results in the same computation being performed.
The fact that PC is typically implemented at a level of abstraction that is intermediate between that of low-level, biophysical, circuits and that of high-level, psychological, behaviours is a virtue. Such intermediate-level models can identify common computational principles that operate across different structures of the nervous system and across different species (Carandini Reference Carandini2012; Phillips & Singer Reference Phillips and Singer1997); they seek integrative explanations that are consistent between levels of description (Bechtel Reference Bechtel, Schouten and de Jong2006; Mareschal et al. Reference Mareschal, Johnson, Siros, Spratling, Thomas and Westermann2007), and they provide functional explanations of the empirical data that are arguably the most relevant to neuroscience (Carandini et al. Reference Carandini, Demb, Mante, Tolhurst, Dan, Olshausen, Gallant and Rust2005; Olshausen & Field Reference Olshausen and Field2005). For PC, the pursuit of consistency across levels may prove to be a particularly important contribution to the modelling of Bayesian inference. Bayes' theorem states that the posterior is proportional to the product of the likelihood and the prior. However, it places no constraints on how these probabilities are calculated. Hence, any model that involves multiplying two numbers together, where those numbers can be plausibly claimed to represent the likelihood and posterior, can be passed off as a Bayesian model. This has led to numerous computational models which lay claim to probabilistic respectability while employing mechanisms to derive “probabilities” that are as ad-hoc and unprincipled as the non-Bayesian models they claim superiority over. It can be hoped that PC will provide a framework with sufficient constraints to allow principled models of hierarchical Bayesian inference to be derived.
A final point about different implementations is that they are not necessarily all equal. As well as implementing the PC theory using different ways of grouping neural populations, we can also implement the theory using different mathematical operations. Compared to the standard implementation of PC, one alternative implementation (PC/BC) is mathematically simpler while explaining more of the neurophysiological data: Compare the range of V1 response properties accounted for by PC/BC (Spratling Reference Spratling2010; Reference Spratling2011; Reference Spratling2012a; Reference Spratling2012b) with that simulated by the standard implementation of PC (Rao & Ballard Reference Rao and Ballard1999); or the range of attentional data accounted for by the PC/BC implementation (Spratling Reference Spratling2008a) compared to the standard implementation (Feldman & Friston Reference Feldman and Friston2010). Compared to the standard implementation, PC/BC is also more biologically plausible; for example, it does not employ negative firing rates. However, PC/BC is still defined at an intermediate-level of abstraction, and therefore, like the standard implementation, provides integrative and functional explanations of empirical data (Spratling Reference Spratling2011). It can also be interpreted as a form of hierarchical Bayesian inference (Lochmann & Deneve Reference Lochmann and Deneve2011). However, it goes beyond the standard implementation of PC by identifying computational principles that are shared with algorithms used in machine learning, such as generative models, matrix factorization methods, and deep learning architectures (Spratling Reference Spratling2012b), as well as linking to alternative theories of brain function, such as divisive normalisation and biased competition (Spratling Reference Spratling2008a; Reference Spratling2008b). Other implementations of PC may in future prove to be even better models of brain function, which is even more reason not to confuse one particular implementation of a theory with the theory itself.
Predictive coding (PC) is typically implemented using a hierarchy of neural populations, alternating between populations of error-detecting neurons and populations of prediction neurons. In the standard implementation of PC (Friston Reference Friston2005; Rao & Ballard Reference Rao and Ballard1999), each population of prediction neurons sends excitatory connections forward to the subsequent population of error-detecting neurons, and also sends inhibitory connections backwards to the preceding population of error-detecting neurons. Similarly, each population of error-detecting neurons also sends information in both directions; via excitatory connection to the following population of prediction neurons, and via inhibitory connections to the preceding population of prediction neurons. (See, for example, Figure 2 in Friston [2005], or Figure 2b in Spratling [2008b]). It is therefore inaccurate for Clark to state (see sects. 1.1 and 2.1) that in PC the feedforward flow of information solely conveys prediction error, while feedback only conveys predictions. Presumably what Clark really means to say is that the standard implementation of PC proposes that inter-regional feedforward connections carry error, whereas inter-regional feedback connections carry predictions (while information flow in the reverse directions takes place within each cortical area). However, this is simply one hypothesis about how PC should be implemented in cortical circuitry. It is also possible to group neural populations differently so that inter-regional feedforward connections carry predictions, not errors (Spratling Reference Spratling2008b).
As alternative implementations of the same computational theory, these two ways of grouping neural populations are compatible with the same psychophysical, brain imaging, and neurophysiological data reviewed in section 3.1 of the target article. However, they do suggest that different cortical circuitry may underlie these outward behaviours. This means that claims (repeated by Clark in sect. 2.1) that prediction neurons correspond to pyramidal cells in the deep layers of the cortex, while error-detecting neurons correspond to pyramidal cells in superficial cortical layers, are not predictions of PC in general, but predictions of one specific implementation of PC. These claims, therefore, do not constitute falsifiable predictions of PC (if they did then the idea that PC operates in the retina – as discussed in sect. 1.3 – could be rejected, due to the lack of cortical pyramidal cells in retinal circuitry!). Indeed, it is highly doubtful that these claims even constitute falsifiable predictions of the standard implementation of PC. The standard implementation is defined at a level of abstraction above that of cortical biophysics: it contains many biologically implausible features, like neurons that can generate both positive and negative firing rates. The mapping between elements of the standard implementation of PC and elements of cortical circuitry may, therefore, be far less direct than is suggested by the claim about deep and superficial layer pyramidal cells. For example, the role of prediction neurons and/or error-detecting neurons in the model might be performed by more complex cortical circuitry made up of diverse populations of neurons, none of which behave like the model neurons but whose combined action results in the same computation being performed.
The fact that PC is typically implemented at a level of abstraction that is intermediate between that of low-level, biophysical, circuits and that of high-level, psychological, behaviours is a virtue. Such intermediate-level models can identify common computational principles that operate across different structures of the nervous system and across different species (Carandini Reference Carandini2012; Phillips & Singer Reference Phillips and Singer1997); they seek integrative explanations that are consistent between levels of description (Bechtel Reference Bechtel, Schouten and de Jong2006; Mareschal et al. Reference Mareschal, Johnson, Siros, Spratling, Thomas and Westermann2007), and they provide functional explanations of the empirical data that are arguably the most relevant to neuroscience (Carandini et al. Reference Carandini, Demb, Mante, Tolhurst, Dan, Olshausen, Gallant and Rust2005; Olshausen & Field Reference Olshausen and Field2005). For PC, the pursuit of consistency across levels may prove to be a particularly important contribution to the modelling of Bayesian inference. Bayes' theorem states that the posterior is proportional to the product of the likelihood and the prior. However, it places no constraints on how these probabilities are calculated. Hence, any model that involves multiplying two numbers together, where those numbers can be plausibly claimed to represent the likelihood and posterior, can be passed off as a Bayesian model. This has led to numerous computational models which lay claim to probabilistic respectability while employing mechanisms to derive “probabilities” that are as ad-hoc and unprincipled as the non-Bayesian models they claim superiority over. It can be hoped that PC will provide a framework with sufficient constraints to allow principled models of hierarchical Bayesian inference to be derived.
A final point about different implementations is that they are not necessarily all equal. As well as implementing the PC theory using different ways of grouping neural populations, we can also implement the theory using different mathematical operations. Compared to the standard implementation of PC, one alternative implementation (PC/BC) is mathematically simpler while explaining more of the neurophysiological data: Compare the range of V1 response properties accounted for by PC/BC (Spratling Reference Spratling2010; Reference Spratling2011; Reference Spratling2012a; Reference Spratling2012b) with that simulated by the standard implementation of PC (Rao & Ballard Reference Rao and Ballard1999); or the range of attentional data accounted for by the PC/BC implementation (Spratling Reference Spratling2008a) compared to the standard implementation (Feldman & Friston Reference Feldman and Friston2010). Compared to the standard implementation, PC/BC is also more biologically plausible; for example, it does not employ negative firing rates. However, PC/BC is still defined at an intermediate-level of abstraction, and therefore, like the standard implementation, provides integrative and functional explanations of empirical data (Spratling Reference Spratling2011). It can also be interpreted as a form of hierarchical Bayesian inference (Lochmann & Deneve Reference Lochmann and Deneve2011). However, it goes beyond the standard implementation of PC by identifying computational principles that are shared with algorithms used in machine learning, such as generative models, matrix factorization methods, and deep learning architectures (Spratling Reference Spratling2012b), as well as linking to alternative theories of brain function, such as divisive normalisation and biased competition (Spratling Reference Spratling2008a; Reference Spratling2008b). Other implementations of PC may in future prove to be even better models of brain function, which is even more reason not to confuse one particular implementation of a theory with the theory itself.