
Sparse coding and challenges for Bayesian models of the brain

Published online by Cambridge University Press: 10 May 2013

Thomas Trappenberg
Affiliation:
Faculty of Computer Science, Dalhousie University, Halifax, NS B3H 4R2, Canada. tt@cs.dal.ca; www.cs.dal.ca/~tt
Paul Hollensen
Affiliation:
Faculty of Computer Science, Dalhousie University, Halifax, NS B3H 4R2, Canada. paulhollensen@gmail.com; www.cs.dal.ca/~tt

Abstract

While the target article provides a glowing account of the excitement in the field, we stress that hierarchical predictive learning in the brain requires sparseness of the representation. We also question the relation between Bayesian cognitive processes and hierarchical generative models as discussed in the target article.

Type
Open Peer Commentary
Copyright
Copyright © Cambridge University Press 2013 

Clark's target article captures well our excitement about predictive coding and the ability of humans to include uncertainty in making cognitive decisions. One factor that has not been stressed much in the target article, but which is important if representational learning is to match biological findings, is the role of sparseness constraints. We discuss this here, together with some critical remarks on Bayesian models and some remaining challenges in quantifying the general approach.

There are many unsupervised generative models that can be used to learn representations that reconstruct input data; consider, for example, photographs of natural scenes. A common method for dimensionality reduction is principal component analysis, which represents data along orthogonal feature vectors of decreasing variance. However, as nicely pointed out by Olshausen and Field (1996), the corresponding filters do not resemble receptive fields in the brain. In contrast, if a generative model is constrained to minimize not only the reconstruction error but also the number of basis functions used for any specific image, then filters emerge that resemble the receptive fields of simple cells in the primary visual cortex.
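To make this contrast concrete, the cost function at the heart of such sparse generative models can be written, in a common notation (the penalty function S and its weighting λ vary across formulations), as a reconstruction term plus a sparseness penalty on the coefficients:

```latex
E \;=\; \sum_{x,y} \Big[ I(x,y) \;-\; \sum_i a_i \, \phi_i(x,y) \Big]^2 \;+\; \lambda \sum_i S(a_i)
```

Here I is the image, the φ_i are the learned basis functions, and the a_i are the coefficients inferred for that image; it is the second term that pushes most coefficients toward zero and thereby produces the simple-cell-like filters.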

Sparse representation actually has a long and important history in the neuroscientific context. Horace Barlow argued for years that the visual system seems remarkably set up for sparse representations (Barlow 1961), and probably the first systematic model in this direction was proposed by his student Peter Földiák (1990). It seems that nearly every generative model with a sparseness constraint can reproduce receptive fields resembling those of simple cells (Saxe et al. 2011), and Ng and colleagues have shown that sparse hierarchical Restricted Boltzmann Machines (RBMs) learn features resembling receptive fields in V1 and V2 (Lee et al. 2008). In our own work, we have shown how lateral inhibition can implement sparseness constraints in a biologically plausible way while also promoting topographic representations (Hollensen & Trappenberg 2011).
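As an illustration of how lateral inhibition can enforce sparseness, the following is a minimal sketch in the spirit of Földiák's (1990) anti-Hebbian scheme. The network sizes, learning rates, and toy binary inputs are our own illustrative choices, not values from the original paper:

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_out = 64, 16       # toy network sizes (illustrative)
p = 0.1                    # target probability that a unit is active
a, b, g = 0.1, 0.02, 0.02  # lateral, feedforward, threshold learning rates

Q = rng.uniform(0.0, 0.1, (n_out, n_in))  # feedforward weights
W = np.zeros((n_out, n_out))              # lateral (inhibitory) weights
t = np.full(n_out, 0.5)                   # adaptive thresholds

def settle(x, n_iter=30):
    """Iterate the recurrent dynamics to a binary code for input x."""
    y = np.zeros(n_out)
    for _ in range(n_iter):
        y = ((Q @ x + W @ y - t) > 0).astype(float)
    return y

for _ in range(10000):
    x = (rng.random(n_in) < 0.2).astype(float)  # random binary input
    y = settle(x)
    W -= a * (np.outer(y, y) - p**2)        # anti-Hebbian: decorrelate co-active units
    np.fill_diagonal(W, 0.0)
    W = np.minimum(W, 0.0)                  # lateral weights remain inhibitory
    Q += b * y[:, None] * (x[None, :] - Q)  # Hebbian feedforward rule with decay
    t += g * (y - p)                        # hold each unit near target rate p
```

The anti-Hebbian lateral rule drives units that fire together to inhibit each other, decorrelating the code, while the adaptive thresholds hold each unit near the target rate p; sparseness is thus enforced by the circuit itself rather than by an explicit penalty term.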

Sparse representation has great advantages. By definition, it means that only a small number of cells have to be active to reproduce inputs in great detail. Not only is this advantageous energetically; it also amounts to a large compression of the data. Of course, the extreme case of maximal sparseness, corresponding to grandmother cells, is not desirable, as this would hinder any generalization ability of a model. Experimental evidence of sparse coding has been found in V1 (Vinje & Gallant 2000) and the hippocampus (Waydo et al. 2006).

The relation of the efficient coding principle to free energy is discussed by Friston (2010), who provides a derivation of free energy as the difference between complexity and accuracy. That is, minimizing free energy maximizes the probability of the data (accuracy), while also minimizing the difference (cross-entropy) between the causes we infer from the data and our prior on causes. That the latter is termed complexity reflects our intuition that causes in the world lie in a smaller space than their sensory projections. Thus, our internal representation should mirror the sparse structure of the world.
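In one standard notation (ours, not necessarily Friston's), with q(ϑ) the recognition density over causes ϑ and s̃ the sensory data, this decomposition reads:

```latex
F \;=\; \underbrace{D_{\mathrm{KL}}\big[\, q(\vartheta) \,\|\, p(\vartheta) \,\big]}_{\text{complexity}}
\;-\; \underbrace{\mathbb{E}_{q}\big[\ln p(\tilde{s} \mid \vartheta)\big]}_{\text{accuracy}}
```

Minimizing F thus rewards explaining the data while penalizing recognition densities that stray from the prior; with a sparse prior p(ϑ), the inferred causes are themselves pushed toward sparseness.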

While Friston shows the equivalence of Infomax and free energy minimization given a sparse prior, a fully Bayesian implementation would treat the prior itself as a random variable to be optimized through learning. Indeed, Friston goes on to say that the criticism of where these priors come from "dissolves with hierarchical generative models, in which the priors themselves are optimized" (Friston 2010, p. 129). This is precisely what has not yet been achieved: a model that learns a sparse representation of sensory messages due to the world's sparseness, rather than due to its architecture or static priors. Of course, we are likely endowed with a range of priors built into our evolved cortical architecture in order to bootstrap or guide development. What these native priors are and the form they take is an interesting and open question.

There are two alternatives to innate priors for explaining the receptive fields we observe. First, there has been a strong tendency to train hierarchical models layer by layer, with each layer learning to reconstruct the output of the previous one without being influenced by top-down expectations. Such top-down modulation is the prime candidate for expressing empirical priors and for shaping learning to incorporate high-level regularities. How to implement a model that balances conformity to its input with top-down expectations, while offering efficient inference and robustness, remains a largely open question (Jaeger 2011). Second, the data typically used to train our models differ substantially from what we are exposed to. The visual cortex experiences a stream of images with substantial temporal coherence and correlation with internal signals such as eye movements, limiting the conclusions we can draw from comparing its representation to models trained on static images (see, e.g., Rust et al. 2005).

The final comment we would like to make here concerns the discussion of Bayesian processes. Bayesian models such as the ideal observer have received considerable attention in neuroscience, since they seem to capture nicely the human ability to combine new evidence with prior knowledge in the "correct" probabilistic sense. However, it is important to realize that these Bayesian models are very specific to limited experimental tasks, often with only a few possible relevant states, and such models do not generalize well to changing experimental conditions. In contrast, a Bayesian model such as the Boltzmann machine represents a general mechanistic implementation of information processing in the brain that we believe can implement a general learning machine. While all these models are Bayesian in the sense that they represent causal models with probabilistic nodes, the nature of the models is very different. It is fascinating to think about how a specific Bayesian model such as the ideal observer can emerge from a general learning machine such as the RBM. Indeed, such a demonstration would be necessary to underpin the story that hierarchical generative models support Bayesian cognitive processing as discussed in the target article.
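To make the contrast concrete: the ideal-observer computation probed in such experiments is often as simple as a precision-weighted combination of Gaussian evidence with a Gaussian prior, a standard textbook result. The numbers below are purely illustrative:

```python
def posterior_gaussian(mu_prior, var_prior, mu_like, var_like):
    """Posterior from a Gaussian prior combined with Gaussian evidence.

    The posterior mean is a precision-weighted average of prior and
    evidence; the posterior variance is smaller than either alone.
    """
    w = (1 / var_prior) / (1 / var_prior + 1 / var_like)
    mu_post = w * mu_prior + (1 - w) * mu_like
    var_post = 1 / (1 / var_prior + 1 / var_like)
    return mu_post, var_post

# illustrative numbers: a prior centred at 0 and a noisy observation at 2
print(posterior_gaussian(mu_prior=0.0, var_prior=1.0,
                         mu_like=2.0, var_like=0.5))
# -> (1.33..., 0.33...): the estimate is pulled toward the more precise cue
```

An RBM, by contrast, has no such closed form; the open question is how its learned weights could come to realize this computation for the task-relevant variables.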

References

Barlow, H. B. (1961) Possible principles underlying the transformations of sensory messages. In: Sensory communication, ed. Rosenblith, W., pp. 217–34. MIT Press.
Földiák, P. (1990) Forming sparse representations by local anti-Hebbian learning. Biological Cybernetics 64:165–70.
Friston, K. J. (2010) The free-energy principle: A unified brain theory? Nature Reviews Neuroscience 11(2):127–38.
Hollensen, P. & Trappenberg, T. (2011) Learning sparse representations through learned inhibition. Poster presented at the COSYNE (Computational and Systems Neuroscience) Annual Meeting, Salt Lake City, Utah, February 24, 2011.
Jaeger, H. (2011) Neural hierarchies: Singin' the blues. Oral presentation at the Osnabrück Computational Cognition Alliance Meeting (OCCAM 2011), University of Osnabrück, Germany, June 22–24, 2011. Available at: http://video.virtuos.uni-osnabrueck.de:8080/engage/ui/watch.html?id=10bc55e8-8d98-40d3-bb11-17780b70c052&play=true
Lee, H., Ekanadham, C. & Ng, A. (2008) Sparse deep belief net model for visual area V2. In: Advances in Neural Information Processing Systems 20 (NIPS'07), ed. Platt, J., Koller, D., Singer, Y. & Roweis, S., pp. 873–80. MIT Press.
Olshausen, B. A. & Field, D. J. (1996) Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381(6583):607–609.
Rust, N. C., Schwartz, O., Movshon, J. A. & Simoncelli, E. P. (2005) Spatiotemporal elements of macaque V1 receptive fields. Neuron 46:945–56.
Saxe, A., Bhand, M., Mudur, R., Suresh, B. & Ng, A. (2011) Modeling cortical representational plasticity with unsupervised feature learning. Poster presented at COSYNE 2011, Salt Lake City, Utah, February 24–27, 2011. Available at: http://www.stanford.edu/~asaxe/papers
Vinje, W. E. & Gallant, J. L. (2000) Sparse coding and decorrelation in primary visual cortex during natural vision. Science 287:1273–76.
Waydo, S., Kraskov, A., Quiroga, R. Q., Fried, I. & Koch, C. (2006) Sparse representation in the human medial temporal lobe. Journal of Neuroscience 26:10232–34.