We would like to compliment Clark for his comprehensive and insightful review of the strengths and limitations of hierarchical predictive processing and its application to modeling actions as well as perception. We agree that the search for fundamental theoretical principles will be key in explaining and uniting the myriad functions of the brain. Here, we hope to contribute to the discussion by reconsidering a particular challenge to the minimum prediction error (MPE) principle identified by Clark, which we dub the “Dark Room Dilemma,” and by offering an alternate solution that captures both the drive to reduce errors and the drive to seek out complex and interesting situations.
As described by Clark, a common challenge to extending the principle of minimum prediction error (MPE) to action selection is that it would drive an animal to seek out a dark room where predicting sensory inputs becomes trivial and precise. In response, Clark suggests that “animals like us live and forage in a changing and challenging world, and hence ‘expect’ to deploy quite complex ‘itinerant’ strategies” (sect. 3.2, para. 2). At first, this response seems tautological: We act so that we can predict the outcome of our actions; we predict that our actions will be complex and interesting; and therefore we act in complex and interesting ways. The tautology is broken by invoking a prior expectation on action, one presumably hardwired and selected for by evolutionary pressures. But such an assumption would seem to remove the explanatory power of the MPE principle in describing complex behaviors. Furthermore, it goes against the common view that the evolutionary advantage of the brain lies in its ability to be adaptive, alleviating much of the need for hardwired pre-programming (pre-expectations) of behavior. A more satisfying solution to the “Dark Room Dilemma” may be found in a different information-theoretic interpretation of the interaction between action and perception.
Clark turns to the free-energy formulation for an information-theoretic interpretation of the MPE principle (Friston & Stephan 2007). Within this framework, average prediction error is captured by the information-theoretic measure entropy, which quantifies an agent's informational cost for representing the sensory input with its internal model. An alternative quantification of the predictive accuracy of an internal model is its mutual information (MI) with the sensory inputs. MI quantifies the information shared between two variables – in this case, the informational content the internal states of the brain hold regarding future sensory inputs. MI and entropy are, in a sense, converses of one another: entropy is the informational cost of a (bad) internal model, while MI is the informational gain of a (good) internal model. When selecting a model, minimizing entropy and maximizing MI both yield minimal prediction error. When selecting actions, however, these two principles yield very different results.
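One way to make this converse relationship explicit – in our own shorthand, not Clark's or Friston's notation – is to let Z denote the internal state and S the sensory input, write the informational cost as the conditional entropy H(S | Z), and write the informational gain as the mutual information I(Z; S):

H(S \mid Z) = -\sum_{z,s} p(z,s)\,\log p(s \mid z), \qquad I(Z;S) = H(S) - H(S \mid Z).

For a fixed sensory distribution – that is, when only the internal model is being chosen – H(S) is a constant, so minimizing H(S | Z) and maximizing I(Z; S) single out the same models. The two principles can only diverge once actions are allowed to change H(S) itself, which is the case considered next.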
Actions allow an agent, through the sensorimotor loop, to change the statistics of its sensory inputs. It is in response to such changes that the principles of maximizing MI and minimizing entropy differ. This difference can be highlighted by a hypothetical extreme, in which an agent acts to remove all variation in its sensory inputs – that is, it dwells in a “Dark Room.” Here, a trivial model can perfectly predict sensory inputs without any information cost. Entropy thus goes to zero, satisfying the principle of minimal entropy. MI, however, also goes to zero in a Dark Room: without variation in sensory inputs there is no information for the internal model to capture. This violates the maximal MI principle. Instead of entering a “Dark Room,” an agent following a principle of maximal MI would seek out conditions in which its sensory inputs vary in a complex, but still predictable, fashion. This is because MI is bounded above both by the variability in the sensory input and by the model's ability to predict it. Thus, MI balances predictability with complexity. Passively, maximizing MI accomplishes the same objective as minimizing entropy, namely the reduction of prediction error, but actively it encourages an escape from the Dark Room.
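In the same shorthand as above, the Dark Room argument and the bound on MI reduce to a single line:

I(Z;S) \le \min\{H(Z),\, H(S)\}, \qquad H(S) = 0 \;\Rightarrow\; H(S \mid Z) = 0 \ \text{and}\ I(Z;S) = 0.

Eliminating all sensory variability therefore satisfies entropy minimization trivially, but it also destroys any information the model could capture; a large I(Z; S) requires both a rich sensory stream (large H(S)) and a model that predicts it well (small H(S | Z)).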
The prediction–complexity duality of MI and its importance to learning have been recurring themes in computational methods. Important early implementations of a maximal MI principle in modeling passive learning include the Computational Mechanics approach for dynamical systems of Crutchfield and Young (1989) and the Information Bottleneck method of Tishby et al. (1999) for analyzing time series. Recently, the Information Bottleneck method has been extended to action selection by Still (2009). Further, the Predictive Information model of Ay et al. (2008) has shown that complex behaviors can emerge from simple manipulations of action controllers toward maximizing the mutual information between states. Our own work, too, uses MI to drive exploratory behaviors (Little & Sommer 2011).
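The contrast can also be illustrated with a small numerical toy example of our own – purely illustrative, and far simpler than any of the models cited above. Three simulated “rooms” (dark, purely noisy, and structured) are scored by the entropy of their sensory stream, by the conditional entropy of the next observation given the current one (a crude proxy for prediction error), and by the MI between consecutive observations:

# Toy comparison (ours, purely illustrative; not the model of any work cited above).
# Three simulated "rooms" are scored by the entropy of their sensory stream, a
# prediction-error proxy H(next | current), and the mutual information (MI)
# between consecutive observations.
import numpy as np

rng = np.random.default_rng(0)
T = 100_000          # length of each simulated sensory stream
N_SYMBOLS = 4        # size of the (hypothetical) sensory alphabet

def dark_room(T):
    return np.zeros(T, dtype=int)                      # no variation at all

def noisy_room(T):
    return rng.integers(0, N_SYMBOLS, size=T)          # i.i.d. noise: varied, unpredictable

def structured_room(T):
    x = np.arange(T) % N_SYMBOLS                       # deterministic cycle 0,1,2,3,0,...
    glitch = rng.random(T) < 0.05                      # occasional random disturbance
    x[glitch] = rng.integers(0, N_SYMBOLS, size=glitch.sum())
    return x

def entropy(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def stream_stats(x):
    # Empirical joint distribution of consecutive observations (S_t, S_{t+1}).
    joint = np.zeros((N_SYMBOLS, N_SYMBOLS))
    np.add.at(joint, (x[:-1], x[1:]), 1.0)
    joint /= joint.sum()
    h_s = entropy(joint.sum(axis=1))                   # variability H(S)
    h_cond = entropy(joint.ravel()) - h_s              # prediction-error proxy H(S_{t+1} | S_t)
    mi = h_s + entropy(joint.sum(axis=0)) - entropy(joint.ravel())   # I(S_t; S_{t+1})
    return h_s, h_cond, mi

for name, room in [("dark", dark_room), ("noisy", noisy_room), ("structured", structured_room)]:
    h_s, h_cond, mi = stream_stats(room(T))
    print(f"{name:>10} room:  H(S) = {h_s:4.2f} bits   "
          f"H(next|current) = {h_cond:4.2f} bits   MI = {mi:4.2f} bits")

In this toy setting, the dark room drives both the entropy and the MI to zero, the noisy room is varied but offers nothing to predict, and only the structured room – varied yet predictable – yields a large MI. An agent selecting rooms by maximal MI would therefore leave the dark room, while one minimizing sensory entropy alone would stay.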
The principle of minimum prediction error and the related hierarchical prediction models offer important insights that should not be discounted, and our aim is not to suggest otherwise. Indeed, we favor the view that hierarchical prediction models could explain the motor implementation of intended actions. But we also believe the explanatory value of the MPE principle is limited. Specifically, it would be desirable for a theoretical principle of the brain to address, rather than sidestep, the intriguing question of what makes animals, even the simplest ones, venture out of their dark rooms.