Lake et al. identify some extremely important desiderata for human-like intelligence. We agree with many of their central assertions: Human-like learning and decision making surely do depend upon rich internal models; the learning process must be informed and constrained by prior knowledge, whether this is part of the agent's initial endowment or acquired through learning; and naturally, prior knowledge will offer the greatest leverage when it reflects the most pervasive or ubiquitous structures in the environment, including physical laws, the mental states of others, and more abstract regularities such as compositionality and causality. Together, these points comprise a powerful set of target goals for AI research. However, while we concur on these goals, we choose a differently calibrated strategy for accomplishing them. In particular, we favor an approach that prioritizes autonomy, empowering artificial agents to learn their own internal models and how to use them, mitigating their reliance on detailed configuration by a human engineer.
Lake et al. characterize their position as “agnostic with regards to the origins of the key ingredients” (sect. 4, para. 2) of human-like intelligence. This agnosticism implicitly licenses a modeling approach in which detailed, domain-specific information can be imparted to an agent directly, an approach for which some of the authors' Bayesian Program Learning (BPL) work is emblematic. The two domains Lake and colleagues focus most upon (physics and theory of mind) are amenable to such an approach, in that these happen to be fields for which mature scientific disciplines exist. This provides unusually rich support for hand design of cognitive models. However, it is not clear that such hand design will be feasible in other more idiosyncratic domains where comparable scaffolding is unavailable. Lake et al. (2015a) were able to extend the approach to Omniglot characters by intuiting a suitable (stroke-based) model, but are we in a position to build comparably detailed domain models for such things as human dialogue and architecture? What about Japanese cuisine or ice skating? Even video-game play appears daunting, when one takes into account the vast amount of semantic knowledge that is plausibly relevant (knowledge about igloos, ice floes, cold water, polar bears, video-game levels, avatars, lives, points, and so forth). In short, it is not clear that detailed knowledge engineering will be realistically attainable in all areas we will want our agents to tackle.
Given this observation, it would appear most promising to focus our efforts on developing learning systems that can be flexibly applied across a wide range of domains, without an unattainable overhead in terms of a priori knowledge. Encouraging this view, the recent machine learning literature offers many examples of learning systems conquering tasks that had long eluded more hand-crafted approaches, including object recognition, speech recognition, speech generation, language translation, and (significantly) game play (Silver et al. 2016). In many cases, such successes have depended on large amounts of training data, and have implemented an essentially model-free approach. However, a growing volume of work suggests that flexible, domain-general learning can also be successful on tasks where training data are scarcer and where model-based inference is important.
For example, Rezende and colleagues (2016) reported a deep generative model that produces plausible novel instances of Omniglot characters after one presentation of a model character, going a significant distance toward answering Lake's “Character Challenge.” Lake et al. call attention to this model's “need for extensive pre-training.” However, it is not clear why their pre-installed model is to be preferred over knowledge acquired through pre-training. In weighing this point, it is important to note that the human modeler, to furnish the BPL architecture with its “start-up software,” must draw on his or her own large volume of prior experience. In this sense, the resulting BPL model is dependent on the human designer's own “pre-training.”
A more significant aspect of the Rezende model is that it can be applied without change to very different domains, as Rezende and colleagues (2016) demonstrate through experiments on human facial images. This flexibility is one hallmark of an autonomous learning system, and contrasts with the more purpose-built flavor of the BPL approach, which relies on irreducible primitives with domain-specific content (e.g., the strokes in Lake's Omniglot model). Furthermore, a range of recent work with deep generative models (e.g., van den Oord et al. 2016; Ranzato et al. 2016) indicates that they can identify quite rich structure, increasingly avoiding silly mistakes like those highlighted in Lake et al.'s Figure 6.
Importantly, a learning-centered approach does not prevent us from endowing learning systems with some forms of a priori knowledge. Indeed, the current resurgence in neural network research was triggered largely by work that does just this, for example, by building an assumption of translational invariance into the weight matrix of image classification networks (Krizhevsky et al. 2012a). The same strategy can be taken to endow learning systems with assumptions about compositional and causal structure, yielding architectures that learn efficiently about the dynamics of physical systems, and even generalize to previously unseen numbers of objects (Battaglia et al. 2016), another challenge problem highlighted by Lake et al. In such cases, however, the inbuilt knowledge takes a highly generic form, leaving wide scope for learning to absorb domain-specific structure (see also Eslami et al. 2016; Raposo et al. 2017; Reed & de Freitas 2016).
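To make the notion of a generic, built-in prior concrete, the following is a minimal sketch of weight sharing in one dimension. It is an illustration only, not the architecture of any cited paper: the kernel values, the toy signal, and the helper names `conv1d` and `shift_right` are all assumptions. Reusing the same kernel at every position makes the response shift along with the input (translation equivariance), and a global maximum over the response is then unchanged by the shift (translation invariance).

```python
def conv1d(signal, kernel):
    """Valid 1D cross-correlation: the same kernel weights are reused at every position."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def shift_right(values, n):
    """Shift a list right by n positions, padding with zeros on the left."""
    return [0] * n + values[: len(values) - n]


# Hypothetical kernel and input, chosen only for illustration.
kernel = [1, -2, 1]
signal = [0, 0, 0, 1, 3, 1, 0, 0, 0, 0]

response = conv1d(signal, kernel)
shifted_response = conv1d(shift_right(signal, 2), kernel)

# Equivariance: shifting the input shifts the response by the same amount.
is_equivariant = shifted_response == shift_right(response, 2)

# Invariance after global max pooling: the pooled value ignores the shift.
is_invariant = max(response) == max(shifted_response)
```

Nothing in this sketch says what the kernel should detect; that is left to learning. The only knowledge installed by hand is that the same feature may occur anywhere in the input.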
Under the approach we advocate, high-level prior knowledge and learning biases can be installed not only at the level of representational structure, but also through larger-scale architectural and algorithmic factors, such as attentional filtering (Eslami et al. 2016), intrinsic motivation mechanisms (Bellemare et al. 2016), and episodic learning (Blundell et al. 2016). Recently developed architectures for memory storage (e.g., Graves et al. 2016) offer a critical example. Lake et al. describe neural networks as implementing “learning as a process of gradual adjustment of connection strengths.” However, recent work has introduced a number of architectures within which learning depends on rapid storage mechanisms, independent of connection-weight changes (Duan et al. 2016; Graves et al. 2016; Wang et al. 2017; Vinyals et al. 2016).
Indeed, such mechanisms have even been applied to one-shot classification of Omniglot characters (Santoro et al. 2016) and Atari video game play (Blundell et al. 2016). Furthermore, the connection-weight changes that do occur in such models can serve in part to support learning-to-learn (Duan et al. 2016; Graves et al. 2016; Ravi & Larochelle 2017; Vinyals et al. 2016; Wang et al. 2017), another of Lake et al.'s key ingredients for human-like intelligence. As recent work has shown (Andrychowicz et al. 2016; Denil et al. 2016; Duan et al. 2016; Hochreiter et al. 2001; Santoro et al. 2016; Wang et al. 2017), this learning-to-learn mechanism can allow agents to adapt rapidly to new problems, providing a novel route to install prior knowledge through learning, rather than by hand.
Learning to learn allows a neural network agent to be trained slowly, over a long period. What the network is trained for, however, is to learn rapidly from few examples, regardless of what those examples might be. So, although the meta-learning process might be slow, the product is a neural network agent that can harness a few data points to carry out numerous tasks, including imitation, inference, task specialization, and prediction.
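The idea of rapid storage that is independent of connection-weight changes can be illustrated with a toy key-value memory. This is a sketch under strong simplifying assumptions, not the mechanism of any cited architecture: the `EpisodicMemory` class is hypothetical, and the hand-written three-dimensional vectors stand in for embeddings that, in the cited systems, would be produced by a slowly trained network. Each example is written to memory once, and a new query is labeled by its most similar stored key.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class EpisodicMemory:
    """Toy key-value store: learning is a single write, with no weight update."""

    def __init__(self):
        self.keys = []
        self.labels = []

    def write(self, key, label):
        """Store one (embedding, label) pair after a single presentation."""
        self.keys.append(list(key))
        self.labels.append(label)

    def read(self, query):
        """Return the label of the stored key most similar to the query."""
        if not self.keys:
            raise ValueError("memory is empty")
        similarities = [cosine(query, key) for key in self.keys]
        best = max(range(len(similarities)), key=similarities.__getitem__)
        return self.labels[best]


# One example per class, with hypothetical embeddings.
memory = EpisodicMemory()
memory.write([1.0, 0.0, 0.0], "A")
memory.write([0.0, 1.0, 0.0], "B")
memory.write([0.0, 0.0, 1.0], "C")

prediction = memory.read([0.1, 0.9, 0.2])  # closest to the key stored for "B"
```

In this picture, the slow, weight-based learning would go into producing embeddings for which such one-shot lookups succeed, which is one way of reading the learning-to-learn argument above.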
Another reason why we believe it may be advantageous to autonomously learn internal models is that such models can be shaped directly by specific, concrete tasks. A model is valuable not because it veridically captures some ground truth, but because it can be efficiently leveraged to support adaptive behavior. Just as Newtonian mechanics is sufficient for explaining many everyday phenomena, yet too crude to be useful to particle physicists and cosmologists, an agent's models should be calibrated to its tasks. This is essential for models to scale to real-world complexity, because it is usually too expensive, or even impossible, for a system to acquire and work with extremely fine-grained models of the world (Botvinick & Weinstein 2015; Silver et al. 2017). Of course, a good model of the world should be applicable across a range of task conditions, even ones that have not been previously encountered. However, this simply implies that models should be calibrated not only to individual tasks, but also to the distribution of tasks (inferred through experience or evolution) that is likely to arise in practice.
Finally, in addition to the importance of model building, it is important to recognize that real autonomy also depends on control functions, the processes that leverage models to make actual decisions. An autonomous agent needs good models, but it also needs to know how to make use of them (Botvinick & Cohen 2014), especially in settings where task goals may vary over time. This point also favors a learning and agent-based approach, because it allows control structures to co-evolve with internal models, maximizing their compatibility. Though efforts to capitalize on these advantages in practice are only in their infancy, recent work from Hamrick and colleagues (2017), which simultaneously trained an internal model and a corresponding set of control functions, provides a case study of how this might work.
Our comments here, like the target article, have focused on model-based cognition. However, an aside on model-free methods is warranted. Lake et al. describe model-free methods as providing peripheral support for model-based approaches. However, there is abundant evidence that model-free mechanisms play a pervasive role in human learning and decision making (Kahneman 2011). Furthermore, the dramatic recent successes of model-free learning in areas such as game play, navigation, and robotics suggest that it may constitute a first-class, independently valuable approach for machine learning. Lake et al. call attention to the heavy data demands of model-free learning, as reflected in DQN learning curves. However, even since the initial report on DQN (Mnih et al. 2015), techniques have been developed that significantly reduce the data requirements of this and related model-free learning methods, including prioritized memory replay (Schaul et al. 2016), improved exploration methods (Bellemare et al. 2016), and techniques for episodic reinforcement learning (Blundell et al. 2016). Given the pace of such advances, it may be premature to relegate model-free methods to a merely supporting role.
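As one illustration of how such techniques reduce data demands, the sketch below shows the proportional sampling rule at the core of prioritized replay: a stored transition with absolute temporal-difference error |δ_i| is replayed with probability proportional to (|δ_i| + ε)^α. It is a simplified sketch rather than a full implementation. The function names and default values are assumptions, the importance-sampling correction used in the full method is omitted, and sampling is done naively rather than with an efficient data structure.

```python
import random


def replay_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritization: P(i) = p_i**alpha / sum_k p_k**alpha, with p_i = |delta_i| + eps."""
    scaled = [(abs(err) + eps) ** alpha for err in td_errors]
    total = sum(scaled)
    return [s / total for s in scaled]


def sample_indices(td_errors, batch_size, alpha=0.6, rng=random):
    """Draw transition indices with replacement, favoring large TD errors."""
    probs = replay_probabilities(td_errors, alpha)
    return rng.choices(range(len(td_errors)), weights=probs, k=batch_size)


# Hypothetical TD errors for three stored transitions.
td_errors = [0.1, 2.0, 0.5]
probs = replay_probabilities(td_errors)

# alpha = 0 recovers uniform sampling, as in the original replay memory.
uniform = replay_probabilities(td_errors, alpha=0.0)

batch = sample_indices(td_errors, batch_size=4, rng=random.Random(0))
```

Transitions from which the agent currently has the most to learn are revisited more often, so fewer environment interactions are spent on experience that is already well predicted.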
To conclude, despite the differences we have focused on here, we agree strongly with Lake et al. that human-like intelligence depends at least in part on richly structured internal models. Our approach to building human-like intelligence can be summarized as a commitment to developing autonomous agents: agents that shoulder the burden of building their own models and arriving at their own procedures for leveraging them. Autonomy, in this sense, confers a capacity to build economical task-sensitive internal models, and to adapt flexibly to diverse circumstances, while avoiding a dependence on detailed, domain-specific prior information. A key challenge in pursuing greater autonomy is the need to find more efficient means of extracting knowledge from potentially limited data. But recent work on memory, exploration, compositional representation, and processing architectures provides grounds for optimism. In fairness, the authors of the target article have also offered, in other work, some indication of how their approach might be elaborated to support greater agent autonomy (Lake et al. 2016). We may therefore be following slowly converging paths. On a final note, it is worth pointing out that as our agents gain in autonomy, the opportunity increasingly arises for us to obtain new insights from what they themselves discover. In this way, the pursuit of agent autonomy carries the potential to transform the current AI landscape, revealing new paths toward human-like intelligence.